Selected publications by members of the UCT Computer Science department about the development of The Digital Bleek and Lloyd and the online |xam – English dictionary.

Contributors include Kyle Williams, Hussein Suleman, Lebogang Molwantoa, Sanvir Manilal and Rizmari Versfeld.

Creating a Handwriting Recognition Corpus for Bushman Languages, 2011

Handwriting recognition systems rely on the existence of a corpus for training recognition models and evaluating accuracy. Creating a handwriting recognition corpus for the Bushman languages of southern Africa is difficult due to the complexities of the script used to represent them and the fact that this script cannot be represented using Unicode. ( … ) View this paper here.

Using a hidden Markov model to transcribe handwritten Bushman texts, 2011

The Bushman texts in the Bleek and Lloyd Collection contain complex diacritics that make automatic transcription difficult. Transcriptions of these texts would allow enhanced digital library services to be created for interacting with the collection. This study investigated automatic transcription of the Bushman texts using a hidden Markov model, a popular method for text line recognition. ( … ) View this paper here.

Digital Library in a 3D Virtual World: The Digital Bleek and Lloyd Collection in Second Life, 2010

This research explores and demonstrates the process of setting up a 3D representation of a typical web-based digital library, ‘The Digital Bleek and Lloyd Collection’, in the popular 3D virtual world ‘Second Life’. The processes of building, scripting, and evaluating the 3D exhibit are discussed. The report concludes that Second Life is a good platform for this kind of cultural representation. At university level it could be used to showcase and share researchers’ work. ( … ) View this paper here.

A Visual Dictionary for an Extinct Language, 2010

Cultural heritage artefacts are often digitised so that researchers and scholars can access them easily. In the case of the Bleek and Lloyd dictionary of the |xam Bushman language, 14,000 pages were digitised. These pages could not be transcribed, however, because the language and script are both extinct. ( … ) View this paper here.

Translating handwritten Bushman texts, 2010

The Bleek and Lloyd Collection is a collection of artefacts documenting the life and language of the Bushman people of southern Africa in the 19th century. Included in this collection is a handwritten dictionary that contains English words and their corresponding |xam Bushman language translations. This dictionary allows for the manual translation of |xam words that appear in the notebooks of the Bleek and Lloyd Collection. Manual translation is not practical, however, because of the size of the dictionary, which contains over 14,000 entries. To solve this problem, a content-based image retrieval system was built that allows a |xam word to be selected from a notebook and returns matching words from the dictionary. The system shows promise, with some search keys returning relevant results. ( … ) View this paper here.