Monday, 16 April 2012

        John Goldsmith, Yu Hu, Irina Matveeva, and Colin Sprague 
A heuristic for morpheme discovery based on string edit distance

This paper is about the study of languages with rich morphologies, that is, languages with a high average number of morphemes per word. The writers focus on Swahili, a major Bantu language of East Africa, and their goal is to develop a system that can automatically produce a morphological analyzer on the basis of a large text corpus. In addition, the writers present a new bootstrapping heuristic, one that is particularly useful in the analysis of languages with rich morphologies and that is based on the string edit distance dynamic programming algorithm. They show how it works and how it can be used to rank and quantify the robustness of morphological generalizations in a set of data.
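To make the string edit distance idea mentioned above more concrete, here is a minimal sketch of the standard dynamic programming algorithm in Python. It assumes unit costs for insertion, deletion, and substitution, which may differ from the costs used in the paper, and the function name `edit_distance` is only illustrative. It computes the distance only; the paper's heuristic goes further and uses the alignment between words to propose morphemes.

```python
def edit_distance(a: str, b: str) -> int:
    """Return the string edit distance between a and b.

    Assumption: insertion, deletion, and substitution each cost 1.
    """
    # prev[j] holds the distance between the first i-1 characters of a
    # and the first j characters of b; row 0 is all insertions.
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        # Turning the first i characters of a into the empty string
        # takes i deletions.
        curr = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            curr.append(min(
                prev[j] + 1,                      # delete char_a
                curr[j - 1] + 1,                  # insert char_b
                prev[j - 1] + substitution_cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


# Two Swahili verb forms that differ only in the subject prefix.
print(edit_distance("ninasoma", "unasoma"))
```

Pairs of words with a small distance share long stretches of identical characters, and those shared stretches are natural candidates for morphemes.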

The bootstrapping heuristic is designed to rapidly come up with a set of candidate strings of morphemes, while the model consists of an explicit formulation of either what constitutes an adequate morphology for a set of data, or an objective function that must be optimized, given a corpus of data, in order to find the correct morphological analysis.

Goldsmith (2001) argues for using the discovery of signatures as the bootstrapping heuristic, where a signature is a maximal set of stems and suffixes with the property that all combinations of stems and suffixes are found in the corpus in question. So we can conclude that a signature is a set of forms that can be characterized by the rule of the bootstrapping heuristic.
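The definition of a signature can be illustrated with a small sketch in Python. This is a simplification and not Goldsmith's actual procedure: it assumes that every word has already been split into a (stem, suffix) pair, and the function name `find_signatures` and the English example words are hypothetical. Stems that occur with exactly the same set of suffixes are grouped together, so every stem and suffix combination in a group is attested in the input.

```python
from collections import defaultdict


def find_signatures(analyses):
    """Group stems by the exact set of suffixes they occur with.

    analyses: iterable of (stem, suffix) pairs, assumed to come from
    some earlier splitting step (an assumption of this sketch).
    Returns a dict mapping each frozenset of suffixes to its set of stems.
    """
    suffixes_by_stem = defaultdict(set)
    for stem, suffix in analyses:
        suffixes_by_stem[stem].add(suffix)

    signatures = defaultdict(set)
    for stem, suffixes in suffixes_by_stem.items():
        signatures[frozenset(suffixes)].add(stem)
    return dict(signatures)


# Hypothetical toy data: "" stands for the empty suffix.
toy_analyses = [
    ("jump", ""), ("jump", "s"), ("jump", "ed"),
    ("walk", ""), ("walk", "s"), ("walk", "ed"),
    ("talk", ""), ("talk", "ing"),
]
toy_signatures = find_signatures(toy_analyses)
for suffixes, stems in toy_signatures.items():
    print(sorted(suffixes), sorted(stems))
```

Here "jump" and "walk" end up in one signature because both take the same three suffixes, while "talk" forms a separate signature of its own.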



Overall, the paper concludes that the SED-based (string edit distance) heuristic is empirically superior to the SF-based (successor frequency) and PF-based (predecessor frequency) heuristics as a means of identifying morphemes in natural languages with rich morphologies. The SED-based heuristic described in the paper is a rapid method for analyzing data from languages with a rich morphology, such as Swahili and the other Bantu languages, and coming u