A heuristic for morpheme discovery based on string edit distance
This paper is concerned with learning the morphology of languages with rich morphologies,
that is, with a high average number of morphemes per word. In this
paper, the writer focuses on Swahili, a major Bantu language of East
Africa, and the goal is the development of a system that can
automatically produce a morphological analyzer of a text on the basis of
a large corpus. In addition, the writer presents a new
bootstrapping heuristic, one that is particularly useful in the analysis
of languages with rich morphologies and that is based on the string
edit distance (SED) dynamic programming algorithm. The paper shows how it works
and how it can be used to rank and quantify the robustness of
morphological generalizations in a set of data.
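To make the underlying algorithm concrete, the following is a minimal sketch of the standard string edit distance dynamic program, followed by a backtrace that collects the runs of characters the two words share. It assumes unit costs for insertion, deletion, and substitution, which may differ from the costs used in the paper; the function name `sed_align` and the Swahili word pair are illustrative choices, not taken from the paper, and the summary does not say how the paper turns alignments into morpheme candidates.

```python
def sed_align(a: str, b: str):
    """Return (edit distance, list of shared character runs) for a and b.

    Assumption: unit cost for insert, delete, and substitute.
    """
    n, m = len(a), len(b)
    # d[i][j] = edit distance between a[:i] and b[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # delete a[i-1]
                          d[i][j - 1] + 1,         # insert b[j-1]
                          d[i - 1][j - 1] + cost)  # match or substitute

    # Walk back from the bottom-right cell, grouping consecutive matches.
    chunks, run = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and a[i - 1] == b[j - 1] and d[i][j] == d[i - 1][j - 1]:
            run.append(a[i - 1])
            i, j = i - 1, j - 1
        else:
            if run:  # a run of matches just ended
                chunks.append("".join(reversed(run)))
                run = []
            if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + 1:
                i, j = i - 1, j - 1   # substitution
            elif i > 0 and d[i][j] == d[i - 1][j] + 1:
                i -= 1                # deletion
            else:
                j -= 1                # insertion
    if run:
        chunks.append("".join(reversed(run)))
    chunks.reverse()
    return d[n][m], chunks


distance, shared = sed_align("anasoma", "ninasoma")
print(distance, shared)  # 2 ['nasoma']
```

In this toy pair the alignment isolates the shared material `nasoma` and leaves the differing prefixes `a` and `ni` unaligned, which is the kind of contrast a SED-based heuristic could use to propose candidate morpheme boundaries.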
The approach distinguishes a bootstrapping heuristic from a model. The bootstrapping heuristic is designed to rapidly come up
with a set of candidate strings of morphemes, while the model consists
of an explicit formulation of either what constitutes an adequate
morphology for a set of data, or an objective function that must be
optimized, given a corpus of data, in order to find the correct
morphological analysis.
Goldsmith (2001) argues for using the discovery of signatures as the bootstrapping heuristic, where a signature is
a maximal set of stems and suffixes with the property that all
combinations of stems and suffixes are found in the corpus in question.
In particular, a signature is a set of forms that
can be characterized by the rule of the bootstrapping heuristic.
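The definition of a signature can be illustrated with a short sketch. It assumes that some earlier step has already proposed (stem, suffix) splits, and it groups stems by the exact set of suffixes they occur with, which is one simple way to operationalize the definition and not necessarily the procedure of Goldsmith (2001). The function names and the English toy data are hypothetical.

```python
from collections import defaultdict


def find_signatures(analyses):
    """Group stems by the exact set of suffixes each stem occurs with.

    analyses: iterable of (stem, suffix) pairs proposed by some earlier step.
    Returns a dict mapping frozenset(suffixes) -> set(stems).
    """
    suffixes_of = defaultdict(set)
    for stem, suffix in analyses:
        suffixes_of[stem].add(suffix)

    signatures = defaultdict(set)
    for stem, suffixes in suffixes_of.items():
        signatures[frozenset(suffixes)].add(stem)
    return dict(signatures)


def all_combinations_attested(stems, suffixes, corpus):
    """Check the defining property: every stem + suffix is a corpus word."""
    return all(stem + suffix in corpus for stem in stems for suffix in suffixes)


# Toy data (hypothetical): the empty string stands for the null suffix.
analyses = [
    ("jump", ""), ("jump", "ed"), ("jump", "ing"),
    ("walk", ""), ("walk", "ed"), ("walk", "ing"),
    ("talk", ""), ("talk", "s"),
]
corpus = {stem + suffix for stem, suffix in analyses}

signatures = find_signatures(analyses)
for suffixes, stems in signatures.items():
    print(sorted(suffixes), sorted(stems),
          all_combinations_attested(stems, suffixes, corpus))
```

Here `jump` and `walk` fall into one signature with the suffixes null, `ed`, and `ing`, while `talk` forms its own signature with null and `s`. In both cases every stem and suffix combination is attested in the toy corpus.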
Overall,
the paper concludes that the SED-based heuristic is
empirically superior to the SF- and PF-based heuristics as a means of
identifying morphemes in natural languages with rich morphologies. The
SED-based heuristic described in the paper is a rapid method for analyzing data from languages with a rich morphology, such as Swahili and the other Bantu languages, and coming u