Page 66 - 3-son 2018 yil
P. 66
Хорижий филология №3, 2018 йил
longest morpheme like g+a+r+c+h+i+l+i+k. make usually a large number of rules. From
Linguistic database of Uzbek input software right to left the first vowel is removed when it
in morphological parsing. analyzes for deleting some possessive cases.
Additionally, orthographic rules has So we can see this situation like this chart:
important role for all agglutinative languages burun+im=>burnim-deleting
for morphological analysis. Because there are, shahar+im=> shahrim-deleting
so many phonetical changes in the words
u r u n
b
Begin
im
a h a r
sh
h r
Other possibilities are epenthesis of a In a parser, morphological analysis of
segment under phonological conditions. Take words is an important prerequisite for
for example possessive case or dative case in syntactic analysis. Properties of a word the
Uzbek: parser needs to know are its part-of-speech
obro‗+im=>obro‗yim (my reputation); category and the morphosyntactic information
u+ga=> unga (he=> him) encoded in the particular word form. Another
Word error rate (WER) is the sum of important task is lemmatization, i.e. finding
insertions, deletions, and substitutions the corresponding dictionary form for a given
normalized by the length of the reference input word, because for many applications a
sentence. A slight variant (WERg) normalizes lemma lexicon is used to provide more
this value by the length of the Levenshtein detailed syntactic (e.g, valency) and semantic
path, i.e., the sum of insertions, deletions, information for deep analysis.
substitutions, and matches: this ensures that Alternation and adjacency of
the measure is between zero (when the morphemes is important to analyze
produced sentence is identical to the automatically for finite state transducers.
reference) and one (when the candidate must Following scheme shows morphotactic order
be entirely deleted, and all words in the of the verb in Uzbek.
reference must be inserted) [3].
65