
Хорижий филология (Foreign Philology), No. 3, 2018


longest morpheme like g+a+r+c+h+i+l+i+k.
Linguistic database of Uzbek input software in morphological parsing.

Additionally, orthographic rules play an important role in the morphological analysis of all agglutinative languages. Because words undergo so many phonetic changes, a large number of rules is usually required. For example, when certain possessive suffixes are attached, the first vowel of the stem, counting from right to left, is deleted, as the following examples and chart show:

burun+im => burnim (deletion)
shahar+im => shahrim (deletion)


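[Figure: transition chart. From "Begin", one path spells b-u-r-u-n and another spells sh-a-h-a-r, both leading to the suffix "im"; a further branch is labelled h-r.]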
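The vowel-deletion rule illustrated above can be sketched in a few lines of Python. This is only a minimal illustration, not the rule set of any actual parser: the function name `attach_possessive`, the vowel set, and the small stem list `SYNCOPE_STEMS` are assumptions introduced here, and the list contains only the two stems cited in the text.

```python
# Simplified vowel inventory (assumption): the digraph-like letter o' is ignored.
VOWELS = set("aeiou")

# Hypothetical mini-lexicon of stems that lose their last vowel before a
# possessive suffix. Only the two stems cited in the text are listed.
SYNCOPE_STEMS = {"burun", "shahar"}


def attach_possessive(stem: str, suffix: str = "im") -> str:
    """Attach a possessive suffix, deleting the last stem vowel for listed stems."""
    if stem in SYNCOPE_STEMS:
        # Scan from right to left and drop the first vowel that is found.
        for i in range(len(stem) - 1, -1, -1):
            if stem[i] in VOWELS:
                stem = stem[:i] + stem[i + 1:]
                break
    return stem + suffix


print(attach_possessive("burun"))   # burnim
print(attach_possessive("shahar"))  # shahrim
```

Stems outside the list are concatenated unchanged, which reflects the assumption that the deletion is restricted to particular stems rather than applying to every word.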



Another possibility is the epenthesis of a segment under phonological conditions. Take, for example, the possessive or dative case in Uzbek:

obroʻ+im => obroʻyim (my reputation); u+ga => unga (he => him)

Word error rate (WER) is the sum of insertions, deletions, and substitutions normalized by the length of the reference sentence. A slight variant (WERg) normalizes this value by the length of the Levenshtein path, i.e., the sum of insertions, deletions, substitutions, and matches. This ensures that the measure lies between zero (when the produced sentence is identical to the reference) and one (when the candidate must be entirely deleted and all words in the reference must be inserted) [3].

In a parser, morphological analysis of words is an important prerequisite for syntactic analysis. The properties of a word that the parser needs to know are its part-of-speech category and the morphosyntactic information encoded in the particular word form. Another important task is lemmatization, i.e., finding the corresponding dictionary form for a given input word, because many applications use a lemma lexicon to provide more detailed syntactic (e.g., valency) and semantic information for deep analysis.

The alternation and adjacency of morphemes are important for automatic analysis with finite-state transducers. The following scheme shows the morphotactic order of the verb in Uzbek.
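The WER and WERg definitions given above can be made concrete with a short Python sketch. It is an illustration under stated assumptions, not the evaluation code of the cited work: the function names are hypothetical, the reference is assumed to be non-empty, and when several minimal Levenshtein paths exist the backtrace prefers a match or substitution, which can affect the WERg value. Following the wording of the text, a deletion removes a candidate word and an insertion adds a reference word.

```python
def edit_counts(reference, candidate):
    """Count (substitutions, deletions, insertions, matches) on one minimal path."""
    n, m = len(reference), len(candidate)
    # dist[i][j] = edit distance between reference[:i] and candidate[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i
    for j in range(m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if reference[i - 1] == candidate[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j - 1] + cost,  # match / substitution
                             dist[i - 1][j] + 1,         # insert a reference word
                             dist[i][j - 1] + 1)         # delete a candidate word
    subs = dels = ins = matches = 0
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            cost = 0 if reference[i - 1] == candidate[j - 1] else 1
            # Tie-breaking assumption: take the diagonal step whenever it is optimal.
            if dist[i][j] == dist[i - 1][j - 1] + cost:
                if cost:
                    subs += 1
                else:
                    matches += 1
                i, j = i - 1, j - 1
                continue
        if j > 0 and dist[i][j] == dist[i][j - 1] + 1:
            dels += 1
            j -= 1
        else:
            ins += 1
            i -= 1
    return subs, dels, ins, matches


def wer(reference, candidate):
    """Edits divided by the reference length (reference assumed non-empty)."""
    s, d, i, _ = edit_counts(reference, candidate)
    return (s + d + i) / len(reference)


def werg(reference, candidate):
    """Edits divided by the length of the Levenshtein path."""
    s, d, i, m = edit_counts(reference, candidate)
    return (s + d + i) / (s + d + i + m)


ref = "a b".split()
cand = "a x y b".split()
print(wer(ref, cand))   # 1.0
print(werg(ref, cand))  # 0.5
```

In this toy example the candidate contains two extra words, so WER reaches 1.0 while WERg stays at 0.5, because the two matches also count toward the path length.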

























