
Xorijiy filologiya (Foreign Philology), No. 3, 2018


Finite-state transducers read their input symbol by symbol; each time they read a symbol, they emit a corresponding output and move to a new state. This improves processing speed fundamentally: in practice, the processing speed is independent of the size of the rule set [5]. A lexicon compiler is a program that reads sets of morphemes and their morphotactic combinations in order to create a finite-state transducer of a lexicon [6].

Sirni och (divulge)
Yo‘l och (open the way)
Fol och (guess)
Gul och (flourish)

III. Approaches to morphological analysis

An inflectional form is a combination of a stem with an inflectional affix. Cerstin Mahlow and Michael Piotrowski describe four approaches to restricting the combination of affixes [7]: the naive, affix, stem, and indirection approaches.

Morphological analysis for machine translation includes morphophonological rules as well. For instance, English and Uzbek each have their own rules: big => bigger; quloq (ear) => qulog‘im (my ear).

In the early 1990s there were three types of morphological analyzers, based on three models: the generative model, the paradigmatic model, and the two-level morphological model for the Tatar language [8].

IV. Algorithm for morphological

The earliest algorithms for automatically assigning parts of speech were based on a two-stage architecture (Harris, 1962; Klein and Simmons, 1963; Greene and Rubin, 1971). The first stage used a dictionary to assign each word a list of potential parts of speech. The second stage used large lists of hand-written disambiguation rules to winnow down this list to a single part of speech for each word.

It is known that machine translation is a huge problem for any language that lacks resources. It can be considered an even larger problem for Uzbek than for other languages, because, like other Turkic languages, Uzbek is a very unstructured language and applying a strict method to it is very difficult. Some of these difficulties have been mentioned above. Given these issues, it would be useful to create a method or program that analyzes the parts of the language; that is, it should identify the types and meanings of the words in sentences. For this, we should first analyze words on their own. Such a program is called a morphoanalyzer. Using this analyzer we can make decisions about words and their meanings, as well as about morphological or other changes in them.

Creating this analyzer can in turn be divided into several steps:
- Identifying the stem of a lexeme;
- Identifying the part of speech of the stem;
- Parsing all affixes added to the word, relative to the stem, as tokens;
- Identifying the types of all parsed affixes and noting them.

These processes are not easy either, because there are many problems we can face from a linguistic point of view. For example, to identify the base of a word we need a database of all simple words in the Uzbek language, that is, words that do not include any affixes. Then we have to compare almost all words in the database with the word. There are some ideas for carrying out this work. First, we remove one letter from the end of the word at each step and compare the result with all words in the database. In this way we obtain the base by cutting off all affixes at the end of the word. For example: bolalarim (not found) -> bolalari (not found) -> bolalar (not found) -> bolala (not found) -> bolal (not found) -> bola (found; the search finishes). Before we reach “bola”, we compare six times against all words in the database that are shorter than nine letters (because “bolalarim” has nine letters, and at every step we can reduce by one the number of word variants). But if the word has a prefix, as in “serg‘ayratlar”, “noodatiylik”, “beg‘amliging”, this method does not work: serg‘ayrat (not found) -> serg‘ayra (not found) -> serg‘ayr (not found) -> serg‘ay (not found) -> serg‘a (not found) -> serg‘ (not found) -> ser (not found) -> se (not found) -> s (not found


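
To make the behaviour of a finite-state transducer more concrete (read a symbol, emit an output, move to a new state), the following Python sketch implements a toy transducer for the quloq => qulog‘im alternation mentioned in Section III. Everything in it is an assumption made for illustration: the state names, the use of "+" as a morpheme-boundary symbol in the input, the restriction to plain ASCII letters, and the simplified rule itself (a stem-final q becomes g‘ before a vowel-initial suffix). It is not taken from the systems cited in [5] or [6].

```python
import string

VOWELS = set("aeiou")
ALPHABET = set(string.ascii_lowercase) | {"+"}   # "+" marks a morpheme boundary
G_WITH_MARK = "g\u2018"                           # the letter g'


def build_transitions():
    """Build the table (state, symbol) -> (output, next_state)."""
    table = {}
    for sym in ALPHABET:
        # "start": copy symbols, but hold back a q until we know what follows.
        if sym == "q":
            table[("start", sym)] = ("", "q")
        elif sym == "+":
            table[("start", sym)] = ("", "start")
        else:
            table[("start", sym)] = (sym, "start")
        # "q": a q has been read but not yet written out.
        if sym == "+":
            table[("q", sym)] = ("", "q+")
        elif sym == "q":
            table[("q", sym)] = ("q", "q")
        else:
            table[("q", sym)] = ("q" + sym, "start")
        # "q+": the held-back q is stem-final; the suffix decides its form.
        if sym in VOWELS:
            table[("q+", sym)] = (G_WITH_MARK + sym, "start")
        elif sym == "q":
            table[("q+", sym)] = ("q", "q")
        elif sym == "+":
            table[("q+", sym)] = ("", "q+")
        else:
            table[("q+", sym)] = ("q" + sym, "start")
    return table


TRANSITIONS = build_transitions()
# Output flushed when the input ends in a given state.
FINAL_OUTPUT = {"start": "", "q": "q", "q+": "q"}


def transduce(text):
    """Run the transducer: one table lookup per input symbol."""
    state = "start"
    pieces = []
    for sym in text:
        output, state = TRANSITIONS[(state, sym)]
        pieces.append(output)
    pieces.append(FINAL_OUTPUT[state])
    return "".join(pieces)


print(transduce("quloq+im"))    # qulog‘im
print(transduce("quloq+lar"))   # quloqlar
```

The loop in `transduce` performs exactly one dictionary lookup per input symbol, which is why the running time of such a device depends on the length of the input and not on how many rules were compiled into the table.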
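
The two-stage part-of-speech architecture described in Section IV (a dictionary that proposes candidate tags, followed by hand-written rules that narrow them down) can be illustrated with a very small Python sketch. The dictionary entries, the tag names, the three rules and the fallback of guessing NOUN for unknown words are all assumptions chosen for this example; they do not reproduce the rule sets of the cited early systems.

```python
# Stage 1 resource: each word is listed with all its potential parts of speech.
DICTIONARY = {
    "the": ["DET"],
    "we": ["PRON"],
    "can": ["AUX", "NOUN", "VERB"],
    "fish": ["NOUN", "VERB"],
}


def candidates(word):
    """Stage 1: look up the candidate tags (unknown words default to NOUN)."""
    return list(DICTIONARY.get(word.lower(), ["NOUN"]))


def keep(tags, allowed):
    """Keep only the allowed tags, unless that would leave nothing."""
    kept = [t for t in tags if t in allowed]
    return kept or tags


# Stage 2 resource: hand-written rules that look at the previous tag.
def after_determiner(prev_tag, tags):
    return keep(tags, {"NOUN"}) if prev_tag == "DET" else tags


def after_pronoun(prev_tag, tags):
    return keep(tags, {"AUX", "VERB"}) if prev_tag == "PRON" else tags


def after_auxiliary(prev_tag, tags):
    return keep(tags, {"VERB"}) if prev_tag == "AUX" else tags


RULES = [after_determiner, after_pronoun, after_auxiliary]


def tag(words):
    """Assign one tag per word: dictionary lookup, then rule filtering."""
    tagged = []
    prev_tag = None
    for word in words:
        tags = candidates(word)
        for rule in RULES:
            tags = rule(prev_tag, tags)
        chosen = tags[0]          # if several tags survive, take the first
        tagged.append((word, chosen))
        prev_tag = chosen
    return tagged


print(tag(["we", "can", "fish"]))
print(tag(["the", "can"]))
```

In the first sentence "can" is reduced to an auxiliary and "fish" to a verb, while in the second "can" is kept as a noun because it follows a determiner. A real rule list of this kind would be far longer, which is exactly the maintenance cost of the second stage.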