Хорижий филология (Foreign Philology), No. 3, 2018
Finite-state transducers read their input symbol by symbol; each time they read a symbol, they emit a corresponding output and move to a new state. This makes processing fast: in practice, the processing speed is independent of the size of the rule set [5]. A lexicon compiler is a program that reads sets of morphemes and their morphotactic combinations in order to create a finite-state transducer of a lexicon [6].

Sirni och (divulge)
Yo‘l och (open the way)
Fol och (guess)
Gul och (flourish)

III. Approaches to morphological analysis

An inflectional form is a combination of a stem with an inflectional affix. Cerstin Mahlow and Michael Piotrowski showed four approaches to restricting the combination of affixes [7]: the naive, affix, stem, and indirection approaches.

Morphological analysis for machine translation also includes morphophonological rules. For instance, English and Uzbek each have their own rules: big => bigger; quloq (ear) => qulog‘im (my ear).

In the early 1990s there were three types of morphological analyzers, based on three models: the generative model, the paradigmatic model, and the two-level morphological model for the Tatar language [8].

IV. Algorithm for morphological

The earliest algorithms for automatically assigning parts of speech were based on a two-stage architecture (Harris, 1962; Klein and Simmons, 1963; Greene and Rubin, 1971). The first stage used a dictionary to assign each word a list of potential parts of speech. The second stage used large lists of hand-written disambiguation rules to winnow this list down to a single part of speech for each word.

It is known that machine translation is a huge problem for any language that lacks resources. The problem is even larger for Uzbek than for other languages: like other Turkic languages, Uzbek is a highly unstructured language, and applying a strict method to it is very difficult. Some of these difficulties have been mentioned above. Given these issues, it would be useful to create a method or program that analyzes the parts of this language; that is, it should identify the types and meanings of words in sentences. To do this, we should first analyze individual words. Such a program is called a morphoanalyzer. Using this analyzer, we can make decisions about words and their meanings, as well as about morphological or other changes within them.

Creating this analyzer can be divided into several steps:
- Identifying the stem of each lexeme;
- Identifying the part of speech of the stem;
- Parsing all affixes added to the word, relative to the stem, as tokens;
- Identifying the types of all parsed affixes and noting them.

These steps are not easy either, because many problems arise from a linguistic point of view. For example, to identify the base of a word we need a database of all simple Uzbek words, that is, words that do not include any affixes. We then have to compare the word against almost all words in the database. There are some ideas for doing this. First, we remove one letter from the end of the word at a time and compare the result with all words in the database. In this way we obtain the base by cutting off all affixes at the end of the word. For example: bolalarim (is not found) -> bolalari (is not found) -> bolalar (is not found) -> bolala (is not found) -> bolal (is not found) -> bola (is found, and the search finishes). Until we get "bola", we compare six times against all words in the database that are shorter than nine letters (because "bolalarim" has nine letters, and at every step we can decrease the number of word variants by one).

However, if the word has a prefix, as in "serg‘ayratlar", "noodatiylik", "beg‘amliging", this method does not work: serg‘ayrat (is not found) -> serg‘ayra (is not found) -> serg‘ayr (is not found) -> serg‘ay (is not found) -> serg‘a (is not found) -> serg‘ (is not found) -> ser (is not found) -> se (is not found) -> s (is not found