DEVELOPMENT OF ALGORITHMS AND COMPUTATIONAL GRAMMAR FOR URDU

Title of Thesis
DEVELOPMENT OF ALGORITHMS AND COMPUTATIONAL GRAMMAR FOR URDU

Author(s)
Syed Muhammad Jafar Rizvi
Institute/University/Department Details
Department of Computer & Information Sciences/ Pakistan Institute of Engineering and Applied Sciences Nilore Islamabad
Session
2007
Subject
Computer & Information Sciences
Number of Pages
242
Keywords (Extracted from title, table of contents and abstract of thesis)
algorithms, computational grammar, urdu, grammar modeling, lexical functional grammar, head-driven phrase structure grammar, morphology, syntax, verb morphology, closed-word-classes, roman script, language processing tasks, machine translation, text summarization, grammar checker, information retrieval, machine translation architectures, urdu noun, urdu lexicon

Abstract
This work presents linguistics-based grammar modeling of the Urdu language under the framework of Lexical Functional Grammar (LFG) and, in places, under Head-driven Phrase Structure Grammar (HPSG). The grammar modeling has been done by considering two interlinked parts: the morphology and the syntax.

Urdu has a rich verb morphology comprising 60 basic verb forms categorized into infinitive, perfective, repetitive, subjunctive and imperative forms. These 60 forms are not enough to represent all the features of Urdu verbs. Further verb features are composed when verb auxiliaries and/or light verbs combine with these verb forms. Linguistically, verb auxiliaries are expected to combine at the syntactic level. However, this work shows that the grammar model is simplified, and complex agreement requirements can be avoided, if auxiliaries are lumped with verb forms at the lexical level. The work proposes an analysis of the perfective, progressive, repetitive and inceptive aspects as well as of the declarative, permissive, prohibitive, imperative, capacitive, suggestive, compulsive, presumptive and subjunctive moods. The structure of the passive is analyzed by assuming a default argument.

This work, based on differences in grammar modeling and conceptualization, classifies Urdu case markers and post-positions into noun forms, core case markers, functional case markers, possession markers and post-positions. Noun forms are modeled morphologically using lexical transducers, possession markers require two noun phrases, post-positions appear as adjuncts, while core and functional case markers appear in the argument structure of verbs.

To classify core and functional case markers, the use of semantic features is proposed. The classification based on semantic features yields, in particular, a better taxonomy of the different 'instrumental cases' in Urdu. This classification of the 'instrumental case' exposed the presence of 'indirect subjects' for Urdu causative verbs, which further suggested that some causative verbs are tetravalent because the argument structure of these verbs has four arguments.

The study of case markers reveals that the agreement between a noun and a case marker is difficult to handle. It is argued that the head of the phrase should be a noun because the resultant is a noun phrase, but features of the case marker also transfer to the resultant phrase; therefore, a modification to the head-feature rule is proposed. The same argument also helped to reaffirm that Urdu case markers are different from Urdu possession markers, which require a different rule needing two noun phrases, as a specifier and a complement, to make a resultant noun phrase. Adjective-noun agreement in gender and number is modeled on the same grounds.

The work proposes an algorithm for parsing Urdu sentences based on Urdu closed-word-classes. This helps in identifying chunks based on the linguistic characteristics of the word classes. Rule selection is simplified because each closed-class word provides a guess of the word class that may appear before or after it.
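
The idea of using closed-class words as chunk boundaries can be illustrated with a minimal sketch. This is not the algorithm of the thesis, whose details are not given in the abstract. The lexicon `CLOSED_CLASS`, the tag names, the informal romanized spellings and the function `chunk` are all assumptions made for illustration only.

```python
# Hypothetical closed-class lexicon in an informal romanization
# (NOT the roman script proposed in the thesis).
# Each entry maps a closed-class word to:
#   (its own word class, a guess for the class of the word before it)
CLOSED_CLASS = {
    "ney": ("CASE", "NOUN"),
    "ko": ("CASE", "NOUN"),
    "sey": ("CASE", "NOUN"),
}


def chunk(tokens):
    """Split tokens into chunks, closing a chunk after each case marker."""
    chunks, current = [], []
    for tok in tokens:
        if tok in CLOSED_CLASS:
            word_class, guess_before = CLOSED_CLASS[tok]
            # Use the marker to guess the class of the untagged word before it.
            if current and current[-1][1] == "UNK":
                current[-1] = (current[-1][0], guess_before)
            current.append((tok, word_class))
            chunks.append(current)
            current = []
        else:
            current.append((tok, "UNK"))
    if current:
        chunks.append(current)  # trailing material, e.g. the verb
    return chunks


result = chunk("larkey ney kitab ko parha".split())
for c in result:
    print(c)
```

In this toy example the two case markers close two case-marked chunks and leave the final word untagged for a later stage. A fuller treatment would also need open-class lexicon lookup and the other closed classes mentioned in the abstract.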

The work also presents a novel roman script for transliterating Urdu. Like other roman scripts it is phonetic, but it also makes it possible to convert text between this roman script and Urdu script, in both directions, using a computer program.
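
The property that makes two-way conversion possible is that the mapping between the two scripts is one-to-one. The sketch below shows that property on a toy table of a few letters. The table `URDU_TO_ROMAN` and the roman symbols chosen are hypothetical and are not the script defined in the thesis.

```python
# Toy, hypothetical letter table (NOT the thesis's roman script).
# Unicode escapes are used so the source is unaffected by right-to-left rendering.
URDU_TO_ROMAN = {
    "\u0627": "a",  # alif
    "\u0628": "b",  # be
    "\u062A": "t",  # te
    "\u0631": "r",  # re
    "\u06A9": "k",  # kaf
    "\u0633": "s",  # seen
    "\u0634": "S",  # sheen: a single symbol avoids an ambiguous digraph "sh"
}

# Inverting the table is only safe when no two letters share a roman symbol.
ROMAN_TO_URDU = {roman: urdu for urdu, roman in URDU_TO_ROMAN.items()}
assert len(ROMAN_TO_URDU) == len(URDU_TO_ROMAN), "mapping must be one-to-one"


def to_roman(text):
    """Replace each known Urdu letter with its roman symbol."""
    return "".join(URDU_TO_ROMAN.get(ch, ch) for ch in text)


def to_urdu(text):
    """Replace each known roman symbol with its Urdu letter."""
    return "".join(ROMAN_TO_URDU.get(ch, ch) for ch in text)


word = "\u0628\u0627\u062A"  # be + alif + te
roman = to_roman(word)
print(roman, to_urdu(roman) == word)
```

Using one roman symbol per Urdu letter keeps the reverse direction unambiguous. A scheme that used digraphs such as "sh" alongside "s" and "h" would need extra rules to stay reversible.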

This thesis, therefore, presents novel ideas for the computational grammar of Urdu, which can be utilized in various natural language processing tasks such as machine translation, text summarization, grammar checking and information retrieval.

S. No. Chapter Title of the Chapters Page Size (KB)
1 0 Contents
207.57 KB
2 1 Research Objectives 1
73.4 KB
  1.1 Objectives Statement 1
  1.2 Domain Of Investigation 1
  1.3 Organization Of Thesis 2
3 2 Introduction To Machine Translation 6
268.47 KB
  2.1 Machine Translation (MT) 6
  2.2 Challenges For Machine Translation 7
  2.3 Historic Landmarks 15
  2.4 Machine Translation Architectures 16
  2.5 Machine Translation Phases 18
  2.6 Machine Translation Paradigms 19
  2.7 MT Route Followed In This Thesis 22
4 3 Grammar Modeling 24
339.11 KB
  3.1 Lexical Functional Grammar (LFG) 28
  3.2 Transfer Between English Urdu F-Structures 40
  3.3 Free 'SOV' Phrase Order In Urdu 41
  3.4 Head Driven Phrase Structure Grammar (HPSG) 43
  3.5 Selection Of Grammar Theory 50
5 4 Urdu Verb Characteristics And Morphology 52
566.62 KB
  4.1 Verb Transitivity And Valency 53
  4.2 Urdu Verb Morphology 55
  4.3 Verb Forms 55
  4.4 Verb Morphology Representation 63
  4.5 Tense 67
  4.6 Aspect 69
  4.7 Mood 69
  4.8 Attribute-Values For Urdu Verbs 70
6 5 Urdu Noun Characteristics And Morphology 71
202.14 KB
  5.1 Urdu Noun Characteristics 71
  5.2 Noun Morphology 76
  5.3 Adjective Morphology 78
  5.4 Attribute-Value Tags For Urdu Nouns 78
7 6 Algorithms For Lexicon Implementation 80
108.94 KB
  6.1 Introduction 80
  6.2 Storage Of Urdu Lexicon 80
  6.3 Storage In A Hash Table 81
  6.4 Storage Using Lexical Transducer 83
  6.5 Lexical Transducers 87
  6.6 Conclusion 88
8 7 Modeling Urdu Nominal Syntax By Identifying Case Markers And Postposition 90
566.66 KB
  7.1 Classification Of Case Markers And Postposition 92
  7.2 Urdu Case Marking Phrase Structure 97
  7.3 Analysis For Urdu Case Markers 100
  7.4 Classification Of Cases Marked With 'Sey' 112
  7.5 Possession Markers 122
  7.6 Argument Structure Of Causative Verbs
  7.7 Conclusions 134
9 8 Modeling Urdu Verbal Syntax By Identifying Tense, Aspect And Mood Features 136
733.08 KB
  8.1 Urdu Verb Agreement 136
  8.2 Verb Aspect In Urdu 147
  8.3 Verb Mood In Urdu 153
  8.4 Verbal Coordination In Urdu
10 9 Urdu Parsing By Chunking Based On Closed Word Classes Using Ordered Context Free Grammar 173
303.83 KB
  9.1 Ordered Context Free Grammar 174
  9.2 Tokenization Co 176
  9.3 Part Of Speech (PoS) Tagging 177
  9.4 Chunking 185
  9.5 Algorithm For Parsing Through Chunking 187
  9.6 Parsing By Chunking Illustrative Examples 187
  9.7 Results And Analysis 191
  9.8 Conclusions 192
11 10 Conclusions 192
68.91 KB
  10.1 Summary And Conclusions 194
  10.2 Future Directions 197
12 11 Appendix 199
509.87 KB
  11.1 References 236
  11.2 Papers Published During The Research 240
  11.3 Index 242