On 18 June 2015, at 13:00 in TUT room ICT-638

Non-parametric Bayesian models for computational morphology


Kairit Sirts

defends her PhD thesis

Supervisors: Prof. Leo Võhandu (TUT) and Sharon Goldwater (Edinburgh Univ., UK)
Opponents: Mikko Kurimo (Aalto Univ., Finland) and Kristina Toutanova (Microsoft Research)

Abstract

Morphology is a complex phenomenon that encompasses several different computational tasks, such as morphological segmentation or analysis and (morpho)syntactic clustering. We hypothesize that in an unsupervised or weakly-supervised learning setting, using joint learning models that address several aspects of the complex morphological processes simultaneously will be beneficial because during joint learning, different aspects of the same process will help to disambiguate each other.

In this dissertation, we develop three unsupervised or weakly-supervised models of computational morphology that employ joint learning in different ways. We adopt non-parametric Bayesian modeling, which provides a flexible framework for learning with both observed and latent variables and also provides suitable prior distributions for linguistic data. In addition to empirically demonstrating the performance of our models, we seek to show that 1) joint modeling provides benefits over non-joint modeling and 2) modeling some latent aspects of the process (in addition to those that we are directly interested in) provides further advantages.

In general, most of our experimental results support the hypothesis about the benefits of joint modeling. However, some experimental results (notably those obtained with the first and the last model) are not as good as expected. Analyzing these results suggests that our assumptions about the relationships between morphemic suffixes and syntactic tags may be too simplistic for morphologically rich languages, motivating future work to capture these relationships more effectively.

Last modified: 2015/06/08 16:29