Priberam Machine Learning Lunch Seminar
Speaker: João Graça (L2F, INESC-ID)
Venue: IST Alameda, Sala PA2 (Edifício de Pós-Graduação)
Date: Tuesday, June 22nd, 2010
Time: 13:00
Lunch will be provided
Title: "Posterior Regularization Framework: Learning Tractable Models
with Intractable Constraints"
Abstract:
Unsupervised learning of probabilistic structured models presents a
fundamental trade-off between the richness of the constraints and
correlations a model captures and the efficiency and tractability of
inference. In this thesis, we propose a new learning framework called
Posterior Regularization (PR), which incorporates side information into
unsupervised estimation in the form of constraints on the model’s
posteriors. The underlying model remains unchanged; only the learning
method changes. Learning proceeds much like the EM algorithm, except that
the E-step solves a problem similar to Maximum Entropy in order to
enforce the constraints. We apply the PR framework to two large-scale
tasks: statistical word alignment and unsupervised part-of-speech
induction. For word alignment, we incorporate two constraints, bijectivity
and symmetry. Training with these constraints significantly improves both
precision and recall against manually annotated alignments for six
language pairs. For part-of-speech induction, we enforce sparsity on the
word-tag distribution, whose ambiguity the default training method
overestimates. Experiments on six languages show large improvements over
state-of-the-art results.
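The modified E-step can be sketched for a single example whose latent variable is small enough to enumerate. This is a minimal toy sketch under stated assumptions, not the speaker's implementation: it assumes constraints of the form E_q[phi(x,z)] <= b, computes the KL projection q(z) proportional to p(z|x) * exp(-lambda . phi(x,z)), and finds the dual variables lambda >= 0 by projected gradient ascent. The function names tilt and pr_e_step, the step size and the toy posterior are all hypothetical.

```python
import math


def tilt(posterior, features, lambdas):
    """Return q(z) proportional to p(z|x) * exp(-lambda . phi(x, z))."""
    weights = [
        p * math.exp(-sum(l * f for l, f in zip(lambdas, phi)))
        for p, phi in zip(posterior, features)
    ]
    total = sum(weights)
    return [w / total for w in weights]


def pr_e_step(posterior, features, bounds, step=1.0, iters=2000):
    """Hypothetical PR-style E-step for one example (toy sketch).

    posterior: p(z|x) for each enumerable value of z
    features:  features[z][k] = phi_k(x, z)
    bounds:    bounds[k] = b_k in the constraint E_q[phi_k] <= b_k
    Returns the projected posterior q and the dual variables lambda.
    """
    lambdas = [0.0] * len(bounds)
    for _ in range(iters):
        q = tilt(posterior, features, lambdas)
        for k, b in enumerate(bounds):
            # Dual gradient for lambda_k is E_q[phi_k] - b_k.
            expected = sum(qz * phi[k] for qz, phi in zip(q, features))
            # Projected ascent step keeps lambda_k >= 0 (inequality constraint).
            lambdas[k] = max(0.0, lambdas[k] + step * (expected - b))
    return tilt(posterior, features, lambdas), lambdas


if __name__ == "__main__":
    # Toy case: three values of z, one constraint q(z = 0) <= 0.4.
    p = [0.6, 0.3, 0.1]
    phi = [[1.0], [0.0], [0.0]]
    q, lam = pr_e_step(p, phi, [0.4])
    print([round(v, 3) for v in q], [round(v, 3) for v in lam])
```

In the toy case, the posterior (0.6, 0.3, 0.1) violates the bound on z = 0, so the projection moves the excess mass onto the other values in proportion to their original probabilities, giving q of about (0.4, 0.45, 0.15). When a constraint is already satisfied, its lambda stays at zero and q equals the original posterior.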
Bio: I am a fourth-year PhD student (already holding an MSc) in Computer
Science and Engineering at Instituto Superior Técnico, Technical University
of Lisbon, and a visiting student at the University of Pennsylvania. My
advisors are Luisa Coheur, Fernando Pereira and Ben Taskar.
My main research interests are Machine Learning and Natural Language
Processing. My current focus is on unsupervised learning with high-level
supervision in the form of constraints.
I am a proud member of the Spoken Language Systems Lab (L2F) in Lisbon
and of the Penn Research in Machine Learning (PRiML).