Monday, May 30, 2011

Priberam Machine Learning Lunch Seminar

Speaker: João Graça (L2F @INESC-ID)
Venue: IST Alameda, Sala PA2 (Pavilhão de Pós-Graduação)
Date: Tuesday, May 31st, 2011
Time: 13:00
Lunch will be provided

Title: Rich Prior Knowledge in Learning for Natural Language Processing


We possess a wealth of prior knowledge about most prediction problems,
and particularly so for many of the fundamental tasks in natural
language processing. Unfortunately, it is often difficult to make
use of this type of information during learning, as it typically does
not come in the form of labeled examples, may be difficult to encode
as a prior on parameters in a Bayesian setting, and may be impossible
to incorporate into a tractable model. Instead, we usually have prior
knowledge about the values of output variables. For example, linguistic
knowledge or an out-of-domain parser may provide the locations of
likely syntactic dependencies for grammar induction. Motivated by
the prospect of being able to naturally leverage such knowledge, four
different groups have recently developed similar, general frameworks
for expressing and learning with side information about output variables.
These frameworks are Constraint-Driven Learning (UIUC), Posterior
Regularization (UPenn), Generalized Expectation Criteria (UMass Amherst),
and Learning from Measurements (UC Berkeley).

This tutorial describes how to encode side information about output
variables, and how to leverage this encoding and an unannotated
corpus during learning. We survey the different frameworks, explaining
how they are connected and the trade-offs between them. We also survey
several applications that have been explored in the literature,
including applications to grammar and part-of-speech induction, word
alignment, information extraction, text classification, and multi-view
learning. Prior knowledge used in these applications ranges from
structural information that cannot be efficiently encoded in the model,
to knowledge about the approximate expectations of some features, to
knowledge of some incomplete and noisy labellings. These applications
also address several different problem settings, including unsupervised,
lightly supervised, and semi-supervised learning, and utilize both
generative and discriminative models. The diversity of tasks, types of
prior knowledge, and problem settings explored demonstrates the generality
of these approaches and suggests that they will become an important tool
for researchers in natural language processing.

The tutorial will provide the audience with the theoretical background to
understand why these methods have been so effective, as well as practical
guidance on how to apply them. Specifically, we discuss issues that come
up in implementation, and describe a toolkit that provides "out-of-the-box"
support for the applications described in the tutorial, and is extensible
to other applications and new types of prior knowledge.


Bio: João Graça is a postdoctoral researcher at L2F, INESC-ID.
He obtained his PhD in Computer Science Engineering at Instituto Superior Técnico, Technical University of Lisbon, where he was advised jointly by Luisa Coheur, Fernando Pereira and Ben Taskar. His main research interests are Machine Learning and Natural Language Processing. Currently his research focuses on unsupervised learning with high-level supervision in the form of domain-specific prior knowledge, and on the utility of unsupervised methods for real-world applications.

Words of the week

The words of the day for the week of 23-05 to 29-05, 2011: franchão, gamofobia, heliporto, ísquion, jiboiar, krill and lumaquela.

Thursday, May 26, 2011

Agenda: Porto Book Fair 2011

The Feira do Livro do Porto (Porto Book Fair) opens today and runs on Avenida dos Aliados until June 12.

The 81st edition of this literary event is organized by the Associação Portuguesa de Editores e Livreiros (APEL) and will include debates, presentations, book launches and signing sessions.

As with the recent Feira do Livro de Lisboa (Lisbon Book Fair), the debate schedule makes no reference to how the Acordo Ortográfico (Orthographic Agreement) is or is not being adopted by publishers.

Monday, May 23, 2011

Monday, May 16, 2011

Priberam Machine Learning Lunch Seminars

Speaker: José Santos (Microsoft)
Venue: IST Alameda, Sala EA2 (Torre Norte)
Date: Tuesday, May 17th, 2011
Time: 13:00
Lunch will be provided

Title: Inductive Logic Programming applied to Bioinformatics


Inductive Logic Programming (ILP) is a Machine Learning approach with foundations in Logic Programming. The problem specification and the models discovered by ILP systems are both represented as Prolog programs, allowing for great expressiveness and flexibility. However, this flexibility comes at a high computational cost, and ILP systems are known for their difficulty in scaling up. Constructing and evaluating complex concepts are two of the main problems that prevent ILP systems from tackling many of the most interesting learning problems. Large concepts cannot be constructed or evaluated simply by parallelizing existing top-down search algorithms or improving the underlying Prolog engine. Novel search strategies and cover algorithms are needed. The main focus of this talk is on how to efficiently construct and evaluate such complex hypotheses in an ILP setting. Namely, we will present an efficient theta-subsumption algorithm that improves over Prolog's SLD-resolution by several orders of magnitude. We will also show how a new bottom-up search strategy coupled with this efficient subsumption algorithm led to the discovery of a better model for a protein-binding application problem.


Bio: José Santos has a Ph.D. degree in Computer Science (2010) from Imperial College London. During his Ph.D., he worked on the theory and implementation of Inductive Logic Programming (ILP) systems. ILP is a first-order logic form of Machine Learning. José is now a post-doctoral fellow at the Microsoft Language Development Center, where he is working on improving Bing’s query rewriting mechanisms so that the Bing backend may return more relevant documents. José also holds a Licenciatura in Informatics Engineering (2004, FCT-UNL), an MSc in Artificial Intelligence (2006, FCT-UNL) and an MSc in BioInformatics (2007, Imperial College). After graduating in 2004, José worked for one year at Novabase Business Intelligence.

Words of the week

The words of the day for the week of 09-05 to 15-05, 2011: pulcrícomo, quiçá, rágade, sucenturiado, tramolhada, uredo and ventaneira.

Thursday, May 5, 2011

Portuguese Language Day

To celebrate the role of the Portuguese language, "a historical bond and a shared heritage resulting from centuries of coexistence that should be valued", there will be events in Angola, Brazil, Cape Verde, Guinea-Bissau, Mozambique, Portugal, São Tomé and Príncipe, and Timor-Leste. The list of the main events, by country, can be consulted here.

In Lisbon, highlights include an exhibition of books by Portuguese-language authors, readings of texts by CPLP authors, and the voice of Manuel Freire singing CPLP poets, at the Instituto Camões. At the Auditorium of the Associação Portuguesa de Editores e Livreiros (APEL) at the Feira do Livro de Lisboa, at 7:30 p.m., there will be a debate on the internationalization of Lusophone language and culture, with Ana Paula Laborinho (president of the Instituto Camões), Domingos Simões Pereira (executive secretary of the CPLP) and Paulo Teixeira Pinto (president of APEL).