
Tuesday, March 1, 2011

Priberam Machine Learning Lunch Seminars

Speaker: Artur Ferreira (IT/ISEL)
Venue: IST Alameda, Sala PA2 (Edifício de Pós-Graduação)
Date: Tuesday, March 1st, 2011
Time: 13:00
Lunch will be provided

Title: Unsupervised feature discretization and selection for sparse data


In many applications, we deal with high-dimensional datasets with sparse data (many features
have zero value with high probability). For instance, text classification and information
retrieval problems involve large collections of documents. Each text is usually represented by
a bag-of-words or similar representation, with a large number of features (terms). Many of
these features may be irrelevant (or even detrimental) for the learning tasks. Such a large
number of features also makes these collections costly to store and process in memory.
This shows the need for adequate methods for feature representation,
reduction, and selection, both to improve classification accuracy and to reduce the memory
requirements for storing these datasets.
This talk focuses on techniques for unsupervised Feature Discretization (FD) and Feature
Selection (FS). The proposed FD technique uses the Lloyd-Max algorithm along with a new
criterion for FS based on the discretized features. The FS methods rely on dispersion
measures to compute feature relevance. The recent topic of compressed learning (CL), i.e.,
learning in a domain of reduced dimensionality obtained by random projections (RP), is
explored within the framework of feature reduction. We show some experimental results on
standard datasets.
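
As a rough illustration of the three ideas named in the abstract, the sketch below shows a scalar Lloyd-Max quantizer, a dispersion-based feature ranking, and a Gaussian random projection on a toy sparse matrix. It is not the speaker's method: the abstract does not state the actual FS criterion, so population variance is used here as a stand-in dispersion measure, and the function names (`lloyd_max`, `rank_by_dispersion`, `random_projection`), the toy data, and all parameter values are assumptions made for this example.

```python
import random
from statistics import mean, pvariance


def nearest_level(value, levels):
    """Index of the reconstruction level closest to value."""
    return min(range(len(levels)), key=lambda k: (value - levels[k]) ** 2)


def lloyd_max(values, n_levels=4, n_iter=50):
    """Scalar Lloyd-Max quantizer: alternate nearest-level assignment
    and centroid update until the levels stop changing."""
    lo, hi = min(values), max(values)
    if lo == hi:  # constant feature: a single level is enough
        return [lo], [0] * len(values)
    # Assumed initialization: levels spread uniformly over the value range.
    levels = [lo + (hi - lo) * (i + 0.5) / n_levels for i in range(n_levels)]
    for _ in range(n_iter):
        codes = [nearest_level(v, levels) for v in values]
        new_levels = []
        for k in range(len(levels)):
            cell = [v for v, c in zip(values, codes) if c == k]
            # An empty cell keeps its previous level.
            new_levels.append(mean(cell) if cell else levels[k])
        if new_levels == levels:
            break
        levels = new_levels
    codes = [nearest_level(v, levels) for v in values]
    return levels, codes


def rank_by_dispersion(columns):
    """Rank feature indices by decreasing dispersion.
    Variance is an assumed stand-in for the talk's dispersion measures."""
    scores = [pvariance(col) for col in columns]
    return sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)


def random_projection(rows, out_dim, seed=0):
    """Project each row onto out_dim dimensions with a Gaussian random matrix."""
    rng = random.Random(seed)
    in_dim = len(rows[0])
    matrix = [[rng.gauss(0, 1) / out_dim ** 0.5 for _ in range(out_dim)]
              for _ in range(in_dim)]
    return [[sum(x * matrix[i][j] for i, x in enumerate(row))
             for j in range(out_dim)] for row in rows]


# Toy sparse term-count matrix: 5 documents (rows) by 4 terms (columns).
rows = [
    [0, 0, 3, 0],
    [0, 0, 1, 0],
    [2, 0, 0, 0],
    [0, 0, 4, 1],
    [5, 0, 0, 0],
]
columns = [list(col) for col in zip(*rows)]

# Discretize each feature, then rank features using the discretized values.
discretized = []
for col in columns:
    levels, codes = lloyd_max(col, n_levels=2)
    discretized.append([levels[c] for c in codes])

ranking = rank_by_dispersion(discretized)
projected = random_projection(rows, out_dim=2)
print("feature ranking:", ranking)
```

In this toy example the all-zero term ends up last in the ranking, so a selection step that keeps only the top-ranked features would discard it. The random projection is independent of the ranking and simply maps each document to a lower-dimensional vector.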


Bio: Artur Ferreira is an adjunct professor at ISEL (Instituto Superior de Engenharia de Lisboa) and a
PhD student in Electrical and Computer Engineering at IST-IT (Instituto Superior Técnico –
Instituto de Telecomunicações), under the supervision of Prof. Mário Figueiredo. He holds an
MSc in Electrical and Computer Engineering from IST. His main research interests are data
compression, pattern recognition, and machine learning.
