Multivariate Statistics
Unit members are engaged in work in both Descriptive and Inferential
Multivariate Statistics.
Descriptive Multivariate Statistics
Work in exploratory Multivariate Statistics focuses on the following main
topics.
Reducing the number of variables in the context of principal components.
(Work carried out with Jorge Orestes,
Departamento de Matemática do ISA, Universidade Técnica de Lisboa; and
Manuel Minhoto, Departamento de Matemática, Universidade de Évora.)
In the analysis of data sets with large numbers of variables, a frequent
aim is to reduce the dimensionality of the data set. A typical way of
doing so is through a Principal Component Analysis (PCA). However,
dimensionality reduction via PCA does not provide a real reduction of
dimensionality in terms of the original variables, since each principal
component is in general a linear combination of all of them, and the PC
loadings do not reliably indicate which variables are the most relevant
in terms of preserving information. We consider an alternative approach,
which consists of identifying, for an arbitrary integer k, a k-variable
subset that is optimal with respect to a given criterion measuring how
well each subset approximates the whole data set. For the combinatorial
optimization problem introduced by each of three different criteria, we
developed several types of heuristics and produced the software module
subselect, which can be loaded from within a session of the R statistical
software package. R is a Free Software implementation of the S
Statistical Language. We aim to extend the package with new features,
and to tackle the multi-objective problem of identifying a large set of
solutions which are maximal with respect to the partial order introduced
by several different criteria.
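To make the k-variable subset problem concrete, the following is a minimal sketch in Python, not the subselect implementation. It assumes one plausible criterion, the proportion of total variance retained when all variables are linearly regressed on the chosen subset; the text above does not name its three criteria, so this choice is an assumption. It also replaces the heuristics with an exhaustive search, which is only feasible for small numbers of variables. The function names and the example covariance matrix are hypothetical.

```python
from itertools import combinations

import numpy as np


def explained_variance_ratio(S, subset):
    """Proportion of total variance (trace of S) accounted for by the
    orthogonal projection of all variables onto the chosen subset.
    This criterion is an assumption made for illustration."""
    idx = list(subset)
    S_kk = S[np.ix_(idx, idx)]   # covariance among the selected variables
    S_k = S[:, idx]              # covariance of all variables with the subset
    explained = np.trace(S_k @ np.linalg.solve(S_kk, S_k.T))
    return explained / np.trace(S)


def best_subset(S, k):
    """Exhaustive search over all k-variable subsets (small p only)."""
    p = S.shape[0]
    best = max(combinations(range(p), k),
               key=lambda s: explained_variance_ratio(S, s))
    return best, explained_variance_ratio(S, best)


# Hypothetical 3-variable covariance matrix: variables 0 and 1 are
# strongly correlated, variable 2 is uncorrelated with both.
S = np.array([[4.0, 1.8, 0.0],
              [1.8, 1.0, 0.0],
              [0.0, 0.0, 1.0]])

for k in (1, 2):
    subset, value = best_subset(S, k)
    print(k, subset, round(value, 4))
```

The number of candidate subsets grows combinatorially with the number of variables, which is why heuristic searches are needed once exhaustive enumeration becomes impractical.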
The cone of positive semi-definite matrices
Variance and correlation matrices are positive
semi-definite matrices. The structure of the cone of positive
semi-definite matrices contains information which can be helpful in
several multivariate methods. A specific proposal uses the fact that
the angle formed by a positive semi-definite p x p matrix
with the identity matrix is a function of the variance of the relative
sizes of its eigenvalues to suggest a method of determining the number
of Principal Components that should be retained in a Principal
Component Analysis: the pseudo-rank of a matrix.
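The relation between the angle and the eigenvalues can be checked numerically. The sketch below assumes the usual trace (Frobenius) inner product on matrices, which the text does not state explicitly. Under that assumption, writing v for the (population) variance of the relative eigenvalues, that is, the eigenvalues divided by their sum, the squared cosine of the angle with the identity equals 1 / (1 + p^2 v). The function names are hypothetical, and the sketch does not attempt to reproduce the pseudo-rank itself.

```python
import numpy as np


def angle_with_identity(A):
    """Angle (radians) between A and the identity matrix, assuming the
    trace inner product <A, B> = trace(A^T B)."""
    p = A.shape[0]
    cos = np.trace(A) / (np.linalg.norm(A, "fro") * np.sqrt(p))
    return np.arccos(np.clip(cos, -1.0, 1.0))


def angle_from_eigenvalues(A):
    """Same angle, computed from the variance of the relative sizes of
    the eigenvalues of the symmetric matrix A."""
    lam = np.linalg.eigvalsh(A)
    rel = lam / lam.sum()        # relative eigenvalue sizes, summing to 1
    p = lam.size
    v = np.var(rel)              # population variance (ddof=0)
    return np.arccos(1.0 / np.sqrt(1.0 + p**2 * v))


# Hypothetical example: a covariance matrix built from random data.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
A = np.cov(X, rowvar=False)

print(angle_with_identity(A), angle_from_eigenvalues(A))
```

When all eigenvalues are equal the variance is zero and the angle vanishes; the more the eigenvalues differ in relative size, the larger the angle.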
Members: Jorge Cadima
Inferential Multivariate Statistics
Members: Carlos Coelho