Multivariate Statistics

Unit members are engaged in work in both Descriptive and Inferential Multivariate Statistics.

Descriptive Multivariate Statistics

Work in exploratory Multivariate Statistics focuses on the following main topics.

Reducing the number of variables in the context of principal components.

(Work carried out with Jorge Orestes, Departamento de Matemática do ISA, Universidade Técnica de Lisboa; and Manuel Minhoto, Departamento de Matemática, Universidade de Évora.)

In the analysis of data sets with large numbers of variables, a frequent aim is to reduce the dimensionality of the data. A typical way of doing so is Principal Component Analysis (PCA). However, PCA does not provide a real reduction of dimensionality in terms of the original variables, since each principal component is a linear combination of all of them, and the PC loadings do not reliably indicate which variables are most relevant for preserving information. We consider an alternative approach, which consists of identifying, for an arbitrary integer k, a k-variable subset that is optimal with respect to a given criterion measuring how well each subset approximates the whole data set. For the combinatorial optimization problem posed by each of three different criteria we developed several types of heuristics, and produced the software module subselect, which can be loaded from within a session of the R statistical software package. R is a Free Software implementation of the S statistical language. We aim to extend the package with new features, and to tackle the multi-objective problem of identifying a large set of solutions that are maximal with respect to the partial order induced by several different criteria.
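The subset-selection idea can be illustrated with a minimal Python sketch. It is not the subselect implementation: the criterion used here (the fraction of total variance retained when all variables are projected onto the span of the chosen ones) is an assumption chosen for illustration and is not necessarily one of the three criteria mentioned above. The function names explained_fraction, best_subset_exhaustive and greedy_forward are hypothetical. The exhaustive search is only feasible for small numbers of variables, which is why a heuristic (here a simple greedy forward search) is needed in practice.

```python
from itertools import combinations

import numpy as np


def explained_fraction(X, subset):
    """Illustrative criterion (an assumption, not taken from the source):
    fraction of the total variance of the centered data that is retained
    when every column is projected onto the span of the chosen columns."""
    Xc = X - X.mean(axis=0)
    Z = Xc[:, list(subset)]
    # Least-squares regression of all columns on the chosen ones.
    coef, *_ = np.linalg.lstsq(Z, Xc, rcond=None)
    fitted = Z @ coef
    return float(np.sum(fitted ** 2) / np.sum(Xc ** 2))


def best_subset_exhaustive(X, k):
    """Evaluate every k-variable subset; cost grows combinatorially in p."""
    p = X.shape[1]
    return max(combinations(range(p), k),
               key=lambda s: explained_fraction(X, s))


def greedy_forward(X, k):
    """Simple heuristic: add, one at a time, the variable that most
    improves the criterion. Not guaranteed to find the optimum."""
    chosen = []
    remaining = set(range(X.shape[1]))
    while len(chosen) < k:
        best = max(sorted(remaining),
                   key=lambda j: explained_fraction(X, chosen + [j]))
        chosen.append(best)
        remaining.remove(best)
    return tuple(sorted(chosen))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(60, 6))
    # Make column 3 nearly redundant with column 0.
    X[:, 3] = X[:, 0] + 0.01 * rng.normal(size=60)
    for k in (1, 2, 3):
        exact = best_subset_exhaustive(X, k)
        heur = greedy_forward(X, k)
        print(k, exact, explained_fraction(X, exact),
              heur, explained_fraction(X, heur))
```

Comparing the two searches on small examples shows how close the heuristic comes to the exhaustive optimum; for realistic numbers of variables only the heuristic remains computationally viable.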

The cone of positive semi-definite matrices

Variance and correlation matrices are positive semi-definite matrices. The structure of the cone of positive semi-definite matrices contains information which can be helpful in several multivariate methods. A specific proposal uses the fact that the angle which a positive semi-definite p x p matrix forms with the identity matrix is a function of the variance of the relative sizes of its eigenvalues. This suggests a method for determining the number of Principal Components that should be retained in a Principal Component Analysis: the pseudo-rank of a matrix.
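The relation between the angle and the eigenvalues can be checked numerically. The sketch below assumes the usual Frobenius (trace) inner product between matrices, which the text does not state explicitly. Under that assumption, writing v for the population variance of the eigenvalues divided by their sum, the cosine of the angle with the identity equals 1 / sqrt(1 + p^2 v). The function names are hypothetical, and the sketch does not attempt to define the pseudo-rank itself.

```python
import numpy as np


def cos_angle_with_identity(A):
    """Cosine of the angle between A and the identity matrix under the
    Frobenius inner product <A, B> = trace(A' B) (an assumption here)."""
    p = A.shape[0]
    return float(np.trace(A) / (np.linalg.norm(A, "fro") * np.sqrt(p)))


def cos_angle_from_eigenvalues(A):
    """The same cosine, computed only from the variance of the relative
    eigenvalues: cos = 1 / sqrt(1 + p^2 * var)."""
    eig = np.linalg.eigvalsh(A)   # A is assumed symmetric
    rel = eig / eig.sum()         # relative sizes, summing to 1
    p = len(eig)
    var = np.var(rel)             # population variance (ddof=0)
    return float(1.0 / np.sqrt(1.0 + p ** 2 * var))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    B = rng.normal(size=(5, 8))
    A = B @ B.T                   # a random positive semi-definite matrix
    print(cos_angle_with_identity(A), cos_angle_from_eigenvalues(A))
```

When all eigenvalues are equal the variance is zero and the angle vanishes; the more the eigenvalues are dominated by a few large values, the larger the angle becomes.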

Members:

Jorge Cadima


Inferential Multivariate Statistics


Members:

Carlos Coelho