Rencontres d'Astrostatistique 2014

sciencesconf.org:astrostat2014:48378

Collaborative Sliced Inverse Regression

Alessandro Chiancone 2, 1, Stéphane Girard 1, Jocelyn Chanussot 2

2: Grenoble Images Parole Signal Automatique (GIPSA-lab)

Institut Polytechnique de Grenoble - Grenoble Institute of Technology

Gipsa-lab - 961 rue de la Houille Blanche - BP 46 - 38402 Grenoble cedex - France

1: MISTIS (INRIA Grenoble Rhône-Alpes / LJK Laboratoire Jean Kuntzmann)

Laboratoire Jean Kuntzmann, INRIA

Inria Grenoble - Rhône-Alpes 655 avenue de l'Europe - Montbonnot 38334 Saint Ismier Cedex - France

In multidimensional data analysis, one has to deal with a dataset X made of n points in dimension p. When n and p are both large, classical statistical analysis methods and models fail. Supervised and unsupervised dimensionality reduction techniques are widely used to preprocess high-dimensional data while retaining the information needed to solve the original problem.

In the regression context, Sliced Inverse Regression (SIR) [1] has proven effective at recovering a basis of the so-called effective dimension reduction (e.d.r.) space, i.e. the smallest subspace containing the information needed to regress the response on the predictors.
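For reference, the model behind SIR can be written as below. The notation (f, the directions beta_j, the noise epsilon, and the matrices Sigma and Gamma) follows the usual presentation of [1] and is an assumption here, since the abstract itself does not fix any symbols.

```latex
% Regression model: Y depends on X only through k linear combinations
Y = f\left(\beta_1^\top X, \dots, \beta_k^\top X, \varepsilon\right), \qquad k \le p

% e.d.r. space
\mathcal{E} = \operatorname{span}(\beta_1, \dots, \beta_k)

% SIR estimates a basis of \mathcal{E} from the leading solutions b of
\Gamma\, b = \lambda\, \Sigma\, b,
\qquad \Sigma = \operatorname{Cov}(X),
\qquad \Gamma = \operatorname{Cov}\left(\mathbb{E}[X \mid Y]\right)
```

In practice Gamma is estimated by slicing the range of Y and taking the weighted covariance of the slice means of X.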

Recently, many papers have focused on the complex structure of real data, showing that the data are often organized in subspaces. Kuentz & Saracco (2010) [2] proposed clustering X and applying SIR in each cluster to better satisfy the so-called linearity condition.
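The idea of running SIR inside each cluster can be sketched as follows. This is a minimal illustration, not the implementation of [2]: the function names (`sir_directions`, `kmeans_labels`, `clusterwise_sir`), the number of slices, and the plain Lloyd-style k-means with a farthest-point initialization are all assumptions made for the example.

```python
import numpy as np


def sir_directions(X, y, n_slices=10, n_dirs=1):
    """Basic SIR: leading eigenvectors of Sigma^{-1} Gamma (hypothetical helper)."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    sigma = np.cov(Xc, rowvar=False)
    # Slice the sample into groups of roughly equal size along sorted y.
    slices = np.array_split(np.argsort(y), n_slices)
    # Weighted covariance of the slice means of X.
    gamma = np.zeros((p, p))
    for idx in slices:
        m = Xc[idx].mean(axis=0)
        gamma += (len(idx) / n) * np.outer(m, m)
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(sigma, gamma))
    top = np.argsort(-eigvals.real)[:n_dirs]
    B = eigvecs[:, top].real
    return B / np.linalg.norm(B, axis=0)  # unit-norm columns


def kmeans_labels(X, k, n_iter=50):
    """Plain Lloyd iterations with a farthest-point start (assumed stand-in)."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels


def clusterwise_sir(X, y, k=2, n_slices=5):
    """Cluster X, then apply SIR separately inside each cluster."""
    labels = kmeans_labels(X, k)
    dirs = {}
    for j in range(k):
        mask = labels == j
        if mask.sum() > X.shape[1]:  # skip clusters too small for SIR
            dirs[j] = sir_directions(X[mask], y[mask], n_slices)
    return dirs, labels
```

Each entry of the returned dictionary holds the estimated directions for one cluster, so clusters with different underlying directions can be compared afterwards.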

Our hypothesis is that the e.d.r. space is not the same across the whole dataset and that different clusters can be assigned to different e.d.r. spaces. We introduce a novel technique to identify the number of e.d.r. spaces based on a weighted distance between the different spaces. First we cluster the data (in our simulation study we used standard k-means), then we apply SIR independently in each cluster. A greedy merging algorithm is proposed to assign each cluster to its e.d.r. space, taking into account the size of the cluster on which SIR is performed.
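One way such a greedy merging step could look is sketched below, for the simple case of one direction per cluster. The abstract does not specify the weighted distance or the stopping rule, so everything here is an assumption chosen for illustration: the squared cosine between directions as proximity, the size-weighted leading eigenvector as the common direction of a group, the `threshold` parameter, and the function names.

```python
import numpy as np


def group_direction(dirs, sizes, members):
    """Size-weighted common direction of a group of clusters (assumed rule)."""
    M = sum(sizes[i] * np.outer(dirs[i], dirs[i]) for i in members)
    eigvals, eigvecs = np.linalg.eigh(M)
    return eigvecs[:, -1]  # eigenvector of the largest eigenvalue


def greedy_merge(dirs, sizes, threshold=0.9):
    """Repeatedly merge the two closest groups until none is close enough."""
    groups = [[i] for i in range(len(dirs))]
    while len(groups) > 1:
        reps = [group_direction(dirs, sizes, g) for g in groups]
        best, pair = -1.0, None
        for a in range(len(groups)):
            for b in range(a + 1, len(groups)):
                # Squared cosine is invariant to the sign of a direction.
                prox = float(reps[a] @ reps[b]) ** 2
                if prox > best:
                    best, pair = prox, (a, b)
        if best < threshold:
            break
        a, b = pair
        groups[a] = groups[a] + groups[b]
        del groups[b]
    return groups
```

The number of groups left at the end plays the role of the estimated number of e.d.r. spaces, and larger clusters weigh more in the direction attached to each group.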

Our approach is illustrated on simulated data from a Gaussian mixture model.

This work is funded by LabEx Persyval.

[1] Li, K.C. (1991). Sliced inverse regression for dimension reduction (with discussion). Journal of the American Statistical Association, 86, 316-342.

[2] Kuentz, V., & Saracco, J. (2010). Cluster-based sliced inverse regression. Journal of the Korean Statistical Society, 39(2), 251-267.

Type:

Presentation

Other
