ClustOfVar-based approach for unsupervised learning: Reading of synthetic variables with sociological data
DOI:
https://doi.org/10.1285/i20705948v8n2p170
Keywords:
environment, variable clustering, ClustOfVar, synthetic variables, typology of farmers
Abstract
This paper proposes an original data mining method for unsupervised learning that replaces traditional factor analysis with variable clustering. Clustering of variables aims to group together variables that are strongly related to each other, i.e. that carry the same information. We recently proposed the ClustOfVar method, which is specifically devoted to variable clustering and handles numeric and categorical variables alike. It simultaneously provides homogeneous clusters of variables and their corresponding synthetic variables, each of which can be read as a kind of gradient. In this algorithm, the homogeneity criterion of a cluster is defined by the squared Pearson correlation for the numeric variables and by the correlation ratio for the categorical variables. The method was tested on categorical data relating to French farmers and their perception of the environment. The use of synthetic variables gave us an original way to identify how farmers reconfigured the questions put to them.
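The homogeneity criterion described in the abstract can be illustrated with a minimal Python sketch. It is not the ClustOfVar implementation: the function names (`squared_pearson`, `correlation_ratio`, `homogeneity`) are hypothetical, the synthetic variable is assumed to be supplied as a numeric vector rather than computed, and the score is assumed to be the sum of the squared Pearson correlations (numeric variables) and the correlation ratios (categorical variables) with that synthetic variable.

```python
import numpy as np


def squared_pearson(x, y):
    """Squared Pearson correlation between two numeric vectors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    r = np.corrcoef(x, y)[0, 1]
    return r ** 2


def correlation_ratio(categories, y):
    """Correlation ratio (eta squared) of numeric y given a categorical variable.

    Computed as between-group sum of squares over total sum of squares.
    Assumes y is not constant (otherwise the total sum of squares is zero).
    """
    categories = np.asarray(categories)
    y = np.asarray(y, dtype=float)
    grand_mean = y.mean()
    total = ((y - grand_mean) ** 2).sum()
    between = 0.0
    for level in np.unique(categories):
        group = y[categories == level]
        between += group.size * (group.mean() - grand_mean) ** 2
    return between / total


def homogeneity(numeric_vars, categorical_vars, synthetic):
    """Assumed cluster score: sum of the links between each variable
    and the (given) synthetic variable of the cluster."""
    score = sum(squared_pearson(x, synthetic) for x in numeric_vars)
    score += sum(correlation_ratio(c, synthetic) for c in categorical_vars)
    return score


# Toy data, invented purely for illustration.
synthetic = [1.0, 2.0, 3.0, 4.0]
numeric_vars = [[2.0, 4.0, 6.0, 8.0]]      # perfectly correlated with synthetic
categorical_vars = [["a", "a", "b", "b"]]  # two groups with means 1.5 and 3.5
print(homogeneity(numeric_vars, categorical_vars, synthetic))
```

On this toy data the numeric variable contributes 1.0 and the categorical variable 0.8 (between-group sum of squares 4 over total 5), so a cluster scores higher the more closely its variables follow the synthetic variable.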
License
Authors who publish with EJASA agree to the Creative Commons Attribution - NonCommercial - NoDerivs 3.0 Italy License.
