Associated kernel discriminant analysis for multivariate mixed data

Authors

  • Some Matthieu, University of Franche-Comté, LMB Besançon, 16 route de Gray, 25030 Besançon cedex
  • Célestin C. Kokonendji, University of Franche-Comté, LMB Besançon, 16 route de Gray, 25030 Besançon cedex
  • Mona Ibrahim, FCLAB/Femto-ST Belfort, rue Thierry Mieg, F-90010 Belfort

DOI:

https://doi.org/10.1285/i20705948v9n2p385

Keywords:

Bandwidth matrix, non-classical kernel, profile cross-validation

Abstract

Associated kernels have been introduced to improve on the classical (symmetric) continuous kernels for smoothing any functional on several kinds of supports, such as bounded continuous and discrete sets. In this paper, an associated kernel method for discriminant analysis with multivariate mixed variables is proposed. These variables are of three types: continuous, categorical and count. The method consists of using a product of adapted univariate associated kernels and an estimate of the misclassification rate. A new profile version of the cross-validation procedure for selecting bandwidth matrices is introduced for multivariate mixed data, while a classical cross-validation is used for homogeneous data sets having the same reference measures. Simulations and validation results show the relevance of the proposed method. The method has been validated on real coronary heart disease data in comparison to the classical kernel discriminant analysis.
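The idea summarised in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes one continuous non-negative variable, one categorical variable and one count variable, and it picks three kernels for illustration only: a gamma kernel, an Aitchison-Aitken type kernel and a binomial kernel. All function names, the toy data and the bandwidth values are hypothetical, and the bandwidths are fixed by hand rather than selected by the profile cross-validation procedure of the paper.

```python
import math


def gamma_kernel(x, h, u):
    """Gamma kernel at target x >= 0 with bandwidth h > 0, evaluated at u.

    This is the gamma density with shape x/h + 1 and scale h (illustrative choice).
    """
    if u < 0:
        return 0.0
    shape = x / h + 1.0
    if u == 0:
        return 1.0 / h if shape == 1.0 else 0.0
    log_k = ((shape - 1.0) * math.log(u) - u / h
             - math.lgamma(shape) - shape * math.log(h))
    return math.exp(log_k)


def categorical_kernel(x, h, u, n_levels):
    """Aitchison-Aitken type kernel on n_levels categories, 0 <= h <= 1."""
    return 1.0 - h if u == x else h / (n_levels - 1)


def binomial_kernel(x, h, u):
    """Binomial kernel at count x with 0 < h <= 1, supported on {0, ..., x + 1}."""
    if u < 0 or u > x + 1:
        return 0.0
    p = (x + h) / (x + 1)
    return math.comb(x + 1, u) * p ** u * (1.0 - p) ** (x + 1 - u)


def product_density(x, sample, h, n_levels):
    """Product-kernel estimate at x = (continuous, categorical, count)."""
    total = 0.0
    for obs in sample:
        total += (gamma_kernel(x[0], h[0], obs[0])
                  * categorical_kernel(x[1], h[1], obs[1], n_levels)
                  * binomial_kernel(x[2], h[2], obs[2]))
    return total / len(sample)


def classify(x, samples_by_class, bandwidths, n_levels):
    """Assign x to the class with the largest (prior * estimated density)."""
    n_total = sum(len(s) for s in samples_by_class.values())
    scores = {
        label: len(s) / n_total
        * product_density(x, s, bandwidths[label], n_levels)
        for label, s in samples_by_class.items()
    }
    return max(scores, key=scores.get)


def misclassification_rate(points, labels, samples_by_class, bandwidths, n_levels):
    """Fraction of held-out points whose predicted class differs from the label."""
    errors = sum(
        classify(x, samples_by_class, bandwidths, n_levels) != y
        for x, y in zip(points, labels)
    )
    return errors / len(points)


# Made-up toy data: (continuous, categorical level, count) per observation.
train = {
    "a": [(1.0, 0, 1), (1.2, 0, 2), (0.8, 0, 0)],
    "b": [(5.0, 1, 6), (4.5, 1, 5), (5.5, 1, 7)],
}
# Hand-picked bandwidths (one per variable and class), not cross-validated.
bw = {"a": (0.2, 0.1, 0.1), "b": (0.2, 0.1, 0.1)}

if __name__ == "__main__":
    print(classify((1.1, 0, 1), train, bw, n_levels=2))
    print(misclassification_rate([(1.1, 0, 1), (5.2, 1, 6)], ["a", "b"],
                                 train, bw, n_levels=2))
```

In this sketch each class gets its own diagonal bandwidth vector, which is the simplest form a bandwidth matrix can take for a product kernel. A data-driven selector would replace the hand-picked values in `bw`, for example by minimising the held-out misclassification rate over a grid.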

Published

14-10-2016