Bertsimas, D., Cory-Wright, R. and Pauphilet, J. (2022) Solving Large-Scale Sparse PCA to Certifiable (Near) Optimality. Journal of Machine Learning Research, 23(13), pp. 1-35. ISSN 1532-4435
Abstract
Sparse principal component analysis (PCA) is a popular dimensionality reduction technique for obtaining principal components that are linear combinations of a small subset of the original features. Existing approaches cannot supply certifiably optimal principal components with more than p = 100s of variables. By reformulating sparse PCA as a convex mixed-integer semidefinite optimization problem, we design a cutting-plane method which solves the problem to certifiable optimality at the scale of selecting k = 5 covariates from p = 300 variables, and provides small bound gaps at a larger scale. We also propose a convex relaxation and greedy rounding scheme that provides bound gaps of 1-2% in practice within minutes for p = 100s or hours for p = 1,000s and is therefore a viable alternative to the exact method at scale. Using real-world financial and medical data sets, we illustrate our approach's ability to derive interpretable principal components tractably at scale.
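The underlying problem is cardinality-constrained sparse PCA: maximize x'Σx over unit vectors x with at most k nonzero entries. As a rough illustration of the greedy side of the approach described in the abstract, the sketch below implements a generic greedy support-selection heuristic for this problem. The function name `greedy_sparse_pca`, the greedy scoring rule, and the random covariance data are illustrative assumptions for this sketch, not the authors' exact greedy rounding scheme (which rounds the solution of a convex relaxation) or their cutting-plane method.

```python
import numpy as np

def greedy_sparse_pca(Sigma, k):
    """Greedy heuristic for sparse PCA: add features one at a time,
    each time choosing the feature that most increases the largest
    eigenvalue of the selected principal submatrix of Sigma.
    Illustrative sketch only; provides no optimality certificate."""
    p = Sigma.shape[0]
    support = []
    for _ in range(k):
        best_j, best_val = None, -np.inf
        for j in range(p):
            if j in support:
                continue
            idx = support + [j]
            # Largest eigenvalue of the candidate principal submatrix
            val = np.linalg.eigvalsh(Sigma[np.ix_(idx, idx)])[-1]
            if val > best_val:
                best_j, best_val = j, val
        support.append(best_j)
    # Recover the sparse loading vector supported on the selected features
    idx = sorted(support)
    eigvals, eigvecs = np.linalg.eigh(Sigma[np.ix_(idx, idx)])
    x = np.zeros(p)
    x[idx] = eigvecs[:, -1]
    return x, eigvals[-1]

# Example: select k = 5 of p = 300 features from a synthetic covariance matrix
rng = np.random.default_rng(0)
A = rng.standard_normal((500, 300))
Sigma = np.cov(A, rowvar=False)
x, lam = greedy_sparse_pca(Sigma, k=5)
print("variance explained on the selected support:", lam)
```

Each greedy step scores a candidate feature by the largest eigenvalue of the enlarged k-by-k principal submatrix, so the loop requires on the order of p·k small eigendecompositions. Unlike the paper's exact mixed-integer semidefinite formulation and its convex relaxation, a heuristic of this kind yields a feasible sparse component but no bound gap.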
More Details
| Item Type | Article |
|---|---|
| Subject Areas | Management Science and Operations |
| Additional Information | © 2022 Dimitris Bertsimas, Ryan Cory-Wright and Jean Pauphilet. |
| Date Deposited | 07 Feb 2022 13:32 |
| Date of first compliant deposit | 04 Feb 2022 |
| Subjects | Management science |
| Last Modified | 22 Sep 2024 07:27 |
| URI | https://lbsresearch.london.edu/id/eprint/2223 |