Solving Large-Scale Sparse PCA to Certifiable (Near) Optimality

Bertsimas, D., Cory-Wright, R. and Pauphilet, J. (2022) Solving Large-Scale Sparse PCA to Certifiable (Near) Optimality. Journal of Machine Learning Research, 23 (13). pp. 1-35. ISSN 1532-4435 (In Press) OPEN ACCESS

Abstract

Sparse principal component analysis (PCA) is a popular dimensionality reduction technique for obtaining principal components which are linear combinations of a small subset of the original features. Existing approaches cannot supply certifiably optimal principal components with more than p = 100s of variables. By reformulating sparse PCA as a convex mixed-integer semidefinite optimization problem, we design a cutting-plane method which solves the problem to certifiable optimality at the scale of selecting k = 5 covariates from p = 300 variables, and provides small bound gaps at a larger scale. We also propose a convex relaxation and greedy rounding scheme that provides bound gaps of 1-2% in practice within minutes for p = 100s or hours for p = 1,000s and is therefore a viable alternative to the exact method at scale. Using real-world financial and medical data sets, we illustrate our approach's ability to derive interpretable principal components tractably at scale.
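To make the sparse PCA problem concrete, the sketch below shows a simple greedy forward-selection heuristic: choose k feature indices so that the leading eigenvalue of the corresponding principal submatrix of the covariance matrix is as large as possible, then take the leading eigenvector on that support. This is only an illustration of the sparse PCA objective, not the paper's method, which rounds a convex semidefinite relaxation; the function name and setup are hypothetical.

```python
import numpy as np

def greedy_sparse_pca(Sigma, k):
    """Greedy forward selection for sparse PCA (illustrative only):
    pick k indices whose principal submatrix of Sigma has the largest
    leading eigenvalue, then return the sparse principal component."""
    p = Sigma.shape[0]
    support = []
    for _ in range(k):
        best_j, best_val = None, -np.inf
        for j in range(p):
            if j in support:
                continue
            idx = support + [j]
            # leading eigenvalue of the candidate principal submatrix
            val = np.linalg.eigvalsh(Sigma[np.ix_(idx, idx)])[-1]
            if val > best_val:
                best_j, best_val = j, val
        support.append(best_j)
    # sparse principal component: leading eigenvector on the chosen support
    idx = sorted(support)
    w, V = np.linalg.eigh(Sigma[np.ix_(idx, idx)])
    x = np.zeros(p)
    x[idx] = V[:, -1]
    return x, w[-1]
```

On a covariance matrix with two strongly coupled high-variance features, this heuristic recovers that pair as the support for k = 2; the exact mixed-integer approach in the paper additionally certifies (near) optimality via dual bounds.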

More Details

Item Type: Article
Subject Areas: Management Science and Operations
Additional Information:

© 2022 Dimitris Bertsimas, Ryan Cory-Wright and Jean Pauphilet.
License: CC-BY 4.0, see https://creativecommons.org/licenses/by/4.0/ Attribution requirements are provided at http://jmlr.org/papers/v23/20-1188.html.

Date Deposited: 07 Feb 2022 13:32
Date of first compliant deposit: 04 Feb 2022
Subjects: Management science
Last Modified: 13 Feb 2022 21:28
URI: https://lbsresearch.london.edu/id/eprint/2223
Download: Accepted Version (Text), available under license.
