Solving Large-Scale Sparse PCA to Certifiable (Near) Optimality

Bertsimas, D, Cory-Wright, R and Pauphilet, J (2022) Solving Large-Scale Sparse PCA to Certifiable (Near) Optimality. Journal of Machine Learning Research, 23 (13). pp. 1-35. ISSN 1532-4435 OPEN ACCESS


Sparse principal component analysis (PCA) is a popular dimensionality reduction technique for obtaining principal components which are linear combinations of a small subset of the original features. Existing approaches cannot supply certifiably optimal principal components with more than p = 100s of variables. By reformulating sparse PCA as a convex mixed-integer semidefinite optimization problem, we design a cutting-plane method which solves the problem to certifiable optimality at the scale of selecting k = 5 covariates from p= 300 variables, and provides small bound gaps at a larger scale. We also propose a convex relaxation and greedy rounding scheme that provides bound gaps of 1 - 2% in practice within minutes for p = 100s or hours for p= 1; 000s and is therefore a viable alternative to the exact method at scale. Using real-world financial and medical data sets, we illustrate our approach's ability to derive interpretable principal components tractably at scale.

More Details

Item Type: Article
Subject Areas: Management Science and Operations
Additional Information:

© 2022 Dimitris Bertsimas, Ryan Cory-Wright and Jean Pauphilet.
License: CC-BY 4.0, see Attribution requirements are provided at

Date Deposited: 07 Feb 2022 13:32
Date of first compliant deposit: 04 Feb 2022
Subjects: Management science
Last Modified: 04 Nov 2022 07:16

Export and Share


Accepted Version - Text
  • Available under License


Downloads from LBS Research Online

View details

Actions (login required)

Edit Item Edit Item