classiq-library icon indicating copy to clipboard operation
classiq-library copied to clipboard

Feature selection using Quantum Unconstrained Binary Optimization (QUBO) - Paper Implementation Project

Open lakshmi-yendapalli opened this issue 10 months ago • 7 comments

Our team ClassiQ-FS (Alex, Zach, Lakshmi) would like to implement the paper Feature selection on quantum computers. Our proposal is outlined below.

Abstract

Feature selection is a critical step in Machine Learning and data analysis aimed at reducing dimensionality and allowing us to get the most value out of our data by focussing on the most informative features. As datasets grow increasingly complex, Quantum Computing offers a promising approach to explore high-dimensional solution spaces more efficiently than classical approaches. Feature selection on quantum computers by Mücke et al. proposes a novel feature selection algorithm based on a Quadratic Unconstrained Binary Optimization (QUBO) problem which allows us to select a specified number of features based on their importance and redundancy, yielding higher quality solutions. This approach can be implemented on a gate based quantum computer as well as a quantum annealer.

Project Overview

Implement feature selection using Quadratic Unconstrained Binary Optimization (QUBO) as described in Section 2.3 of the referenced paper. The paper proposes a feature selection algorithm using a QUBO problem which can be solved with quantum computing. Our QUBO objective function is of the form: Image

Q(α) is the QUBO matrix whose elements are: Image

where:

  • δᵢⱼ is the Kronecker-delta function
  • I is the importance vector
  • R is the redundancy matrix
    The importance vector represents the mutual information of the features with the class label, while the redundancy matrix represents the mutual information of the features amongst themselves.

Image

We can encode our QUBO problem with the Hamiltonian: Image where σ_i is the Pauli-z operator acting on qubit i, and coefficients a, b, and c found from the transformation of x features to (1 - σ_i)/2.

Thus, the QUBO problem can be transformed into the problem of finding the minimum eigenstate or ground state of H. To implement this using the Classiq library, we will use the Variational Quantum Eigensolver (VQE) algorithm which will rely on a hybrid quantum-classical computational approach to minimize the expected energy of H.

Motivating examples

Feature selection is a fundamental problem in machine learning in order to reduce high dimensional datasets, improve model performance, reduce computational cost and enhance interpretability. Classical feature selection techniques including filter methods and wrapper methods often struggle with high dimensional datasets. Quantum algorithms offer the potential to navigate these high dimensional search spaces more effectively. Feature selection can be formulated as a combinatorial optimization problem, which aligns very well with Quantum approaches such as VQE.

As an example, consider the Dexter dataset, a benchmark dataset from the NIPS Feature Selection challenge designed for high-dimensional text classification. Each document in the data is represented by 20,000 features, however, only a small subset of these 20,000 features is truly informative for classification. Thus, feature selection is a critical step to process this data. However, classical methods such as RFE, LASSO or tree-based features would struggle with the high dimensionality of the data, or when features are highly correlated.

While current era quantum computers are not large enough to tackle a dataset with 20,000 features, quantum hardware has been continuing to improve and quantum algorithms could soon offer a highly scalable solution to feature selection using high dimensional data. Initial implementations of quantum feature selection algorithms have shown promising results on small scale quantum devices including quantum annealers and gate based quantum computers.

Objectives

  1. Encode a relevant/example dataset into the hamiltonian
  2. Run this through the algorithm provided to obtain a feature set
  3. Analyse the relevance of the feature set
  4. Analyse the complexity of gates and circuit structure

Deliverables

  • Reusable templates and workflows for quantum feature selection using Classiq
  • Jupyter Notebook containing:
    • Quantum program for QUBO feature selection
    • Analysis of gate counts and resource usage
    • Visualisation of the quality of the feature set

Proposed approach

  • Literature review:
    • Review the paper to be implemented: “Feature selection on quantum computers” by Mücke, Sascha et al.
    • Conduct additional literature review as required
  • Baseline model:
    • Identify suitable datasets for implementing the Feature Selection problem
    • Decide on the best metrics to evaluate the success of the Quantum Feature Selection algorithm
    • Set up a baseline classical solution for comparing the Quantum Feature Selection solution against
  • Algorithm Coding:
    • Formulate the feature selection problem for a given dataset as a Hamiltonian as described in the Project Overview section
    • Utilize Classiq’s high level functional modeling to represent the Hamiltonian and synthesize optimized circuits
    • Utilize Classiq’s library to minimize the number of qubits by applying efficient qubit mapping strategies and reduce gate depth by using efficient compilation techniques
    • Run the workflow including the quantum set up and classical optimization on a loop until we find the optimal feature subset
  • Evaluation and analysis :
    • Evaluate the results of the Quantum Feature Selection on the chosen metrics. Compare and contrast with classical solutions
  • Generate .qmod File:
    • Generate the .qmod file to save the models.
    • Confirm successful notebook execution and .qmod file generation.
  • Quality Check:
    • Test the algorithm and workflow with a new dataset and ensure it can be generalized for any dataset as long as it meets the API
    • Confirm accuracy, code correctness and clear descriptions
  • Submit Contribution:
    • Open a Pull Request in classiq-library on GitHub
    • Include a summary of insights and results.

Resources

lakshmi-yendapalli avatar Feb 27 '25 19:02 lakshmi-yendapalli

Sounds like an interesting contribution to our repo @lakshmi-yendapalli!

Please note that we accept high-quality implementations to our repository and will be glad to accept a contribution that meets our standards. Feel free to reach out to the community for any questions.

Good luck!

TomerGoldfriend avatar Mar 02 '25 07:03 TomerGoldfriend

Sounds like an interesting contribution to our repo @lakshmi-yendapalli!

Please note that we accept high-quality implementations to our repository and will be glad to accept a contribution that meets our standards. Feel free to reach out to the community for any questions.

Good luck!

Thank you for approving us, looking forward to it!

palexbg avatar Mar 02 '25 08:03 palexbg

Thanks @TomerGoldfriend our team is excited to get it going!

Aukau avatar Mar 02 '25 16:03 Aukau

Hi @lakshmi-yendapalli @Aukau @palexbg, what is the status of this? Are you still working on the implementation?

TaliCohn avatar Apr 01 '25 10:04 TaliCohn

@TaliCohn Thank you for reaching out! Unfortunately, we have had some personal/work issues come up and we have not been able to finish work on this. We are still interested in completing this - is it possible to give us an extension?

lakshmi-yendapalli avatar Apr 03 '25 21:04 lakshmi-yendapalli

Sure! How much time do you thing you will need?

TaliCohn avatar Apr 06 '25 07:04 TaliCohn

@TaliCohn , thank you!

I think very likely until mid of May/end of May. We are all just totally slammed with coursework/work/separate research projects right now - semester ends soon and we'll have a break during which we can push this through.

palexbg avatar Apr 08 '25 16:04 palexbg