Privacy-Preserving Graph-Based Machine Learning for Collaborative Anti-Money Laundering using Concrete ML
Zama Grant Program: Application
- Library targeted: Concrete ML
Overview
This project is part of my undergraduate Computer Science Final Year Project — "Privacy-Preserving Graph-Based Machine Learning with Fully Homomorphic Encryption for Collaborative Anti-Money Laundering" — and will be extended into a conference submission for publication.
With the growing digitalization of financial transactions and the rise of cybercrime, combating money laundering has become increasingly complex. Graph-based machine learning techniques have emerged as promising tools for Anti-Money Laundering (AML) detection, capable of capturing intricate relationships within money laundering networks. However, the effectiveness of AML solutions is hindered by data silos within financial institutions, which limit collaboration and reduce overall efficacy.
To address these challenges, this research presents a novel privacy-preserving approach for collaborative AML machine learning, facilitating secure data sharing across institutions and borders while preserving data privacy and regulatory compliance. Leveraging Fully Homomorphic Encryption (FHE), computations can be performed on encrypted data without decryption, ensuring sensitive financial data remains confidential.
The research delves into the integration of Fully Homomorphic Encryption over the Torus (TFHE) using Concrete ML with graph-based machine learning techniques, organised into two pipelines:
- Privacy-Preserving Graph Neural Network (GNN) pipeline
- Privacy-Preserving Graph-based XGBoost pipeline using Graph Feature Preprocessor
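The two pipelines differ in where the graph structure enters: the GNN learns over it directly, while the second pipeline flattens it into tabular features that a tree model can consume. As a toy illustration of the latter idea (this is a hypothetical stand-in, not the actual Graph Feature Preprocessor library), one can derive simple per-account features such as fan-in/fan-out counts from a transaction edge list:

```python
from collections import defaultdict

def graph_features(edges):
    """Toy stand-in for a graph feature preprocessor: derive per-account
    fan-in/fan-out counts from a list of (sender, receiver) transactions."""
    fan_out = defaultdict(int)
    fan_in = defaultdict(int)
    for src, dst in edges:
        fan_out[src] += 1
        fan_in[dst] += 1
    accounts = sorted(set(fan_out) | set(fan_in))
    # One tabular row per account -- ready for a tree model such as XGBoost.
    return {a: (fan_in[a], fan_out[a]) for a in accounts}

features = graph_features([("a", "b"), ("a", "c"), ("b", "c")])
# Account "a" sends twice and receives nothing -> (0, 2)
```

The real Graph Feature Preprocessor computes far richer patterns (e.g. motif and cycle statistics), but the output has the same tabular shape, which is what makes it compatible with Concrete ML's built-in tree models.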
Description
Milestones
1. Development of privacy-preserving custom Graph Neural Network pipeline
- Data preparation and preprocessing
- Environment configuration to ensure compatibility between PyTorch Geometric (PyG) and Concrete ML
- Custom quantisation of GNN layers, activation functions, node features and edge features using Brevitas
- Pruning of the GNN (if necessary) to be compatible with FHE bit-width constraints
- Conversion of GNN model to FHE equivalent
- Enhancing existing ONNX node implementations (e.g. refining the implementations of ONNX operations such as Expand, Unsqueeze, ConstantOfShape and Reshape in Concrete ML)
- Integration of a new ScatterElements ONNX operator in Concrete ML (or development of an alternative workaround)
- Debugging of other conversion errors (particularly challenging given the novel integration of PyG with Concrete ML)
- Training and evaluation of compiled GNN model using Concrete ML
- Conduct experiments varying GNN quantisation parameters
- Evaluation of the privacy-preserving models' performance and inference time using FHE
- Evaluation of the floating-point equivalent (non-FHE) models' performance and inference time for comparison
- Discussion of the trade-off between model performance and inference time (FHE vs. clear time ratio)
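The ScatterElements operator mentioned above is central to message passing in PyG, since aggregating neighbour messages onto nodes is a scatter operation. As a starting point for a workaround, a plain NumPy sketch of the ONNX ScatterElements semantics (reduction = none) looks like this — such a loop-based reference is useful for validating whatever FHE-friendly equivalent is eventually implemented:

```python
import numpy as np

def scatter_elements(data, indices, updates, axis=0):
    """Reference sketch of ONNX ScatterElements semantics in NumPy: for every
    position in `updates`, overwrite the element of `data` whose coordinate
    along `axis` is taken from `indices` at that same position."""
    out = data.copy()
    for idx in np.ndindex(indices.shape):
        coord = list(idx)
        coord[axis] = indices[idx]
        out[tuple(coord)] = updates[idx]
    return out

data = np.zeros((3, 3))
indices = np.array([[1, 0, 2], [0, 2, 1]])
updates = np.array([[1.0, 1.1, 1.2], [2.0, 2.1, 2.2]])
result = scatter_elements(data, indices, updates, axis=0)
# result:
# [[2.0, 1.1, 0.0],
#  [1.0, 0.0, 2.2],
#  [0.0, 2.1, 1.2]]
```

Data-dependent indexing like this is exactly what is awkward under FHE, which is why an alternative formulation (e.g. expressing the scatter as a dense matrix multiplication with a fixed adjacency-derived matrix) may be the more practical route in Concrete ML.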
2. Development of privacy-preserving XGBoost pipeline using Graph Feature Preprocessor
- Data preparation and preprocessing
- Training and evaluation of XGBoost with Graph Feature Preprocessor using Concrete ML
- Conduct experiments with incrementally GFP-enriched graph features using XGBoost
- Evaluation of the privacy-preserving models' performance and inference time using FHE
- Evaluation of the floating-point equivalent (non-FHE) models' performance and inference time for comparison
- Discussion of the trade-off between model performance and inference time (FHE vs. clear time ratio)
- Conduct experiments varying XGBoost hyperparameters such as n_estimators, max_depth and n_bits
- Evaluation of the privacy-preserving models' performance and inference time using FHE
- Evaluation of the floating-point equivalent (non-FHE) models' performance and inference time for comparison
- Discussion of the trade-off between model performance and inference time (FHE vs. clear time ratio)
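The n_bits experiments above hinge on the precision loss that FHE bit-width constraints impose. A minimal sketch of that effect (uniform affine quantization in NumPy, not Concrete ML's actual quantizer) shows how reconstruction error shrinks as the bit-width grows, which is the axis of the accuracy/latency trade-off:

```python
import numpy as np

def quantize_dequantize(x, n_bits):
    """Uniform affine quantization to n_bits followed by dequantization --
    a toy model of the precision loss from FHE bit-width constraints."""
    levels = 2 ** n_bits - 1
    lo, hi = x.min(), x.max()
    scale = (hi - lo) / levels
    q = np.round((x - lo) / scale)  # integer grid the encrypted circuit sees
    return q * scale + lo

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
# Worst-case reconstruction error for a few candidate bit-widths
errors = {b: float(np.abs(x - quantize_dequantize(x, b)).max())
          for b in (2, 4, 6, 8)}
```

Higher n_bits reduces this error but inflates the FHE circuit and hence inference time, which is what the planned experiments are designed to measure.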
If the above milestones are achieved, additional work on tutorials or blog posts related to the subject matter can also be considered.
Estimated reward: €10k-20k
Related links and references:
- [Project Repository (In-Progress)](https://github.com/fabecode/GraphML-FHE)
- [Final Year Project Thesis](https://dr.ntu.edu.sg/handle/10356/175347)
Hello fabecode,
Thank you for your grant application! Our team will review it and comment on your issue. In the meantime:
- Join the FHE.org discord server for any questions (pick the Zama library channel you will use).
- Ask questions privately: [email protected].
hey @fabecode,
Thank you very much for your interest in what we do at Zama, and for your grant proposal. For now, we will not be following up on it, but we invite you to keep an eye on this repository, as we will be launching new bounties soon if you're interested in playing with Zama's libs.
Cheers, JZ