simple-vfl
A demo of vertical federated learning on simple datasets
VFL playground
This repo contains a small demonstration of vertical federated learning (VFL) in which the data holders do not train neural networks on their own devices; each holder trains only a simple local model.
Why?
The purpose of this is to demonstrate that VFL
is a useful paradigm even for "simple" problems
for which neural networks are not required.
In this demo,
each data holder owns part of the "Titanic" dataset,
which is simple enough to achieve high accuracy
even with O(100) datapoints.
Each data holder trains a logistic regression model
on their part of the dataset.
They send their predictions to a centralised
computational server,
which trains a neural network on the concatenation
of the outputs from each data holder
in order to better predict labels for the datapoints.
The idea behind this process is that the data holders' models
will perform differently relative to one another,
depending on the specific characteristics of their own data.
Mapping these combined outputs to a better prediction of the labels
is a non-linear process (hence the neural network!).
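To make the pipeline concrete, below is a rough sketch of the setup described above. It is not the repo's actual code: the two-holder column split, the scikit-learn LogisticRegression holders and the MLPClassifier server are assumptions for illustration only.

```python
# Minimal sketch of the VFL demo described above (illustrative, not the repo's code).
# Assumptions: two data holders each own a disjoint subset of the Titanic feature
# columns for the same passengers, and y is the survival label.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier


def run_vfl_demo(X_holder_a, X_holder_b, y):
    # The same train/test split is used by every party (rows stay aligned).
    Xa_tr, Xa_te, Xb_tr, Xb_te, y_tr, y_te = train_test_split(
        X_holder_a, X_holder_b, y, test_size=0.2, random_state=0
    )

    # Each data holder trains a logistic regression on its own columns only.
    holder_a = LogisticRegression(max_iter=1000).fit(Xa_tr, y_tr)
    holder_b = LogisticRegression(max_iter=1000).fit(Xb_tr, y_tr)

    # Holders share only their predicted probabilities, never the raw features.
    def holder_outputs(Xa, Xb):
        return np.column_stack([
            holder_a.predict_proba(Xa)[:, 1],
            holder_b.predict_proba(Xb)[:, 1],
        ])

    # The central server trains a small neural network on the concatenated
    # holder outputs to learn the non-linear mapping to the labels.
    server = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
    server.fit(holder_outputs(Xa_tr, Xb_tr), y_tr)

    # Test accuracy of the combined (VFL) model.
    return server.score(holder_outputs(Xa_te, Xb_te), y_te)
```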
Get started
Python
This demo was written using Python 3.8,
but similar minor versions will also work.
Environment
Very simple - only a few packages required (and no GPUs!).
Run pip install -r requirements.txt
to install necessary packages.
How to run
Run main.sh.
This trains a model in a centralised setting
and then a model in the VFL setting.
Alternatively,
execute python scripts/run_(de)centralised.py,
where the (de) is optional,
to run one of the two scripts on its own.
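For concreteness, the commands above expand to the following (in a Unix-like shell):

```sh
bash main.sh                          # centralised baseline, then the VFL setting

# or run either setting on its own:
python scripts/run_centralised.py
python scripts/run_decentralised.py
```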
Security implications
Incoming
License
Apache 2.0. See the license for more information.