INET-RC/GeoMX: GeoMX: A fast and unified system for distributed m...

Introduction

GeoMX is a MXNet-based two-layer parameter server framework, aiming at integrating data knowledge that owned by multiple independent parties in a privacy-preserving way (i.e. no need to transfer raw data), by training a shared deep learning model collaboratively in a decentralized and distributed manner.

Unlike other distributed deep learning software framworks and the emerging Federated Learning technologies which are based on single-layer parameter server architecture, GeoMX applies two-layer architecture to reduce communication cost between parties.

GeoMX allows parties to train the deep learning model on their own data and clusters in a distributed mannar locally. Parties only need to upload the locally aggregated model gradients (or model updates) to the central party to perform global aggregation, model updating and model synchronization.

To mitigate the communication bottleneck between the central party and participating parties, GeoMX implements multiple communication-efficient strategies, such as BSC, DGT, TSEngine and P3, boosting the model training.

BSC, fully named as Bilateral Sparse Compression, reduces the size of gradient tensors during Local Server's push and pull progress. DGT, a contribution-aware differential gradient transmission protocol, fully named as Differential Gradient Transmission , transfers gradients in multi-channels with different reliability and priority according to their contribution to model convergence. TSEngine, an adaptive communication scheduler for efficient communication overlay of the parameter server system in DML-WANs, dynamically schedules the communication logic over the parameter server and workers based on the active network perception. P3 overlaps parameter synchronization with computation in order to improve the training performance.

Furthermore, GeoMX supports:

4 communication algorithms, including fully-synchronous algorithm, mix-synchronous algorithm, HierFAVG-styled synchronous algorithm and DC-ASGD asynchronous algorithm.
(6 categories) 25 machine learning algorithms, including 9 gradient descent algorithms, 7 ensemble learning algorithms, 3 support vector algorithms, 2 MapReduce algorithms, 2 online learning algorithms, 2 incremental learning algorithms.

Installation

Build from source

clone the GeoMX project

git clone https://github.com/INET-RC/GeoMX.git
cd GeoMX

Install a Math Library and OpenCV

# e.g. OpenBLAS
sudo apt-get install -y libopenblas-dev
sudo apt-get install -y libopencv-dev

Build core shared library

There is a configuration file for make, make/config.mk compilation options.

If building on CPU and using OpenBLAS:

USE_OPENCV=1
USE_BLAS=openblas

If building on GPU and you want OpenCV and OpenBLAS:

USE_OPENCV=1
USE_BLAS=openblas
USE_CUDA=1
USE_CUDA_PATH=/usr/local/cuda

Visit usage-example for other compilation options. You can edit it and then run make -j$(nproc)

Building from source creates a library called libmxnet.so in the lib folder in your project root.

You may also want to add the shared library to your LD_LIBRARY_PATH:

export LD_LIBRARY_PATH=$PWD/lib

Install Python bindings

Navigate to the root of the GeoMX folder then run the following:

$ cd python
$ pip install -e .

Note that the -e flag is optional. It is equivalent to --editable and means that if you edit the source files, these changes will be reflected in the package installed.

Docker

It's highly recommended to run GeoMX in a Docker container. To build such a Docker image, use the provided Dockerfile.

docker build -t GeoMX -f Dockerfile .

GPU Backend

Please make sure NVIDIA Container Toolkit is installed. Here are some install for Ubuntu users:

distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
   && curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - \
   && curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

restart the Docker daemon to complete the installation:

sudo systemctl restart docker

Then edit daemon.json to set the default runtime:

$ vim /etc/docker/daemon.json
{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}

Finally, restart the Docker daemon:

sudo systemctl restart docker

Documentation

Deployment: How to run GeoMX on multiple machines with Docker.
Configurations: How to use GeoMX's communication-efficient strategies.
System Design: How and why GeoMX and its modifications make better.
Examples: Understandable demos of GeoMX's communication-efficient strategies, communication algorithms and machine learning algorithms.

Communication

Github Issues is welcomed.

Contributors and Institutions

Contributors:

Intelligent Network and Application Research Center¹
DataMiningLab¹

Institutions:

University of Electronic Science and Technology of China

Copyright and License

GeoMX is provided under the Apache-2.0 license.

GeoMX
GeoMX copied to clipboard

Metadata

Introduction

Installation

Build from source

clone the GeoMX project

Install a Math Library and OpenCV

Build core shared library

Install Python bindings

Docker

GPU Backend

Documentation

Communication

Contributors and Institutions

Copyright and License

← Metadata

Owner

Metadata

GeoMX GeoMX copied to clipboard

Metadata

Introduction

Installation

Build from source

clone the GeoMX project

Install a Math Library and OpenCV

Build core shared library

Install Python bindings

Docker

GPU Backend

Documentation

Communication

Contributors and Institutions

Copyright and License

← Metadata

Owner

Metadata

GeoMX
GeoMX copied to clipboard