autodist

Simple Distributed Deep Learning on TensorFlow

Results 22 autodist issues

Bumps [protobuf](https://github.com/protocolbuffers/protobuf) from 3.11.0 to 3.15.0. Release notes, sourced from protobuf's releases: Protocol Buffers v3.15.0. Protocol Compiler: Optional fields for proto3 are enabled by default, and no longer require the...

dependencies

This PR adds a RaySGD API to AutoDist that enables it to train models on a Ray cluster. The API defines a `TFTrainer` class which takes a model creator, data creator,...

**Please describe the bug** `example/linear_regression.py` with the AllReduce strategy crashes when run on a CPU-only multi-node cluster with a resource spec like: ``` nodes: - address: X.X.X.X cpus: [0] chief: true...

This is a draft pull request to show how Ray works with AutoDist. It is not ready to merge yet.

**System information**
- AutoDist version: master
- Are you willing to contribute it (Yes/No): Yes

**Describe the new feature and the current behavior/state**
Currently AutoDist uses SSH to coordinate distributed...

Adds the experimental AutoStrategy, which automatically determines the best strategy to use for a given model and resource spec.

- Added a `cpu_only_devices` feature to `resource_spec`
- Added an initial version of the BytePS strategy