openfl
Run collaborator in docker
nvidia-container-runtime must be installed so that containers can access the GPU (see https://docs.docker.com/config/containers/resource_constraints/#gpu).
- Add the GPG key and package repository for nvidia-container-runtime:
curl -s -L https://nvidia.github.io/nvidia-container-runtime/gpgkey | \
sudo apt-key add -
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-container-runtime/$distribution/nvidia-container-runtime.list | \
sudo tee /etc/apt/sources.list.d/nvidia-container-runtime.list
sudo apt-get update
- Install nvidia-container-runtime:
sudo apt-get install nvidia-container-runtime
- Ensure that nvidia-container-runtime-hook is accessible from $PATH:
which nvidia-container-runtime-hook
- Restart the Docker daemon
sudo service docker restart
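After the restart, it can help to confirm that the pieces are in place. The following is a minimal sketch, assuming a POSIX shell; the helper names `have_cmd` and `gpu_smoke_test` are hypothetical and not part of OpenFL. The `docker run --gpus all` invocation follows the Docker resource-constraints page linked above and needs a Docker version that supports `--gpus`.

```shell
#!/bin/sh
# have_cmd (hypothetical helper): succeeds if the given command is on $PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# gpu_smoke_test (hypothetical helper): runs nvidia-smi inside a throwaway
# container. It is defined here but not called automatically, because it
# pulls an image and requires a working GPU setup.
gpu_smoke_test() {
    docker run --rm --gpus all ubuntu nvidia-smi
}

# Report which of the required tools are visible to the current shell.
for tool in docker nvidia-container-runtime-hook; do
    if have_cmd "$tool"; then
        echo "found: $tool"
    else
        echo "missing: $tool"
    fi
done
```

Once both tools are reported as found, calling `gpu_smoke_test` should list the host GPUs if the runtime is wired up correctly.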
Docker proxy:
To use Docker behind a proxy, define the proxy settings in director_config.yaml and envoy_config.yaml:
#director_config.yaml
settings:
listen_host: localhost
listen_port: 50050
sample_shape: [ '300', '400', '3' ]
target_shape: [ '300', '400' ]
envoy_health_check_period: 5 # in seconds
docker:
env:
http_proxy:
https_proxy:
no_proxy:
buildargs:
HTTP_PROXY:
HTTPS_PROXY:
NO_PROXY:
#envoy_config.yaml
params:
cuda_devices: [ 0, 2 ]
docker:
env:
http_proxy:
https_proxy:
no_proxy:
buildargs:
HTTP_PROXY:
HTTPS_PROXY:
NO_PROXY:
optional_plugin_components:
cuda_device_monitor:
template: openfl.plugins.processing_units_monitor.pynvml_monitor.PynvmlCUDADeviceMonitor
settings: [ ]
shard_descriptor:
template: kvasir_shard_descriptor.KvasirShardDescriptor
params:
data_folder: kvasir_data
rank_worldsize: 1,10
enforce_image_hw: '300,400'
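To illustrate what the two `docker` sub-sections presumably correspond to, here is a hedged sketch using the plain Docker CLI. It assumes that `env` entries become container environment variables (`docker run -e`) and that `buildargs` entries become build-time arguments (`docker build --build-arg`); how OpenFL actually passes them is not shown in this description. The helper `proxy_flags` and the proxy values are hypothetical placeholders.

```shell
#!/bin/sh
# proxy_flags (hypothetical helper): for each listed variable name that is
# set and non-empty, print one line of the form "<flag> NAME=value".
proxy_flags() {
    flag="$1"
    shift
    for name in "$@"; do
        # Indirect lookup of the variable called $name (empty if unset).
        eval "value=\${$name:-}"
        if [ -n "$value" ]; then
            printf '%s %s=%s\n' "$flag" "$name" "$value"
        fi
    done
}

# Placeholder values for illustration only; substitute your own proxy.
http_proxy="http://proxy.example.com:3128"
no_proxy="localhost,127.0.0.1"
HTTP_PROXY="$http_proxy"

# Flags corresponding to the "env" section (runtime environment).
proxy_flags -e http_proxy https_proxy no_proxy

# Flags corresponding to the "buildargs" section (image build).
proxy_flags --build-arg HTTP_PROXY HTTPS_PROXY NO_PROXY
```

The printed flags could then be appended to a `docker run` or `docker build` command line. The sketch does not quote values, so it is only suitable for values without whitespace.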
Manage Docker as a non-root user: https://docs.docker.com/engine/install/linux-postinstall/
@dmitryagapov @alexey-gruzdev Can a tag be added to PRs like this to reflect that the feature is experimental or is pending design review before merge? WIP is used for PRs that aren't ready for review yet, but it seems like this belongs in a different category.