[bug] Horovod installations in all framework Dockerfiles are not framework-specific

Open ChaiBapchya opened this issue 5 years ago • 0 comments

Checklist

  • [x] I've prepended issue tag with type of change: [bug]
  • [ ] (If applicable) I've attached the script to reproduce the bug
  • [ ] (If applicable) I've documented below the DLC image/dockerfile this relates to
  • [ ] (If applicable) I've documented below the tests I've run on the DLC image
  • [ ] I'm using an existing DLC image listed here: https://docs.aws.amazon.com/deep-learning-containers/latest/devguide/deep-learning-containers-images.html
  • [ ] I've built my own container based off DLC (and I've attached the code used to build my own image)

Concise Description: Question: Why do we not use

HOROVOD_WITH_PYTORCH=1 pip install horovod[pytorch]

instead of https://github.com/aws/deep-learning-containers/blob/03537478a2641b040114382d1b3e528840b675f7/pytorch/training/docker/1.6.0/py3/cu101/Dockerfile.gpu#L146

or

HOROVOD_WITH_MXNET=1 pip install horovod[mxnet]

instead of https://github.com/aws/deep-learning-containers/blob/03537478a2641b040114382d1b3e528840b675f7/mxnet/training/docker/1.7.0/py3/cu101/Dockerfile.gpu#L193

or

HOROVOD_WITH_TENSORFLOW=1 pip install horovod[tensorflow]

instead of https://github.com/aws/deep-learning-containers/blob/03537478a2641b040114382d1b3e528840b675f7/tensorflow/training/docker/2.3.1/py3/cu102/Dockerfile.gpu#L210

According to https://horovod.readthedocs.io/en/stable/install_include.html, the [framework] extra is required to force a particular framework, right?

DLC image/dockerfile:

Current behavior:

Expected behavior:

Additional context:
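
The three commands in the description follow one pattern: a HOROVOD_WITH_<FRAMEWORK>=1 variable plus the matching pip extra. A minimal sketch of that pattern is below. The helper name horovod_install_cmd is hypothetical and not part of the DLC Dockerfiles or of Horovod; it only prints the command and does not run it. The extra is quoted because some shells (zsh, for example) treat unquoted square brackets as a glob pattern.

```shell
#!/bin/sh
# Hypothetical helper (an assumption, not taken from the DLC repo):
# prints, but does not execute, a framework-specific Horovod install command.
horovod_install_cmd() {
    framework="$1"
    case "$framework" in
        pytorch)    flag="HOROVOD_WITH_PYTORCH" ;;
        mxnet)      flag="HOROVOD_WITH_MXNET" ;;
        tensorflow) flag="HOROVOD_WITH_TENSORFLOW" ;;
        *)
            # Reject anything outside the three frameworks named in this issue.
            echo "unsupported framework: $framework" >&2
            return 1
            ;;
    esac
    # Quote the requirement so the shell passes the brackets through literally.
    printf '%s=1 pip install "horovod[%s]"\n' "$flag" "$framework"
}
```

For example, horovod_install_cmd mxnet prints the MXNet variant of the command, which could then be pasted into the corresponding Dockerfile RUN step.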

ChaiBapchya, Oct 08 '20 05:10