deep-learning-containers
[bug] TensorFlow Framework Container not working on Apple M1
Checklist
- [X] I've prepended issue tag with type of change: [bug]
- [X] (If applicable) I've attached the script to reproduce the bug
- [X] (If applicable) I've documented below the DLC image/dockerfile this relates to
- [X] (If applicable) I've documented below the tests I've run on the DLC image
- [X] I'm using an existing DLC image listed here: https://docs.aws.amazon.com/deep-learning-containers/latest/devguide/deep-learning-containers-images.html
- [X] I've built my own container based off DLC (and I've attached the code used to build my own image)
Concise Description:
TensorFlow Framework Containers do not currently work on the Apple M1 chip. I tried to run a SageMaker Training Job in local mode using the container image 63104351884.dkr.ecr.eu-west-1.amazonaws.com/tensorflow-training:2.9-cpu-py39, and it reports the following error while executing the command docker-compose -f /private/var/folders/1d/p7dclqcx4934dybvv117p3640000gr/T/tmpqygkvtra/docker-compose.yaml up --build --abort-on-container-exit :
0vocrndurd-algo-1-8yeyr | The TensorFlow library was compiled to use AVX instructions, but these aren't available on your machine.
0vocrndurd-algo-1-8yeyr | qemu: uncaught target signal 6 (Aborted) - core dumped
It seems that the TensorFlow Framework Container is not compatible with Apple M1 chips, so it cannot be used in local mode on this architecture.
DLC image/dockerfile:
63104351884.dkr.ecr.eu-west-1.amazonaws.com/tensorflow-training:2.9-cpu-py39
Current behavior:
0vocrndurd-algo-1-8yeyr | The TensorFlow library was compiled to use AVX instructions, but these aren't available on your machine.
0vocrndurd-algo-1-8yeyr | qemu: uncaught target signal 6 (Aborted) - core dumped
Expected behavior:
The container runs correctly when started with the Local Mode command:
docker-compose -f /private/var/folders/1d/p7dclqcx4934dybvv117p3640000gr/T/tmpqygkvtra/docker-compose.yaml up --build --abort-on-container-exit
Additional context:
The issue is specific to Apple M1 chips. You can reproduce it by using the local mode capability of Amazon SageMaker. It appears to stem from the TensorFlow builds differing between the arm64 and amd64 architectures.
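As a rough illustration of the mismatch described above, here is a minimal sketch of a pre-flight check that could run before starting local mode. It assumes the image is an amd64 build (the `qemu` line in the log suggests the container is being emulated), and the helper names `normalize_arch` and `runs_under_emulation` are hypothetical, not part of the SageMaker SDK or Docker.

```python
import platform
from typing import Optional

# Assumed mapping from platform.machine() values to Docker-style
# architecture names; extend it if your host reports something else.
_ARCH_ALIASES = {
    "x86_64": "amd64",
    "amd64": "amd64",
    "arm64": "arm64",
    "aarch64": "arm64",
}


def normalize_arch(machine: str) -> str:
    """Map a machine name to a Docker-style architecture name."""
    key = machine.lower()
    return _ARCH_ALIASES.get(key, key)


def runs_under_emulation(image_arch: str, host_machine: Optional[str] = None) -> bool:
    """Return True when the image architecture differs from the host's.

    host_machine defaults to platform.machine(); pass it explicitly to
    check a different host or to test the function.
    """
    if host_machine is None:
        host_machine = platform.machine()
    return normalize_arch(image_arch) != normalize_arch(host_machine)


if __name__ == "__main__":
    # Assumption: the tensorflow-training CPU image is built for amd64.
    if runs_under_emulation("amd64"):
        print("Warning: amd64 image on a non-amd64 host; "
              "it will run under emulation and AVX may be unavailable.")
    else:
        print("Host and image architectures match.")
```

Note that a Python interpreter that is itself running under translation may report the translated architecture rather than the physical one, so treat the result as a hint rather than a guarantee.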
Should this issue be closed now, given https://github.com/aws/deep-learning-containers/blob/437044b4a0ddd69e81f4945fda4a8ce22ff80ae6/tensorflow/inference/docker/2.9/py3/Dockerfile.graviton.cpu#L28 ?
It looks like the AWS team has published an Arm-compatible tensorflow_model_server.
We do support Arm (Graviton) Docker images; the image URIs can be found in the available_images.md file.
Let me know if you have any additional questions.
Closing the issue. Feel free to re-open if you have additional questions.