deep-learning-containers

[bug] TensorFlow Framework Container not working on Apple M1

Open brunopistone opened this issue 3 years ago • 1 comments

Checklist

  • [X] I've prepended issue tag with type of change: [bug]
  • [X] (If applicable) I've attached the script to reproduce the bug
  • [X] (If applicable) I've documented below the DLC image/dockerfile this relates to
  • [X] (If applicable) I've documented below the tests I've run on the DLC image
  • [X] I'm using an existing DLC image listed here: https://docs.aws.amazon.com/deep-learning-containers/latest/devguide/deep-learning-containers-images.html
  • [X] I've built my own container based off DLC (and I've attached the code used to build my own image)

Concise Description:

TensorFlow Framework Containers do not currently work on Apple M1 chips. I tried to run a SageMaker Training Job in local mode using the container image 63104351884.dkr.ecr.eu-west-1.amazonaws.com/tensorflow-training:2.9-cpu-py39, and it reports the following error while executing the command docker-compose -f /private/var/folders/1d/p7dclqcx4934dybvv117p3640000gr/T/tmpqygkvtra/docker-compose.yaml up --build --abort-on-container-exit :

0vocrndurd-algo-1-8yeyr  | The TensorFlow library was compiled to use AVX instructions, but these aren't available on your machine.
0vocrndurd-algo-1-8yeyr  | qemu: uncaught target signal 6 (Aborted) - core dumped

It seems the TensorFlow Framework Container is not compatible with Apple M1 chips, so it cannot be used in local mode on this architecture.

DLC image/dockerfile:

63104351884.dkr.ecr.eu-west-1.amazonaws.com/tensorflow-training:2.9-cpu-py39

Current behavior:

0vocrndurd-algo-1-8yeyr  | The TensorFlow library was compiled to use AVX instructions, but these aren't available on your machine.
0vocrndurd-algo-1-8yeyr  | qemu: uncaught target signal 6 (Aborted) - core dumped

Expected behavior:

The container runs correctly when launched with the Local Mode command:

docker-compose -f /private/var/folders/1d/p7dclqcx4934dybvv117p3640000gr/T/tmpqygkvtra/docker-compose.yaml up --build --abort-on-container-exit

Additional context:

The issue is specific to Apple M1 chips and can be reproduced with the local mode capability of Amazon SageMaker. It stems from the different TensorFlow builds for the arm64 and amd64 architectures.
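
For anyone reproducing this, a minimal sketch of a pre-flight check that flags the arm64/amd64 mismatch before launching local mode. The helper name `needs_emulation` and the architecture labels are illustrative assumptions, not part of the SageMaker SDK or the DLC tooling. It only compares the host architecture reported by Python with the architecture an image was built for. Note that a Python interpreter running under Rosetta on an M1 may itself report `x86_64`, so treat the result as a hint.

```python
import platform
from typing import Optional

# Machine names that indicate an ARM host; the name varies by OS
# (macOS reports "arm64", Linux reports "aarch64").
ARM_MACHINES = {"arm64", "aarch64"}
X86_MACHINES = {"x86_64", "amd64"}


def needs_emulation(image_arch: str, host_machine: Optional[str] = None) -> bool:
    """Return True if an image built for `image_arch` does not match the host.

    `needs_emulation` is a hypothetical helper written for this example.
    `image_arch` is expected to be "amd64" or "arm64".
    """
    host = (host_machine or platform.machine()).lower()
    if host in ARM_MACHINES:
        host_arch = "arm64"
    elif host in X86_MACHINES:
        host_arch = "amd64"
    else:
        host_arch = host  # unknown architecture: compare the raw name
    return host_arch != image_arch.lower()


if __name__ == "__main__":
    # The tensorflow-training:2.9-cpu-py39 image in this report ran under qemu,
    # so it is treated here as an amd64 image.
    if needs_emulation("amd64"):
        print("Host is not amd64: the image would run under emulation, "
              "where AVX instructions may be unavailable.")
    else:
        print("Host matches the image architecture.")
```

On an M1 host the check returns True for an amd64 image, which matches the qemu abort shown in the log above.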

brunopistone Aug 02 '22 16:08

Should this issue be closed now, given https://github.com/aws/deep-learning-containers/blob/437044b4a0ddd69e81f4945fda4a8ce22ff80ae6/tensorflow/inference/docker/2.9/py3/Dockerfile.graviton.cpu#L28 ?

It looks like the AWS team has published an ARM-compatible tensorflow_model_server.

gitrc Oct 10 '22 17:10

We do support ARM (Graviton) Docker images; the image URIs can be found in the available_images.md file.

Let me know if you have any additional questions.

tejaschumbalkar Mar 21 '23 23:03

Closing the issue. Feel free to re-open if you have additional questions.

tejaschumbalkar Mar 30 '23 20:03