tensorflow-serving docker container doesn't work on Macs with Apple M1 chips.

Open kuba-lilz opened this issue 3 years ago • 18 comments

Bug Report

tensorflow-serving docker container doesn't work on Macs with Apple M1 chips.

Do maintainers of tensorflow-serving intend to solve this? Or do they see this as a problem somewhere upstream (docker for mac? OSX?) that should be fixed there? If so, does someone have a clear understanding as to where in the stack lies the issue?

My team is using tensorflow-serving on linux in production, but many members develop on OSX, so having a running docker container version of tensorflow serving in development is crucial to us.

Now that no new MacBook laptops with Intel CPUs are offered, I imagine many other development teams that use tensorflow-serving are in a similar situation, or will be as soon as they start replacing their computers, so I think this bug will grow into a serious problem for tensorflow-serving adoption and continued use.

System information

  • OS Platform and Distribution: macOS Monterey (12.0.1)
  • TensorFlow Serving installed from (source or binary): from docker hub
  • TensorFlow Serving version: tensorflow/serving:2.6.2
  • Chip: Apple M1
  • Docker for desktop: 4.3.0
  • Docker engine: v20.10.11

Describe the problem

tensorflow-serving docker container doesn't work on Macs with Apple M1 chips. Container crashes when run.

Exact Steps to Reproduce

Run the official example script on a Mac with an M1 chip. The script below uses tensorflow/serving:2.6.2 instead of tensorflow/serving to pin the version (at the time of writing, the container with the latest tag gives the same output).

git clone https://github.com/tensorflow/serving

# Location of demo models
TESTDATA="$(pwd)/serving/tensorflow_serving/servables/tensorflow/testdata"

docker run -t --rm -p 8501:8501 --platform linux/amd64 -v "$TESTDATA/saved_model_half_plus_two_cpu:/models/half_plus_two" -e MODEL_NAME=half_plus_two tensorflow/serving:2.6.2 &

Last line results in:

[1] 1032
[libprotobuf FATAL external/com_google_protobuf/src/google/protobuf/generated_message_reflection.cc:2345] CHECK failed: file != nullptr:
terminate called after throwing an instance of 'google::protobuf::FatalException'
  what():  CHECK failed: file != nullptr:
qemu: uncaught target signal 6 (Aborted) - core dumped
/usr/bin/tf_serving_entrypoint.sh: line 3:     9 Aborted                 tensorflow_model_server --port=8500 --rest_api_port=8501 --model_name=${MODEL_NAME} --model_base_path=${MODEL_BASE_PATH}/${MODEL_NAME} "$@"

[1]  + exit 134   docker run -t --rm -p 8501:8501 --platform linux/amd64 -v  -e

Same happens when running docker container with --platform linux/amd64 option.

On a side note: I found a few related issues here and there, but none of them addresses the tensorflow-serving x docker container x M1 chip problem directly, hence I posted a new issue. Here are some of them, including notes on why they are relevant:

kuba-lilz avatar Dec 09 '21 03:12 kuba-lilz

@kuba-lilz,

Since you mentioned that a similar issue is already open, can you close this one and follow up on that issue so we can track progress in a single place? Thanks!

sanatmpa1 avatar Dec 09 '21 17:12 sanatmpa1

@sanatmpa1 Thank you for replying. It's not clear to me why you would bundle this issue with #1816, which doesn't say anything about docker. I listed it because it's similar in spirit, but I do think they are two separate issues that are likely to require different solutions. To me it looks like #1816 concerns running tensorflow-serving on OSX on M1 chips, while this issue is specifically about running it in docker on OSX on M1 chips.

It could very well be that #1816 arises because library A used by tensorflow-serving doesn't work on OSX on M1 chips, while this issue is due to library B used by docker not working on OSX on M1 chips, or library C used by tensorflow-serving on linux not working inside docker on M1 chips. Do you have a good reason to say both issues would be solved by the same solution?

kuba-lilz avatar Dec 14 '21 00:12 kuba-lilz

@kuba-lilz,

As per this comment, the problem was reported for docker build as well, so I thought we could track it in the same issue. Thank you for the clarification; we can track this as a separate issue.

sanatmpa1 avatar Dec 16 '21 17:12 sanatmpa1

This is the same issue as https://github.com/tensorflow/tensorflow/issues/52845. The issue applies to AMD64 Docker images running on ARM64 hosts. The underlying emulation issue in QEMU has been resolved.

In order to close this issue, we need one of two things:

  • Google or one of its ARM partners should add a prebuilt ARM64 TensorFlow Serving image to Docker Hub. ARM doesn’t want to do it, so that leaves either Google, Intel, AWS, or Linaro.
  • Docker needs to update its QEMU version: https://github.com/docker/for-mac/issues/6620.
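
To check whether a given setup falls into the AMD64-image-on-ARM64-host case described above, something like the following may help. It is only a sketch: the `to_docker_platform` helper and the `IMAGE` variable are names made up for illustration, and it assumes the image has already been pulled locally.

```shell
# Map a machine name (as printed by `uname -m`) to a Docker platform string.
# Hypothetical helper, not part of Docker or TensorFlow Serving.
to_docker_platform() {
  case "$1" in
    x86_64|amd64)  echo "linux/amd64" ;;
    arm64|aarch64) echo "linux/arm64" ;;
    *)             echo "unknown" ;;
  esac
}

# Image to inspect; override by setting IMAGE in the environment.
IMAGE="${IMAGE:-tensorflow/serving:2.6.2}"

# On macOS, Docker Desktop runs containers in a Linux VM of the host's
# architecture, so the host machine name is mapped to a linux/* platform.
host_platform="$(to_docker_platform "$(uname -m)")"
echo "host:  $host_platform"

if command -v docker >/dev/null 2>&1; then
  # Os and Architecture are fields of the `docker image inspect` output.
  if image_platform="$(docker image inspect --format '{{.Os}}/{{.Architecture}}' "$IMAGE" 2>/dev/null)"; then
    echo "image: $image_platform"
    [ "$image_platform" = "$host_platform" ] \
      || echo "mismatch: the image can only run under emulation on this host"
  else
    echo "could not inspect $IMAGE (not pulled, or Docker not running)"
  fi
else
  echo "docker not found; skipping image check"
fi
```

A mismatch on an M1 Mac means the container goes through QEMU or Rosetta, which is the path that fails in this issue.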

fumoboy007 avatar Dec 06 '22 10:12 fumoboy007

What was the outcome (if any), on this issue? 2023 and the situation appears to be the same.

glynjackson avatar May 03 '23 20:05 glynjackson

Yes, on M1 processors the problem still exists; it comes down to AMD64 vs. ARM64.

PiotrZak avatar May 21 '23 19:05 PiotrZak

Ping! Can someone from the TensorFlow team please look into adding a prebuilt ARM64 TensorFlow Serving image to Docker Hub?

fumoboy007 avatar Aug 02 '23 05:08 fumoboy007

Doesn't this simply need a CI/CD job and nothing else at this point?

kokroo avatar Aug 17 '23 03:08 kokroo

Any updates on this? It's been 2 years already.

sdchc66 avatar Oct 07 '23 17:10 sdchc66

Typical response from Google: nothing.

AWS is vastly superior to anything Google does.

stevehs17 avatar Nov 01 '23 02:11 stevehs17

Any updates on this?

gcuder avatar Nov 27 '23 09:11 gcuder

Docker Desktop for Mac 4.27.0 with updated QEMU was recently released and this seems to work now - check the related issue: https://github.com/docker/for-mac/issues/6620

matemijolovic avatar Jan 29 '24 08:01 matemijolovic

Docker Desktop for Mac 4.27.0 with updated QEMU was recently released and this seems to work now - check the related issue: docker/for-mac#6620

How did you manage to run it? I was testing it on my M2 Mac using Docker for Mac 4.27.0 and I still get the same error:

I run:

docker run -t --rm -p 8501:8501 \
    -v "$TESTDATA/saved_model_half_plus_two_cpu:/models/half_plus_two" \
    -e MODEL_NAME=half_plus_two --platform="linux/amd64" \
    tensorflow/serving

and get

/usr/bin/tf_serving_entrypoint.sh: line 3:    12 Illegal instruction     tensorflow_model_server --port=8500 --rest_api_port=8501 --model_name=${MODEL_NAME} --model_base_path=${MODEL_BASE_PATH}/${MODEL_NAME} "$@"

My Docker environment, as reported by docker version:

Client:
 Cloud integration: v1.0.35+desktop.10
 Version:           25.0.1
 API version:       1.44
 Go version:        go1.21.6
 Git commit:        29cf629
 Built:             Tue Jan 23 23:06:12 2024
 OS/Arch:           darwin/arm64
 Context:           desktop-linux

Server: Docker Desktop 4.27.0 (135262)
 Engine:
  Version:          25.0.1
  API version:      1.44 (minimum version 1.24)
  Go version:       go1.21.6
  Git commit:       71fa3ab
  Built:            Tue Jan 23 23:09:35 2024
  OS/Arch:          linux/arm64
  Experimental:     false
 containerd:
  Version:          1.6.27
  GitCommit:        a1496014c916f9e62104b33d1bb5bd03b0858e59
 runc:
  Version:          1.1.11
  GitCommit:        v1.1.11-0-g4bccb38
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

gcuder avatar Jan 29 '24 10:01 gcuder

@gcuder can you check if the feature flag Use Rosetta for x86/amd64 emulation on Apple Silicon in Docker Desktop settings is disabled? It might be that you are using Rosetta which still doesn't support emulating AVX instructions.

matemijolovic avatar Jan 29 '24 11:01 matemijolovic

@gcuder can you check if the feature flag Use Rosetta for x86/amd64 emulation on Apple Silicon in Docker Desktop settings is disabled? It might be that you are using Rosetta which still doesn't support emulating AVX instructions.

That was the problem. Without Rosetta it works like a charm. Finally, it has been a long time coming.
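
For anyone who wants to confirm that the server really answers once the container starts, a minimal check against the REST port could look like this. It assumes the container from the reproduction steps is running with port 8501 published; `predict_url` is a hypothetical helper introduced only for this example.

```shell
# Build the REST predict URL for a model served by TensorFlow Serving.
# Hypothetical helper: $1 is host:port of the REST endpoint, $2 the model name.
predict_url() {
  echo "http://${1}/v1/models/${2}:predict"
}

url="$(predict_url localhost:8501 half_plus_two)"

# half_plus_two computes x/2 + 2, so these inputs should yield 2.5, 3.0, 4.5.
curl -s --max-time 5 -X POST -d '{"instances": [1.0, 2.0, 5.0]}' "$url" \
  || echo "request failed: is the container running?"
```

If the server crashed on startup with the protobuf or "Illegal instruction" error, the request simply fails to connect.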

gcuder avatar Jan 29 '24 12:01 gcuder

I was able to make it run even with Rosetta by using the Bitnami-Image which comes with linux/amd64 and linux/arm64 support.

gcuder avatar Feb 07 '24 13:02 gcuder

I was able to make it run even with Rosetta by using the Bitnami-Image which comes with linux/amd64 and linux/arm64 support.

The issue here is that we need a native ARM docker image.

kokroo avatar Feb 07 '24 13:02 kokroo

I was able to make it run even with Rosetta by using the Bitnami-Image which comes with linux/amd64 and linux/arm64 support.

The issue here is that we need a native ARM docker image.

Well, doesn't a linux/arm64 image count as native? Can you explain what you mean by "native"?

gcuder avatar Feb 07 '24 14:02 gcuder