
Cannot load saved model from S3 bucket after tensorflow-serving 2.7

Open jeongukjae opened this issue 3 years ago • 21 comments

Bug Report


System information

  • TensorFlow Serving installed from (source or binary): Docker image (tensorflow/serving:2.7.0)
  • TensorFlow Serving version: 2.7.0

Describe the problem


A SavedModel cannot be loaded from an S3 bucket since tensorflow-serving 2.7. The server raises an error like the one below:

2022-01-18 02:30:30.032712: E tensorflow_serving/sources/storage_path/file_system_storage_path_source.cc:365] FileSystemStoragePathSource encountered a filesystem access error: Could not find base path s3://some-s3-path..... for servable model with error UNIMPLEMENTED: File system scheme 's3' not implemented (file: 's3://some-s3-path.....')

The Docker image tensorflow/serving:2.6.2 runs without any error in the same configuration.

Exact Steps to Reproduce


docker run --rm -it \
  --env AWS_REGION=some-aws-region \
  --env MODEL_BASE_PATH=some-s3-path \
  tensorflow/serving:2.7.0

Source code / logs


2022-01-18 02:30:30.032712: E tensorflow_serving/sources/storage_path/file_system_storage_path_source.cc:365] FileSystemStoragePathSource encountered a filesystem access error: Could not find base path s3://some-s3-path..... for servable model with error UNIMPLEMENTED: File system scheme 's3' not implemented (file: 's3://some-s3-path.....')

I think this is because of the Modular File System Migration mentioned in TF 2.7.0's release notes. Is there any way to link tensorflow-io in the build step?

  • https://github.com/tensorflow/io/blob/16d6f43e93bd2738b3011f73109708ecf318f195/tensorflow_io/core/filesystems/filesystem_plugins.cc
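
The UNIMPLEMENTED error above means no filesystem plugin is registered for the s3 scheme in that build. As a hedged illustration (not from this thread), the following sketch probes from Python whether the s3 scheme resolves, guarding the import so it degrades gracefully when TensorFlow is not installed:

```python
import importlib.util

def s3_scheme_registered():
    """Probe whether the installed TensorFlow can resolve s3:// paths.

    Returns None when TensorFlow is not installed, False when the s3
    scheme has no registered filesystem plugin, True otherwise.
    """
    if importlib.util.find_spec("tensorflow") is None:
        return None
    import tensorflow as tf
    try:
        # Probe a dummy bucket: UnimplementedError means the scheme is
        # unknown; any other failure means the scheme resolved and the
        # error happened at the network/auth layer instead.
        tf.io.gfile.exists("s3://nonexistent-probe-bucket/x")
    except tf.errors.UnimplementedError:
        return False
    except Exception:
        return True
    return True

print(s3_scheme_registered())
```

On a stock tensorflow/serving-style 2.7+ environment this would report False, matching the "File system scheme 's3' not implemented" log line.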

jeongukjae avatar Jan 18 '22 05:01 jeongukjae

I asked a question on SO just a few days before this issue was opened: https://stackoverflow.com/questions/70700005/using-s3-bucket-for-savedmodel-with-tensorflow-serving2-7-0-gpu-docker-image. TF support asked me to follow up here, which is what I'm doing.

siddharth-agrawal avatar Jan 25 '22 12:01 siddharth-agrawal

Is there any progress here?

jeongukjae avatar Feb 22 '22 06:02 jeongukjae

The cloud filesystem implementations moved to https://github.com/tensorflow/io.

Users will have to install https://pypi.org/project/tensorflow-io/ and import it. This has the side effect of loading the plugin for the S3 filesystem, so there will be an implementation.

So, from 2.7 onwards, users need to pip install tensorflow-io and import tensorflow_io, and then the code will work as before.

For any 2.6 patch or previous release there is nothing to change.
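
For Python clients, the fix described above can be sketched as follows (a hedged illustration; the import is guarded so the snippet degrades gracefully where the package is absent):

```python
import importlib.util

# In TF >= 2.7 the mere import of tensorflow_io registers the s3://
# filesystem plugin as a side effect; no symbols from it are used.
if importlib.util.find_spec("tensorflow_io") is not None:
    import tensorflow_io  # noqa: F401  (imported only for its side effect)
    s3_plugin_loaded = True
else:
    s3_plugin_loaded = False

# With the plugin loaded, s3:// paths work through the normal tf.io APIs,
# e.g. tf.io.gfile.exists("s3://some-bucket/model/1/saved_model.pb").
print("tensorflow_io available:", s3_plugin_loaded)
```

Note this only helps Python processes; as the next comment points out, the TF Serving binary itself cannot benefit from a pip package.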

mihaimaruseac avatar Mar 01 '22 22:03 mihaimaruseac

@mihaimaruseac Thanks for your response :)

But I think that only makes sense in a Python environment. To fix this issue, tensorflow-io probably needs to be linked at the Bazel or C++ level. For now, I have downgraded the TensorFlow Serving Docker image version to load SavedModels from S3.

jeongukjae avatar Mar 07 '22 09:03 jeongukjae

@mihaimaruseac Are you referring to importing tfio in the training script? How would TF Serving be affected by tfio that way?

richieyoum avatar Apr 21 '22 17:04 richieyoum

Any updates? @yongtang You removed S3 support from TensorFlow in this PR: https://github.com/tensorflow/tensorflow/pull/51032/files. Do you know how to build S3 support into TensorFlow Serving? TensorFlow Serving compiles against the TensorFlow source code, so if that code no longer supports S3, TF Serving will fail to load models from S3 as well.

haitong avatar May 08 '22 00:05 haitong

My team wanted to upgrade some of our production systems to TensorFlow 2.8.x, but this is a blocking issue. @shan3290 Is there any update on a way to get S3 support with TensorFlow Serving?

hsahovic avatar Jun 21 '22 16:06 hsahovic

Is there any news? We would like to use TensorFlow 2.8 with S3 too. Thanks.

lumenghe avatar Jul 01 '22 15:07 lumenghe

Any updates on this issue?

RakeshRaj97 avatar Jul 03 '22 10:07 RakeshRaj97

docker run --rm -p 8501:8501 --name tfs-s3 \
  -e AWS_ACCESS_KEY_ID=minioadmin \
  -e AWS_SECRET_ACCESS_KEY=minioadmin \
  -e S3_ENDPOINT=http://127.0.0.1:9000 \
  -e AWS_REGION=us-east-1 \
  --env MODEL_BASE_PATH=S3://models \
  --env MODEL_NAME=half \
  -t tensorflow/serving:latest

My error is: [error screenshot attached]

The latest version is 2.5.1, right? Why is it also a tfio error? What can I do to use S3?

521bibi avatar Aug 05 '22 02:08 521bibi

Hi, the 'latest' tag points to 2.5.1, but that image is a year old and is not the newest release. We now tag releases with explicit versions. According to the comments above, could you try 2.6.x (e.g. tensorflow/serving:2.6.2)?

For this issue, TF Serving is adding a dependency on TF IO to bring this feature back. Sorry for the breakage; we will let you know once it's fixed.

shan3290 avatar Aug 08 '22 22:08 shan3290

This is also a blocking issue for our team, would be great if this can be fixed soon!

fsonntag avatar Aug 23 '22 08:08 fsonntag

This still seems to be an issue. Any idea when the removed support for S3 will be added back in instead of workarounds?

glynjackson avatar Oct 19 '22 10:10 glynjackson

It won't be added back. The TF team at Google does not maintain the S3 filesystem, so the code in TF would just rot, whereas keeping it in SIG IO helps keep it up to date and makes it easier to add new features.

mihaimaruseac avatar Oct 19 '22 15:10 mihaimaruseac

@mihaimaruseac Can you please add a tutorial on how to use TF IO inside the TF Serving ecosystem? Currently I manually copy the models into the tf-serving containers and do a docker commit. This is extremely inefficient: I have to maintain heavyweight images, and pushing and pulling them between my local dev machine and our internal Artifactory takes a lot of time. It would be great if I could automate this process by loading the models straight from S3/MinIO buckets. Many of us are facing this problem, and I would like to hear solutions that make this process more efficient.
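
One interim workaround (a hedged sketch, not an official recipe from this thread) is to sync the SavedModel from S3 to local disk, e.g. in an init container, and point TF Serving's --model_base_path at the local copy. Only the URI parsing is shown runnable below; the actual download (for example with boto3's download_file) is left as a comment because it needs credentials and network access.

```python
def split_s3_uri(uri):
    """Split "s3://bucket/prefix" into a (bucket, prefix) tuple."""
    if not uri.startswith("s3://"):
        raise ValueError("expected an s3:// URI, got %r" % uri)
    bucket, _, prefix = uri[len("s3://"):].partition("/")
    return bucket, prefix

bucket, prefix = split_s3_uri("s3://models/half/1")
# With boto3 one would now list the objects under `prefix` and call
# s3_client.download_file(bucket, key, local_path) for each, then start
# tensorflow/serving with --model_base_path pointing at the local copy.
print(bucket, prefix)  # → models half/1
```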

RakeshRaj97 avatar Oct 20 '22 00:10 RakeshRaj97

@mihaimaruseac That makes sense for not maintaining the code, but it could still be added as a dependency, as is done for TF Decision Forests.

I also tried compiling TF Serving with TF IO as a dependency, but as someone with no Bazel or TF development experience, it's not easy, I couldn't make it work.

fsonntag avatar Oct 20 '22 07:10 fsonntag

That makes total sense @mihaimaruseac thank you, but unless I'm misunderstanding this, why isn’t tensorflow-io a package or dependency inside the container just like any other?

glynjackson avatar Oct 20 '22 08:10 glynjackson

For all these questions, adding @yongtang

The system is similar to the GCS filesystem support, except that one is also distributed as a separate pip package.

mihaimaruseac avatar Oct 20 '22 14:10 mihaimaruseac

@yongtang I am also interested in the response to the following question, and in any potential future plans for including tensorflow-io as a dependency in the serving image. Do you have any comments or updates?

> That makes total sense @mihaimaruseac thank you, but unless I'm misunderstanding this, why isn’t tensorflow-io a package or dependency inside the container just like any other?

TaylorZowtuk avatar Nov 24 '22 22:11 TaylorZowtuk

Hi everyone. I tried to build TensorFlow Serving with an S3 filesystem implemented in TensorFlow IO, and it was successful.

Here is the code to build it, along with prebuilt Docker images:

Code: https://github.com/jeongukjae/tf-serving-s3
Docker image: https://github.com/jeongukjae/tf-serving-s3/pkgs/container/tf-serving-s3

The Docker image size is 465MB, a little bigger than the official image (tensorflow/serving:2.11.0 is 459MB).

Maybe the optimal solution is that the TensorFlow team or community (SIG IO?) maintain the TF Serving image with TensorFlow IO, but I hope my solution helps someone who needs this.

jeongukjae avatar Jan 17 '23 17:01 jeongukjae

> Hi everyone. I tried to build TensorFlow Serving with an S3 filesystem implemented in TensorFlow IO, and it was successful.
>
> Here are the codes to build them and docker images.
>
> Codes: https://github.com/jeongukjae/tf-serving-s3 Docker image: https://github.com/jeongukjae/tf-serving-s3/pkgs/container/tf-serving-s3
>
> Docker image size is 465MB, which is a little bit bigger than the official image. (tensorflow/serving:2.11.0 is 459MB)
>
> Maybe the optimal solution is that the TensorFlow team or community (SIG IO?) maintain the TF Serving image with TensorFlow IO, but I hope my solution helps someone who needs this.

For anyone coming here looking for a solution, this excellent piece of work worked for us. I cannot thank @jeongukjae enough for this! We are now serving models in EKS pulling from AWS S3.

If you try this, be aware that the Bazel build parameters in the Dockerfile may need to be adjusted depending on the resources of the host machine: reduce the default number of parallel jobs and restrict memory and CPU usage. The following values enabled me to build locally on a 2019 MacBook Pro and in our Jenkins-based CI/CD without failures due to OOMs:

    --noshow_progress \
    --jobs=4 \
    --local_ram_resources=HOST_RAM*.5 \
    --local_cpu_resources=HOST_CPUS-2 \

adriangay avatar Mar 13 '24 15:03 adriangay