19_training_and_deploying_at_scale.ipynb error

Open BuggieCoder opened this issue 2 years ago • 10 comments

I tried to run the notebook named in the title in Colab and got the error below at [this section](https://colab.research.google.com/github/ageron/handson-ml3/blob/main/19_training_and_deploying_at_scale.ipynb#scrollTo=Querying_TF_Serving_through_the_REST_API). Please help. Thanks.

Code that caused the error:

`import requests

server_url = "http://localhost:8501/v1/models/my_mnist_model:predict"
response = requests.post(server_url, data=request_json)
response.raise_for_status()  # raise an exception in case of error
response = response.json()`

Error messages:

ConnectionRefusedError                    Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/urllib3/connection.py in _new_conn(self)
    158             conn = connection.create_connection(
--> 159                 (self._dns_host, self.port), self.timeout, **extra_kw)
    160

19 frames
ConnectionRefusedError: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

NewConnectionError                        Traceback (most recent call last)
NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7f129a2f4550>: Failed to establish a new connection: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

MaxRetryError                             Traceback (most recent call last)
MaxRetryError: HTTPConnectionPool(host='localhost', port=8501): Max retries exceeded with url: /v1/models/my_mnist_model:predict (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f129a2f4550>: Failed to establish a new connection: [Errno 111] Connection refused'))

During handling of the above exception, another exception occurred:

ConnectionError                           Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/requests/adapters.py in send(self, request, stream, timeout, verify, cert, proxies)
    514                 raise SSLError(e, request=request)
    515
--> 516             raise ConnectionError(e, request=request)
    517
    518         except ClosedPoolError as e:

ConnectionError: HTTPConnectionPool(host='localhost', port=8501): Max retries exceeded with url: /v1/models/my_mnist_model:predict (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f129a2f4550>: Failed to establish a new connection: [Errno 111] Connection refused'))

BuggieCoder avatar Jul 04 '22 05:07 BuggieCoder

Hi @BuggieCoder, you're right. I get the same error when running the notebook in Colab. A quick look at the server's log suggests that the system libraries on the host machine (glibc and libstdc++) are older than the versions required by tensorflow_model_server 2.9.0. You can check it yourself by running:

with open('my_server.log') as f:
  print(f.read())

I got the following result:

tensorflow_model_server: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.28' not found (required by tensorflow_model_server)
tensorflow_model_server: /lib/x86_64-linux-gnu/libm.so.6: version `GLIBC_2.29' not found (required by tensorflow_model_server)
tensorflow_model_server: /usr/lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.26' not found (required by tensorflow_model_server)
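To compare the log against the host, here is a minimal sketch that reads the glibc version with the standard library's platform.libc_ver() and checks it against 2.29, the highest GLIBC version named in the log above. The helper names version_tuple and glibc_is_new_enough are made up for illustration, and the sketch does not cover the GLIBCXX (libstdc++) requirement.

```python
import platform


def version_tuple(text):
    """Convert a dotted version such as '2.28' into (2, 28) for comparison."""
    return tuple(int(part) for part in text.split("."))


def glibc_is_new_enough(found, required="2.29"):
    """Return True if the found glibc version is at least the required one.

    The default of 2.29 is taken from the server log quoted above.
    """
    return version_tuple(found) >= version_tuple(required)


# libc_ver() returns a (name, version) pair; both are empty strings
# when the C library cannot be identified (for example on non-Linux hosts).
lib, found = platform.libc_ver()
if lib == "glibc" and found:
    status = "new enough" if glibc_is_new_enough(found) else "too old"
    print(f"glibc {found} is {status} for a binary that needs GLIBC_2.29")
else:
    print("Could not determine the glibc version on this host")
```

If this reports a version below 2.29, the GLIBC errors in the log are expected for that runtime.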

To fix it, replace the following snippet in the notebook:

if "google.colab" in sys.modules or "kaggle_secrets" in sys.modules:
    url = "https://storage.googleapis.com/tensorflow-serving-apt"
    src = "stable tensorflow-model-server tensorflow-model-server-universal"
    !echo 'deb {url} {src}' > /etc/apt/sources.list.d/tensorflow-serving.list
    !curl '{url}/tensorflow-serving.release.pub.gpg' | apt-key add -
    !apt update -q && apt-get install -y tensorflow-model-server
    %pip install -q -U tensorflow-serving-api

with

if "google.colab" in sys.modules or "kaggle_secrets" in sys.modules:
    !wget 'https://storage.googleapis.com/tensorflow-serving-apt/pool/tensorflow-model-server-2.5.4/t/tensorflow-model-server/tensorflow-model-server_2.5.4_all.deb'
    !dpkg -i tensorflow-model-server_2.5.4_all.deb
    %pip install -q -U tensorflow-serving-api

More recent versions (listed here) produce the same error as above.
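After reinstalling, it may help to confirm that the server is actually listening before sending requests, since "Connection refused" only says that nothing accepted the connection. The following is a rough sketch using plain sockets; wait_for_port is a hypothetical helper, not part of the notebook, and port 8501 is the REST port used in the request above.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Poll host:port until a TCP connection succeeds or the timeout expires.

    Returns True as soon as a connection is accepted, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means some process is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Covers ConnectionRefusedError and connection timeouts.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Example use in the notebook (assumes the server log is written to
# my_server.log, as in the snippet earlier in this thread):
#
# if not wait_for_port("localhost", 8501):
#     with open("my_server.log") as f:
#         print(f.read())
```

If the port never opens, the server most likely exited at startup, and the log is the place to look.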

vi3itor avatar Jul 04 '22 10:07 vi3itor

Hi @BuggieCoder , Thanks for your feedback. It looks like the official installation instructions for TensorFlow Serving currently do not work on Google Colab, but luckily the workaround proposed by @vi3itor (thanks! 🙏) works fine. It looks like TensorFlow Serving assumes that some recent libraries are present, but Google Colab does not have them yet. Hopefully this will be fixed the next time the Google Colab runtime is updated. I'll update the notebook to point to this issue.

ageron avatar Jul 05 '22 06:07 ageron

@ageron, I checked that the latest working version is 2.5.4 (I got the same error when I tried installing 2.6.5). I'll edit the code snippet above.

vi3itor avatar Jul 05 '22 08:07 vi3itor

Thank you so much for your response. I tried the replacement code and that error is gone, but now I get another error.

BuggieCoder avatar Jul 06 '22 05:07 BuggieCoder

The new error is from this block of code:

`import grpc
from tensorflow_serving.apis import prediction_service_pb2_grpc

channel = grpc.insecure_channel('localhost:8500')
predict_service = prediction_service_pb2_grpc.PredictionServiceStub(channel)
response = predict_service.Predict(request, timeout=10.0)`

BuggieCoder avatar Jul 06 '22 05:07 BuggieCoder

and here is the error message:

InactiveRpcError                          Traceback (most recent call last)
in ()
      4 channel = grpc.insecure_channel('localhost:8500')
      5 predict_service = prediction_service_pb2_grpc.PredictionServiceStub(channel)
----> 6 response = predict_service.Predict(request, timeout=10.0)

1 frames
/usr/local/lib/python3.7/dist-packages/grpc/_channel.py in _end_unary_response_blocking(state, call, with_call, deadline)
    847             return state.response
    848         else:
--> 849             raise _InactiveRpcError(state)
    850
    851

_InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
    status = StatusCode.INVALID_ARGUMENT
    details = "input tensor alias not found in signature: flatten_input. Inputs expected to be in the set {flatten_1_input}."
    debug_error_string = "{"created":"@1656964752.590421454","description":"Error received from peer ipv4:127.0.0.1:8500","file":"src/core/lib/surface/call.cc","file_line":952,"grpc_message":"input tensor alias not found in signature: flatten_input. Inputs expected to be in the set {flatten_1_input}.","grpc_status":3}"

BuggieCoder avatar Jul 06 '22 05:07 BuggieCoder

Hi @BuggieCoder, you're welcome!

I'm not getting the error above. Everything should work fine. Are you running the code from Colab or have you made some changes?
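The error message says the request used the input name flatten_input while the served signature expects flatten_1_input. One way to see which names the server expects is TF Serving's REST metadata endpoint. The sketch below is only an illustration: signature_input_names and fetch_metadata are hypothetical helpers, the sample dictionary is a trimmed-down stand-in for a real response, and it uses urllib rather than requests to stay dependency-free.

```python
import json
import urllib.request


def signature_input_names(metadata, signature="serving_default"):
    """Return the sorted input names of one signature in a metadata response."""
    signatures = metadata["metadata"]["signature_def"]["signature_def"]
    return sorted(signatures[signature]["inputs"])


def fetch_metadata(url):
    """GET the model metadata from a running TF Serving instance as a dict."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)


# Hypothetical, heavily trimmed response, shown only to illustrate the parsing.
sample = {
    "model_spec": {"name": "my_mnist_model", "version": "1"},
    "metadata": {
        "signature_def": {
            "signature_def": {
                "serving_default": {
                    "inputs": {"flatten_1_input": {"dtype": "DT_UINT8"}},
                    "outputs": {"dense_1": {"dtype": "DT_FLOAT"}},
                }
            }
        }
    },
}
print(signature_input_names(sample))

# Against the server from this thread (assumes it is running locally):
# url = "http://localhost:8501/v1/models/my_mnist_model/metadata"
# print(signature_input_names(fetch_metadata(url)))
```

If the name printed for the real server differs from the one used when building the request, the request and the exported model come from different runs of the model-building code.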

vi3itor avatar Jul 06 '22 08:07 vi3itor

Thanks. I am running the code from Colab. Let me try it one more time.

BuggieCoder avatar Jul 09 '22 19:07 BuggieCoder

When I ran the notebook up to this block of code:

`from google.cloud import aiplatform

server_image = "gcr.io/cloud-aiplatform/prediction/tf2-gpu.2-8:latest"

aiplatform.init(project=project_id, location=location)
mnist_model = aiplatform.Model.upload(
    display_name="mnist",
    artifact_uri=f"gs://my_fashion_model/my_mnist_model/0002",
    serving_container_image_uri=server_image,
)`

I got the error below:


ContextualVersionConflict                 Traceback (most recent call last)
in ()
----> 1 from google.cloud import aiplatform
      2
      3 server_image = "gcr.io/cloud-aiplatform/prediction/tf2-gpu.2-8:latest"
      4
      5 aiplatform.init(project=project_id, location=location)

11 frames
/usr/local/lib/python3.7/dist-packages/pkg_resources/__init__.py in resolve(self, requirements, env, installer, replace_conflicting, extras)
    775                 # Oops, the "best" so far conflicts with a dependency
    776                 dependent_req = required_by[req]
--> 777                 raise VersionConflict(dist, req).with_context(dependent_req)
    778
    779             # push the new requirements onto the stack

ContextualVersionConflict: (protobuf 3.17.3 (/usr/local/lib/python3.7/dist-packages), Requirement.parse('protobuf<4.0.0dev,>=3.19.0'), {'google-cloud-aiplatform', 'google-cloud-resource-manager'})

Please see if you can help. Thanks!

BuggieCoder avatar Jul 11 '22 00:07 BuggieCoder

Hi @BuggieCoder,

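The ContextualVersionConflict tells you that the running session still sees the old protobuf (3.17.3), while google-cloud-aiplatform requires 3.19.0 or later, so you need to restart the runtime to activate the packages you installed. In particular, when you run cell number 3 with the following code block: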

if "google.colab" in sys.modules or "kaggle_secrets" in sys.modules:
    %pip install -q -U google-cloud-aiplatform

you might have noticed the warning from @ageron:

  • Warning: On Colab, you must restart the Runtime after the installation, and continue with the next cells.

So you should select "Runtime -> Restart runtime" from the top menu. Then run the cells again from the beginning; otherwise you'll get errors because modules such as tf or sys have not been imported.
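After the restart, a quick check like the one below can confirm that the protobuf version loaded in the session falls in the range quoted in the traceback (at least 3.19.0 and below 4.0.0). This is a simplified sketch: version_tuple and protobuf_ok are hypothetical helpers, and suffixes such as "dev" or "rc1" are simply ignored rather than ordered properly.

```python
import re


def version_tuple(text):
    """Turn '3.19.0' or '4.0.0dev' into a tuple of ints such as (3, 19, 0)."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first component without leading digits
        parts.append(int(match.group()))
    return tuple(parts)


def protobuf_ok(installed, minimum="3.19.0", below="4.0.0"):
    """True if minimum <= installed < below (the range from the traceback)."""
    return version_tuple(minimum) <= version_tuple(installed) < version_tuple(below)


try:
    import google.protobuf

    loaded = google.protobuf.__version__
except ImportError:
    loaded = None  # protobuf is not installed in this environment

if loaded is None:
    print("protobuf is not installed")
elif protobuf_ok(loaded):
    print(f"protobuf {loaded} satisfies the requirement from the traceback")
else:
    print(f"protobuf {loaded} is outside the required range")
```

The bounds are the ones reported in the traceback above; a newer google-cloud-aiplatform release may require a different range.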

vi3itor avatar Jul 11 '22 05:07 vi3itor