handson-ml3
19_training_and_deploying_at_scale.ipynb error
I tried to run the subject notebook in Colab and received the error below at [this section](https://colab.research.google.com/github/ageron/handson-ml3/blob/main/19_training_and_deploying_at_scale.ipynb#scrollTo=Querying_TF_Serving_through_the_REST_API). Please help. Thanks.
Code that caused the error:

```python
import requests

server_url = "http://localhost:8501/v1/models/my_mnist_model:predict"
response = requests.post(server_url, data=request_json)
response.raise_for_status()  # raise an exception in case of error
response = response.json()
```
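For context, `request_json` is defined a few cells earlier in the notebook; it is just the JSON body that TF Serving's REST API expects, roughly like this sketch (variable names assumed from the book's conventions):

```python
import json
import numpy as np

X_new = np.random.rand(3, 28, 28)  # placeholder for the test images used in the notebook
request_json = json.dumps({
    "signature_name": "serving_default",  # default signature exported by model.save()
    "instances": X_new.tolist(),          # one list entry per input image
})
```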
Error messages:
```
ConnectionRefusedError                    Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/urllib3/connection.py in _new_conn(self)
    158             conn = connection.create_connection(
--> 159                 (self._dns_host, self.port), self.timeout, **extra_kw)
    160

19 frames
ConnectionRefusedError: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7f129a2f4550>: Failed to establish a new connection: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

MaxRetryError: HTTPConnectionPool(host='localhost', port=8501): Max retries exceeded with url: /v1/models/my_mnist_model:predict (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f129a2f4550>: Failed to establish a new connection: [Errno 111] Connection refused'))

During handling of the above exception, another exception occurred:

ConnectionError                           Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/requests/adapters.py in send(self, request, stream, timeout, verify, cert, proxies)
    514                 raise SSLError(e, request=request)
    515
--> 516             raise ConnectionError(e, request=request)
    517
    518         except ClosedPoolError as e:

ConnectionError: HTTPConnectionPool(host='localhost', port=8501): Max retries exceeded with url: /v1/models/my_mnist_model:predict (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f129a2f4550>: Failed to establish a new connection: [Errno 111] Connection refused'))
```
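A `Connection refused` on port 8501 usually means the TF Serving process never started (or died during startup), rather than anything being wrong with the request itself. A quick sanity check, assuming the model name from the notebook, is to query the model-status endpoint first:

```python
import requests

status_url = "http://localhost:8501/v1/models/my_mnist_model"  # model-status endpoint
try:
    print(requests.get(status_url, timeout=5).json())  # lists the model's version status
except requests.exceptions.ConnectionError:
    print("TF Serving is not listening on port 8501 -- check my_server.log")
```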
Hi @BuggieCoder, you're right. I've got the same error when running the notebook in Colab.
A quick look at the server's log suggests that the problem may be that the supporting libraries on the host machine are older than the ones required by tensorflow_model_server 2.9.0. You can check it yourself by running:
```python
with open('my_server.log') as f:
    print(f.read())
```
I got the following result:
```
tensorflow_model_server: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.28' not found (required by tensorflow_model_server)
tensorflow_model_server: /lib/x86_64-linux-gnu/libm.so.6: version `GLIBC_2.29' not found (required by tensorflow_model_server)
tensorflow_model_server: /usr/lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.26' not found (required by tensorflow_model_server)
```
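You can confirm the mismatch yourself by checking which GLIBC version the Colab host actually provides (a quick check in a Colab cell):

```python
# Colab cell: print the host's GNU libc version; the log above shows that the
# 2.9.0 server binary expects GLIBC 2.28/2.29 and GLIBCXX 3.4.26
!ldd --version | head -n 1
```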
To fix it, replace the following snippet in the notebook:
if "google.colab" in sys.modules or "kaggle_secrets" in sys.modules:
url = "https://storage.googleapis.com/tensorflow-serving-apt"
src = "stable tensorflow-model-server tensorflow-model-server-universal"
!echo 'deb {url} {src}' > /etc/apt/sources.list.d/tensorflow-serving.list
!curl '{url}/tensorflow-serving.release.pub.gpg' | apt-key add -
!apt update -q && apt-get install -y tensorflow-model-server
%pip install -q -U tensorflow-serving-api
with
if "google.colab" in sys.modules or "kaggle_secrets" in sys.modules:
!wget 'https://storage.googleapis.com/tensorflow-serving-apt/pool/tensorflow-model-server-2.5.4/t/tensorflow-model-server/tensorflow-model-server_2.5.4_all.deb'
!dpkg -i tensorflow-model-server_2.5.4_all.deb
%pip install -q -U tensorflow-serving-api
More recent versions (which can be found here) produce the same error as above.
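To double-check that the pinned 2.5.4 build is the one that actually got installed, something like this should work before re-running the cell that launches the server (a sketch; the binary supports a `--version` flag as far as I know):

```python
# Colab cell: should report the pinned 2.5.4 build if the workaround was applied;
# re-run the cell that launches the server (and the log check above) afterwards
!tensorflow_model_server --version
```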
Hi @BuggieCoder, thanks for your feedback. It looks like the official installation instructions for TensorFlow Serving currently do not work on Google Colab, but luckily the workaround proposed by @vi3itor (thanks! 🙏) works fine. TensorFlow Serving seems to assume that some recent system libraries are present, but Google Colab does not have them yet. Hopefully this will be fixed the next time the Google Colab runtime is updated. I'll update the notebook to point to this issue.
@ageron, I checked that the latest working version is 2.5.4 (I got the same error when I tried installing 2.6.5). I'll edit the code snippet above.
Thank you so much for your response. I tried the replacement code and that error is gone, but now I'm getting another error.
The new error comes from this block of code:

```python
import grpc
from tensorflow_serving.apis import prediction_service_pb2_grpc

channel = grpc.insecure_channel('localhost:8500')
predict_service = prediction_service_pb2_grpc.PredictionServiceStub(channel)
response = predict_service.Predict(request, timeout=10.0)
```
and here is the error message:
```
_InactiveRpcError                         Traceback (most recent call last)
1 frames
/usr/local/lib/python3.7/dist-packages/grpc/_channel.py in _end_unary_response_blocking(state, call, with_call, deadline)
    847             return state.response
    848         else:
--> 849             raise _InactiveRpcError(state)
    850
    851

_InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
    status = StatusCode.INVALID_ARGUMENT
    details = "input tensor alias not found in signature: flatten_input. Inputs expected to be in the set {flatten_1_input}."
    debug_error_string = "{"created":"@1656964752.590421454","description":"Error received from peer ipv4:127.0.0.1:8500","file":"src/core/lib/surface/call.cc","file_line":952,"grpc_message":"input tensor alias not found in signature: flatten_input. Inputs expected to be in the set {flatten_1_input}.","grpc_status":3}"
>
```
Hi @BuggieCoder, you're welcome!
I'm not getting the error above. Everything should work fine. Are you running the code from Colab or have you made some changes?
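One thing worth checking: that `INVALID_ARGUMENT` message means the key passed in `request.inputs[...]` (`flatten_input`) doesn't match the input name stored in the exported model's serving signature (`flatten_1_input`). That typically happens when the model-building cell is run more than once in the same session, so Keras renames the `Flatten` layer. A rough way to see which name the SavedModel actually expects (export path assumed from the notebook):

```python
import tensorflow as tf

# Load the exported model and print the input spec of its default serving signature;
# the key used in request.inputs[...] must match the name shown here exactly.
saved_model = tf.saved_model.load("my_mnist_model/0001")  # path assumed from the notebook
signature = saved_model.signatures["serving_default"]
print(signature.structured_input_signature)  # e.g. shows 'flatten_1_input' rather than 'flatten_input'
```

Restarting the runtime and re-running the notebook from the top (so the layer is named `flatten` again), or using the printed name as the key, should both clear the error.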
Thanks. I am running the code from Colab. Let me try it one more time.
When I ran up to this block of code:

```python
from google.cloud import aiplatform

server_image = "gcr.io/cloud-aiplatform/prediction/tf2-gpu.2-8:latest"

aiplatform.init(project=project_id, location=location)
mnist_model = aiplatform.Model.upload(
    display_name="mnist",
    artifact_uri=f"gs://my_fashion_model/my_mnist_model/0002",
    serving_container_image_uri=server_image,
)
```
I got the error below:
```
ContextualVersionConflict                 Traceback (most recent call last)
11 frames
/usr/local/lib/python3.7/dist-packages/pkg_resources/__init__.py in resolve(self, requirements, env, installer, replace_conflicting, extras)
    775                 # Oops, the "best" so far conflicts with a dependency
    776                 dependent_req = required_by[req]
--> 777                 raise VersionConflict(dist, req).with_context(dependent_req)
    778
    779             # push the new requirements onto the stack

ContextualVersionConflict: (protobuf 3.17.3 (/usr/local/lib/python3.7/dist-packages), Requirement.parse('protobuf<4.0.0dev,>=3.19.0'), {'google-cloud-aiplatform', 'google-cloud-resource-manager'})
```
Please see if you can help. Thanks!
Hi @BuggieCoder,
`ContextualVersionConflict` tells you that you need to restart the runtime to activate the packages you installed. In particular, when you run cell number 3 with the following code block:
if "google.colab" in sys.modules or "kaggle_secrets" in sys.modules:
%pip install -q -U google-cloud-aiplatform
you might have noticed the warning from @ageron:
- Warning: On Colab, you must restart the Runtime after the installation, and continue with the next cells.
So you should select "Runtime -> Restart runtime" from the top menu, and then start running the cells from the beginning again; otherwise you'll hit errors complaining that modules such as `tf` or `sys` are not imported.
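After restarting, you can quickly confirm that the newer protobuf pulled in by google-cloud-aiplatform (the `>=3.19.0` requirement from the traceback above) is the one actually loaded:

```python
import google.protobuf

# The ContextualVersionConflict complained about protobuf 3.17.3; after installing
# google-cloud-aiplatform and restarting the runtime, this should print >= 3.19
print(google.protobuf.__version__)
```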