Matching Engine ANN: _InactiveRpcError, failed to connect to all addresses

Open RubensZimbres opened this issue 2 years ago • 0 comments

I successfully developed and deployed a Two Towers model in Vertex AI. Now I'm using the following notebook to get inference from an endpoint in Matching Engine: https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/official/matching_engine/sdk_matching_engine_for_indexing.ipynb

All indexes are successfully created and deployed, the VPC peering is connected, the firewall rules are OK, the IndexEndpoint is attached to the VPC network, and Private Google Access is enabled on the network. However, when I query the deployed index through the online querying gRPC API (Match service), this line of code raises an _InactiveRpcError:

# Test query
response = my_index_endpoint.match(
    deployed_index_id=DEPLOYED_INDEX_ID, queries=test, num_neighbors=NUM_NEIGHBOURS
)
response

Error:

---------------------------------------------------------------------------
_InactiveRpcError                         Traceback (most recent call last)
/tmp/ipykernel_21155/694576194.py in <module>
      1 # Test query
      2 response = my_index_endpoint.match(
----> 3     deployed_index_id=DEPLOYED_INDEX_ID, queries=test, num_neighbors=NUM_NEIGHBOURS
      4 )
      5 

/opt/conda/lib/python3.7/site-packages/google/cloud/aiplatform/matching_engine/matching_engine_index_endpoint.py in match(self, deployed_index_id, queries, num_neighbors)
    850 
    851         # Perform the request
--> 852         response = stub.BatchMatch(batch_request)
    853 
    854         # Wrap the results in MatchNeighbor objects and return

/opt/conda/lib/python3.7/site-packages/grpc/_channel.py in __call__(self, request, timeout, metadata, credentials, wait_for_ready, compression)
    944         state, call, = self._blocking(request, timeout, metadata, credentials,
    945                                       wait_for_ready, compression)
--> 946         return _end_unary_response_blocking(state, call, False, None)
    947 
    948     def with_call(self,

/opt/conda/lib/python3.7/site-packages/grpc/_channel.py in _end_unary_response_blocking(state, call, with_call, deadline)
    847             return state.response
    848     else:
--> 849         raise _InactiveRpcError(state)
    850 
    851 

_InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.UNAVAILABLE
	details = "failed to connect to all addresses"
	debug_error_string = "{"created":"@1661279261.094948981","description":"Failed to pick subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":3260,"referenced_errors":[{"created":"@1661279261.094948007","description":"failed to connect to all addresses","file":"src/core/lib/transport/error_utils.cc","file_line":167,"grpc_status":14}]}"
>
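In this grpcio version the `debug_error_string` shown above is JSON, so the nested errors can be read programmatically. The following is a minimal sketch under that assumption (other grpcio releases may format the string differently). `nested_errors` and `GRPC_STATUS_NAMES` are illustrative names, not part of grpc; the sample string is copied from the trace.

```python
import json

# Partial table of canonical gRPC status codes (numeric value -> name).
GRPC_STATUS_NAMES = {
    0: "OK",
    4: "DEADLINE_EXCEEDED",
    7: "PERMISSION_DENIED",
    14: "UNAVAILABLE",
    16: "UNAUTHENTICATED",
}

# debug_error_string copied from the traceback above.
SAMPLE = (
    '{"created":"@1661279261.094948981","description":"Failed to pick subchannel",'
    '"file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":3260,'
    '"referenced_errors":[{"created":"@1661279261.094948007",'
    '"description":"failed to connect to all addresses",'
    '"file":"src/core/lib/transport/error_utils.cc","file_line":167,"grpc_status":14}]}'
)


def nested_errors(debug_error_string):
    """Yield (description, grpc_status) for the top-level error and every nested one."""
    stack = [json.loads(debug_error_string)]
    while stack:
        err = stack.pop()
        yield err.get("description"), err.get("grpc_status")
        stack.extend(err.get("referenced_errors", []))


if __name__ == "__main__":
    for description, status in nested_errors(SAMPLE):
        name = GRPC_STATUS_NAMES.get(status, "n/a")
        print(f"{description}: grpc_status={status} ({name})")
```

Status 14 is UNAVAILABLE, which matches the `StatusCode.UNAVAILABLE` reported by the client: the channel never established a connection, so the request did not reach the Match service.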

When I run this code I get errors from grpc:

import grpc
with grpc.insecure_channel('__IP__:10000') as channel:
    grpc.channel_ready_future(channel).start()
Exception in thread Thread-188:
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/threading.py", line 926, in _bootstrap_inner
    self.run()
  File "/opt/conda/lib/python3.7/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "src/python/grpcio/grpc/_cython/_cygrpc/thread.pyx.pxi", line 53, in grpc._cython.cygrpc._run_with_context._run
  File "/opt/conda/lib/python3.7/site-packages/grpc/_channel.py", line 1392, in _poll_connectivity
    time.time() + 0.2)
  File "src/python/grpcio/grpc/_cython/_cygrpc/channel.pyx.pxi", line 510, in grpc._cython.cygrpc.Channel.watch_connectivity_state
  File "src/python/grpcio/grpc/_cython/_cygrpc/channel.pyx.pxi", line 377, in grpc._cython.cygrpc._watch_connectivity_state
  File "src/python/grpcio/grpc/_cython/_cygrpc/channel.pyx.pxi", line 385, in grpc._cython.cygrpc._watch_connectivity_state
ValueError: Cannot invoke RPC: Channel closed!

With a secure channel I get a different error:

TypeError: secure_channel() missing 1 required positional argument: 'credentials'

I'm using a Vertex AI instance in us-west1 and, as far as I can see, the endpoint is only available in us-central1. I'm working in a Jupyter notebook and this is the only issue. I can't find a solution in the GCP gRPC troubleshooting guide: https://cloud.google.com/endpoints/docs/grpc/troubleshoot-config-deployment

I also tried generating the gRPC stubs from match_service.proto:

! python -m grpc_tools.protoc -I=. --proto_path=googleapis --python_out=. --grpc_python_out=. match_service.proto

and then calling the Match service directly:

import match_service_pb2
import match_service_pb2_grpc
channel = grpc.insecure_channel("{}:10000".format(DEPLOYED_INDEX_SERVER_IP), options=(('grpc.enable_http_proxy', 0),))
stub = match_service_pb2_grpc.MatchServiceStub(channel)
request = match_service_pb2.MatchRequest()
request.deployed_index_id = DEPLOYED_INDEX_ID
response = stub.Match(request)
response

... and got exactly the same error. Also, the IP does not respond to ping. Could this be a network problem?
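Since ping uses ICMP, which can be blocked even when TCP traffic is allowed, a direct TCP connection attempt to the gRPC port is a more telling check. A minimal sketch using only the standard library; `tcp_reachable` is a hypothetical helper name, and port 10000 is the one used in the snippets above.

```python
import socket


def tcp_reachable(host, port, timeout_s=3.0):
    """Return True if a TCP connection to host:port opens within timeout_s seconds."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False
```

In the notebook this could be called as `tcp_reachable(DEPLOYED_INDEX_SERVER_IP, 10000)`. A `False` result would point to routing, peering, or firewall configuration rather than to the gRPC client code.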

Any ideas on how to solve this issue?

Thanks in advance

UPDATE: I moved the Vertex AI instance to us-central1 and it worked. However, I would still like to know whether there is a way to make it work from us-west1, because I know it's possible.
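Given that moving the notebook to the endpoint's region resolved the error, a quick pre-flight check is to compare the two regions before querying. The sketch below only parses the location out of a resource name of the form `projects/.../locations/.../indexEndpoints/...`. The helper names and the example IDs are hypothetical, and it does not establish whether cross-region access can be made to work.

```python
import re

# Matches the location segment of a Vertex AI resource name.
_LOCATION_RE = re.compile(r"^projects/[^/]+/locations/([^/]+)/")


def location_of(resource_name):
    """Extract the region from a resource name such as an IndexEndpoint name."""
    match = _LOCATION_RE.match(resource_name)
    if match is None:
        raise ValueError(f"Unexpected resource name: {resource_name!r}")
    return match.group(1)


def same_region(resource_name, client_region):
    """Return True if the resource lives in the same region as the client."""
    return location_of(resource_name) == client_region


if __name__ == "__main__":
    # Hypothetical project and endpoint IDs, for illustration only.
    endpoint_name = "projects/123/locations/us-central1/indexEndpoints/456"
    print(location_of(endpoint_name), same_region(endpoint_name, "us-west1"))
```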

RubensZimbres, Aug 23 '22 18:08