redis-nvidia-recsys

gRPC error when running the code examples on CPU.

Open ZhanqiuHu opened this issue 2 years ago • 3 comments

I was following the notebooks to run the examples, and everything works fine when I run the Deploying Online Multi-Stage RecSys with Triton Inference Server notebook on GPU.

However, after I modified the model configuration files so that the models in the ensemble (specifically the user-embedding model and the ranking model) run on CPU:

instance_group [
  {
    count: 1
    kind: KIND_CPU
  }
]

I ran into this error:

!python client.py --user 12

/workspace/online-multi-stage-recsys
Finding recommendations for User 12
Traceback (most recent call last):
  File "/workspace/online-multi-stage-recsys/client.py", line 53, in <module>
    results = triton_client.infer(model_name=args.model_name,
  File "/usr/local/lib/python3.10/dist-packages/tritonclient/grpc/_client.py", line 1361, in infer
    raise_error_grpc(rpc_error)
  File "/usr/local/lib/python3.10/dist-packages/tritonclient/grpc/_utils.py", line 65, in raise_error_grpc
    raise get_error_grpc(rpc_error) from None
tritonclient.utils.InferenceServerException: [StatusCode.INTERNAL] in ensemble 'ensemble-model', indices[0] = 3 is not in [0, 3)
	 [[{{function_node __inference__wrapped_model_203518}}{{node encoder/parallel_block/embeddings/user_is_occupied/user_is_occupied/embedding_lookup}}]]

Does anyone know what the problem is and how to fix it?

Thanks!

ZhanqiuHu avatar Nov 08 '23 20:11 ZhanqiuHu

Not sure. This may be a version issue or something with the Triton client. I'll ping the Triton team.

Spartee avatar Nov 08 '23 21:11 Spartee

[StatusCode.INTERNAL] in ensemble 'ensemble-model', indices[0] = 3 is not in [0, 3)

This looks like a model in the pipeline generated an index outside the acceptable range. Can you verify whether the client is sending valid data? Is the error data-specific? The issue is most likely coming from the models and the computations within them. There might be some small discrepancies between running operations on GPU and on CPU that cascade into this error.

tanmayv25 avatar Nov 13 '23 22:11 tanmayv25
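
One way to check whether the failure is data-specific is to compare the categorical ids fed to the embedding layer against the table size reported in the error (`[0, 3)` means three rows, so valid ids are 0, 1, and 2). The following is a minimal sketch, assuming the ids are available as plain Python integers. `find_out_of_range` and the sample values are hypothetical and not part of the repository.

```python
from typing import Iterable, List, Tuple


def find_out_of_range(ids: Iterable[int], table_size: int) -> List[Tuple[int, int]]:
    """Return (position, id) pairs for ids outside [0, table_size)."""
    return [(pos, i) for pos, i in enumerate(ids) if not 0 <= i < table_size]


# Hypothetical values for the `user_is_occupied` feature; the error message
# implies an embedding table with 3 rows, so valid ids are 0, 1, and 2.
bad = find_out_of_range([0, 2, 3], table_size=3)
print(bad)  # [(2, 3)]
```

An empty result for every request would suggest the offending id is produced inside the pipeline rather than sent by the client.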

Hi Tanmay and Sam,

Thanks for getting back to me! The client is sending valid data. The models that are causing this issue are the user embedding model (https://github.com/RedisVentures/Redis-Recsys/tree/master/online-multi-stage-recsys/models/1-user-embeddings) and the ranking model (https://github.com/RedisVentures/Redis-Recsys/tree/master/online-multi-stage-recsys/models/5-ranking), which are both TensorFlow SavedModels.

Thanks, Zhanqiu

ZhanqiuHu avatar Nov 13 '23 23:11 ZhanqiuHu
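
For context on how the same inputs could pass on GPU and fail on CPU: the TensorFlow documentation for `tf.gather` states that an out-of-bound index raises an error on CPU, while on GPU a 0 is stored in the corresponding output value. Whether that is what happens in these two models is an assumption and is not confirmed anywhere in this thread. The pure-Python sketch below only emulates the two documented behaviors; `lookup` and its `device` argument are hypothetical names, and nothing here calls TensorFlow.

```python
from typing import List, Sequence


def lookup(table: Sequence[Sequence[float]], ids: Sequence[int], device: str) -> List[List[float]]:
    """Emulate an embedding lookup with the two out-of-bounds behaviors."""
    dim = len(table[0])
    rows = []
    for pos, i in enumerate(ids):
        if 0 <= i < len(table):
            rows.append(list(table[i]))
        elif device == "cpu":
            # Emulated CPU behavior (per the tf.gather docs): fail loudly.
            raise IndexError(f"indices[{pos}] = {i} is not in [0, {len(table)})")
        else:
            # Emulated GPU behavior (per the tf.gather docs): a row of zeros.
            rows.append([0.0] * dim)
    return rows


table = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]  # 3 rows, so valid ids are 0..2
print(lookup(table, [3], device="gpu"))  # [[0.0, 0.0]]
try:
    lookup(table, [3], device="cpu")
except IndexError as err:
    print(err)  # indices[0] = 3 is not in [0, 3)
```

If this is the cause, the GPU run would have been silently using a zero embedding for that id, and the fix would belong on the data or preprocessing side, for example a categorical encoding that produces an id one past the table size.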