redis-nvidia-recsys
gRPC error when running the code examples on CPU.
I was following the notebooks to run the examples, and everything works fine when I run the Deploying Online Multi-Stage RecSys with Triton Inference Server notebook on GPU.
However, I then modified the model configuration files of the user-embedding model and the ranking model in the ensemble so that they run on CPU, using this instance_group setting:
instance_group [
  {
    count: 1
    kind: KIND_CPU
  }
]
After that change, I ran into this error:
!python client.py --user 12
/workspace/online-multi-stage-recsys
Finding recommendations for User 12
Traceback (most recent call last):
  File "/workspace/online-multi-stage-recsys/client.py", line 53, in <module>
    results = triton_client.infer(model_name=args.model_name,
  File "/usr/local/lib/python3.10/dist-packages/tritonclient/grpc/_client.py", line 1361, in infer
    raise_error_grpc(rpc_error)
  File "/usr/local/lib/python3.10/dist-packages/tritonclient/grpc/_utils.py", line 65, in raise_error_grpc
    raise get_error_grpc(rpc_error) from None
tritonclient.utils.InferenceServerException: [StatusCode.INTERNAL] in ensemble 'ensemble-model', indices[0] = 3 is not in [0, 3)
	 [[{{function_node __inference__wrapped_model_203518}}{{node encoder/parallel_block/embeddings/user_is_occupied/user_is_occupied/embedding_lookup}}]]
Does anyone know what the problem is and how to fix it?
Thanks!
Not sure; this may be a version issue or something with the Triton client. I'll ping the Triton team.
> [StatusCode.INTERNAL] in ensemble 'ensemble-model', indices[0] = 3 is not in [0, 3)
This looks like a model in the pipeline generated an index beyond the acceptable range. Can you verify whether the client is sending valid data? Is the error data-specific? The issue is most likely coming from the models and the computations within them. There may be small discrepancies between running operations on GPU and on CPU that are cascading into this error.
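For context: `indices[0] = 3 is not in [0, 3)` says that the embedding table behind `user_is_occupied` has 3 rows, so only ids 0, 1 and 2 are valid, and the lookup received id 3. One plausible reason the same pipeline appears to work on GPU: TensorFlow's `tf.gather` documentation (embedding lookups are built on gather) notes that an out-of-range index raises an error on CPU, while on GPU the corresponding output is filled with zeros instead. The NumPy sketch below only emulates that documented behavior; it is not TensorFlow code, and `lookup` is a hypothetical helper.

```python
import numpy as np


def lookup(table: np.ndarray, ids, device: str = "CPU") -> np.ndarray:
    """Emulate tf.gather's documented out-of-range handling (illustration only).

    Assumes a 2-D table and a 1-D list of ids.
    """
    ids = np.asarray(ids, dtype=np.int64)
    num_rows = table.shape[0]
    out_of_range = (ids < 0) | (ids >= num_rows)
    if device == "CPU":
        # CPU behavior: the first bad index aborts the whole lookup.
        if out_of_range.any():
            i = int(np.argmax(out_of_range))
            raise IndexError(f"indices[{i}] = {ids[i]} is not in [0, {num_rows})")
        return table[ids]
    # GPU behavior: bad indices silently produce all-zero rows.
    out = np.zeros((ids.shape[0], table.shape[1]), dtype=table.dtype)
    out[~out_of_range] = table[ids[~out_of_range]]
    return out


# Hypothetical 3-row table, matching the [0, 3) range in the error message.
table = np.arange(6, dtype=np.float32).reshape(3, 2)

print(lookup(table, [3], device="GPU"))  # a zero row, no error
try:
    lookup(table, [3], device="CPU")
except IndexError as err:
    print(err)  # indices[0] = 3 is not in [0, 3)
```

If this is what is happening, the GPU run was likely never "correct" for that feature; it was feeding a zero embedding into the model without complaint, and the CPU run simply surfaces the bad id.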
Hi Tanmay and Sam,
Thanks for getting back to me! The client is sending valid data. The models causing this issue are the user embedding model https://github.com/RedisVentures/Redis-Recsys/tree/master/online-multi-stage-recsys/models/1-user-embeddings and the ranking model https://github.com/RedisVentures/Redis-Recsys/tree/master/online-multi-stage-recsys/models/5-ranking, which are both TensorFlow SavedModels.
Thanks, Zhanqiu
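Even when the client request is valid, the id that reaches an embedding lookup is not necessarily the raw value the client sent; in an ensemble it may be produced by an earlier step. One way to narrow this down is to check every categorical id that reaches a TensorFlow model against the row count of its embedding table, right before that model is called. The sketch below assumes those row counts are available to you; the `cardinalities` dict and the `find_out_of_range` helper are hypothetical, and only the value 3 for `user_is_occupied` is implied by the error message.

```python
from typing import Dict, List, Mapping

import numpy as np


def find_out_of_range(
    features: Mapping[str, np.ndarray], cardinalities: Mapping[str, int]
) -> Dict[str, List[int]]:
    """Return, per feature, the ids that fall outside [0, cardinality)."""
    problems: Dict[str, List[int]] = {}
    for name, num_rows in cardinalities.items():
        if name not in features:
            continue  # feature not present in this batch
        ids = np.asarray(features[name]).ravel()
        bad = ids[(ids < 0) | (ids >= num_rows)]
        if bad.size:
            problems[name] = sorted(set(bad.tolist()))
    return problems


# Hypothetical values: only user_is_occupied -> 3 rows is implied by the error.
cardinalities = {"user_is_occupied": 3}
batch = {"user_is_occupied": np.array([[3]], dtype=np.int64)}
print(find_out_of_range(batch, cardinalities))  # {'user_is_occupied': [3]}
```

An empty result for the client request but a non-empty one for the inputs of the ranking or user-embedding model would point at an upstream step of the ensemble rather than at the client.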
(In reply to Tanmay Verma's comment of Mon, Nov 13, 2023, 5:22 PM.)