Manickavela
Yes, it is functional in my local testing with the CUDA provider. When I raised the PR, I observed that for some Android builds it was failing with '#include "asio"', so wanted to...
I have moved the WebSocket file to the python/src object file and cleaned it up; let me know if it is good enough.
Thanks for the suggestions, I will address them this week.
I am facing a similar issue when comparing triton-server with the vLLM and TRT-LLM backends. With 24.07, one observation made with --log-verbose=1 while triton-server is running at 100 concurrency is that Generation/Scheduled requests is 5...
Could the tokenizer or another component of the stack be a bottleneck, similar to https://github.com/triton-inference-server/server/issues/6894?
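A minimal sketch of one way to sanity-check whether tokenization alone could account for the gap: time the tokenizer in isolation over repeated requests. The model name, prompt, and request count below are placeholder assumptions, not taken from the setup above.

```python
# Rough standalone timing of tokenization, to see whether it could plausibly
# limit scheduled requests per second. Model, prompt, and request count are
# placeholders for illustration only.
import time

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder tokenizer
prompt = "Explain the difference between throughput and latency. " * 8  # placeholder prompt
num_requests = 1000

start = time.perf_counter()
for _ in range(num_requests):
    tokenizer(prompt)  # encode one request's prompt
elapsed = time.perf_counter() - start

print(f"{num_requests / elapsed:.1f} tokenizations/s "
      f"({elapsed / num_requests * 1000:.2f} ms per request)")
```

If per-request tokenization time comes out far below the observed scheduling rate, the bottleneck is more likely in the scheduler, batching configuration, or the backend itself.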
Facing the same issue with 0.11.0 of tensorrt_llm_backend