Manickavela
Yes, it is functional in my local testing with the CUDA provider. When I raised the PR, I observed that for some Android builds it was failing with '#include "asio"', so wanted to...
I have moved the WebSocket file to the python/src object file and cleaned it up; let me know if it is good enough.
Thanks for the suggestions, I will address them this week.
I am facing a similar issue when comparing triton-server with the vLLM and TRT-LLM backends. With 24.07, one observation made with --log-verbose=1 while triton-server is running at 100 concurrency is that Generation/Scheduled requests is 5...
Could the tokenizer or another component of the stack be a bottleneck, similar to https://github.com/triton-inference-server/server/issues/6894?
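A minimal sketch of one way to sanity-check whether tokenization alone could account for the gap: time the tokenizer in isolation over repeated requests. The model name, prompt, and request count below are placeholder assumptions, not taken from the setup above.

```python
# Rough standalone timing of tokenization, to see whether it could plausibly
# limit scheduled requests per second. Model, prompt, and request count are
# placeholders for illustration only.
import time

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder tokenizer
prompt = "Explain the difference between throughput and latency. " * 8  # placeholder prompt
num_requests = 1000

start = time.perf_counter()
for _ in range(num_requests):
    tokenizer(prompt)  # encode one request's prompt
elapsed = time.perf_counter() - start

print(f"{num_requests / elapsed:.1f} tokenizations/s "
      f"({elapsed / num_requests * 1000:.2f} ms per request)")
```

If per-request tokenization time comes out far below the observed scheduling rate, the bottleneck is more likely in the scheduler, batching configuration, or the backend itself.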
Facing the same issue with 0.11.0 of tensorrt_llm_backend