Jason Ng

Results 5 issues of Jason Ng

### Checked other resources - [X] I added a very descriptive title to this issue. - [X] I searched the LangChain documentation with the integrated search. - [X] I used...

Hi there! Thank you for the wonderful work; it greatly reduced the memory overhead and improved inference speed for my use case. I noticed that the prompt compression...

question

**Description** Unable to run Triton Inference Server with TensorRT-LLM for Llama3-ChatQA-1.5-8B **Triton Information** v2.46.0 Are you using the Triton container or did you build it yourself? Using Triton container image...

Hi, I have built a TensorRT engine and tried running the command: ``` python3 run_server.py -p 9090 -b tensorrt -trt {path_to_engine} ``` but the only output that I have received...

**Description** I have noticed a huge difference in memory usage for the runtime buffers and decoder between llama3 and llama3.1. **Triton Information** What version of Triton are you...