Jason Ng
### Checked other resources

- [X] I added a very descriptive title to this issue.
- [X] I searched the LangChain documentation with the integrated search.
- [X] I used...
Hi there! Thank you for the wonderful work, as this greatly reduced the memory overhead and inference time for my use case. I noticed that the prompt compression...
**Description**
Unable to run Triton Inference Server with TensorRT-LLM for Llama3-ChatQA-1.5-8B

**Triton Information**
v2.46.0

Are you using the Triton container or did you build it yourself? Using Triton container image...
Hi, I have built a TensorRT engine and tried running the command:

```
python3 run_server.py -p 9090 -b tensorrt -trt {path_to_engine}
```

but the only output that I have received...
**Description**
I have noticed a huge difference in memory usage for runtime buffers and the decoder between llama3 and llama3.1.

**Triton Information**
What version of Triton are you...