Kris Hung
Hi @apokerce, it would be great if you could confirm whether the OOM still happens. From my end, using the http client doesn't introduce any memory growth. For the grpc client...
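In case it helps reproduce the check, here is a minimal sketch of the kind of loop I use to watch for client-side memory growth (it assumes a server at `localhost:8000` and a hypothetical model `my_model` with a single FP32 input `INPUT0`; adjust the names and shapes to your deployment):

```
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")
data = np.random.rand(1, 16).astype(np.float32)

for _ in range(100000):
    # Build fresh inputs each iteration so any per-request leak shows up.
    inputs = [httpclient.InferInput("INPUT0", list(data.shape), "FP32")]
    inputs[0].set_data_from_numpy(data)
    client.infer("my_model", inputs)
    # Watch the client process RSS (e.g. with `top` or psutil) while this
    # runs; it should stay flat if there is no leak.
```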
Closing due to lack of activity. Please re-open the issue if you would like to follow up.
Hi @Gcstk, thanks for bringing this up. There will be some API changes and fixes needed if you'd like to compile the TRT backend with TRT 10. I'd recommend waiting...
@mc-nv I think the changes in this PR have already been merged in previous PRs. Should we close this one?
It looks like your GPU doesn't support peer-to-peer access. Could you run `nvidia-smi topo -m` to see if that's the case? I did have a similar issue before where my...
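If it's easier than reading the topology matrix, a rough programmatic check along the same lines (assuming PyTorch is installed and more than one GPU is visible) would be something like:

```
import torch

# Print whether each pair of visible GPUs can access each other's memory
# directly (peer-to-peer); complements the `nvidia-smi topo -m` output.
n = torch.cuda.device_count()
for src in range(n):
    for dst in range(n):
        if src != dst:
            ok = torch.cuda.can_device_access_peer(src, dst)
            print(f"GPU {src} -> GPU {dst}: peer access "
                  f"{'supported' if ok else 'NOT supported'}")
```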
@geraldstanje I think it might also require NVLink for p2p access - I'm not sure about this part, so the TRT-LLM GitHub channel should be able to give more clarification. From my experience, I...
@geraldstanje Sure thing! I'm using the command in the [README](https://github.com/triton-inference-server/tensorrtllm_backend/tree/main?tab=readme-ov-file#prepare-tensorrt-llm-engines) as an example. Basically just adding the last line when building engines:
```
# Build TensorRT engines
trtllm-build --checkpoint_dir ./c-model/gpt2/fp16/4-gpu \...
```
CC @oandreeva-nv for visibility.
Hi @mfournioux, I was wondering whether the same GPUs are visible to both containers, and whether the containers are running on the same machine?
I think CUDA shared memory can only be used when the client and server share the same GPU. Could you try deploying both the client and the server on the same GPU device?
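For reference, a minimal sketch of registering a CUDA shared memory region from the Python client (it assumes `tritonclient[all]` is installed, the server is at `localhost:8000`, and a hypothetical model `my_model` with one FP32 input `INPUT0` of 16 elements; this only works when the client can see the same GPU as the server, i.e. both run on the same machine):

```
import numpy as np
import tritonclient.http as httpclient
import tritonclient.utils.cuda_shared_memory as cudashm

client = httpclient.InferenceServerClient(url="localhost:8000")
client.unregister_cuda_shared_memory()  # clean up any stale regions

input_data = np.arange(16, dtype=np.float32)
byte_size = input_data.size * input_data.itemsize

# Create a CUDA shared memory region on GPU 0 and copy the input into it.
shm_handle = cudashm.create_shared_memory_region("input_region", byte_size, 0)
cudashm.set_shared_memory_region(shm_handle, [input_data])

# Register the region with the server; this is where it fails if the client
# and server are not on the same machine / GPU.
client.register_cuda_shared_memory(
    "input_region", cudashm.get_raw_handle(shm_handle), 0, byte_size
)

# Point the request input at the shared memory region instead of sending bytes.
inputs = [httpclient.InferInput("INPUT0", [16], "FP32")]
inputs[0].set_shared_memory("input_region", byte_size)
result = client.infer("my_model", inputs)
```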