
Search results: 6 issues by nguyenhoangthuan99

I have successfully converted the OpenPose 25-keypoint pose model (version 1.6.0) to TensorRT, but when I run inference in Python the post-processing is very slow. I used post-processing...
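The slow step in OpenPose-style pipelines is usually extracting keypoint peaks from the network's confidence heatmaps; done pixel-by-pixel in a Python loop it crawls, but it vectorizes well with NumPy. A minimal sketch of vectorized peak extraction — the function name and threshold are illustrative, not taken from the issue:

```python
import numpy as np

def extract_peaks(heatmap, threshold=0.1):
    """Find local maxima in one (H, W) keypoint confidence map.

    A pixel counts as a peak if it exceeds `threshold` and is >=
    all four of its direct neighbours. Returns (x, y, score) tuples.
    """
    padded = np.pad(heatmap, 1, mode="constant")
    center = padded[1:-1, 1:-1]
    peak = (
        (center > threshold)
        & (center >= padded[:-2, 1:-1])   # pixel above
        & (center >= padded[2:, 1:-1])    # pixel below
        & (center >= padded[1:-1, :-2])   # pixel to the left
        & (center >= padded[1:-1, 2:])    # pixel to the right
    )
    ys, xs = np.nonzero(peak)
    return list(zip(xs.tolist(), ys.tolist(), center[ys, xs].tolist()))
```

The whole comparison runs as a handful of array operations instead of an H×W Python loop, which is typically the difference between milliseconds and seconds per frame.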

Currently the example servers for cortex.llamacpp and cortex.tensorrt-llm achieve the following results with an average context length of 400:

- cortex.llamacpp: 850 tokens/s
- cortex.tensorrt-llm: 1450 tokens/s

We need to benchmark...
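For a benchmark like this, the core measurement is just generated tokens divided by wall-clock time. A minimal harness sketch — `generate` here is a stand-in for whatever engine call is being measured, not an actual cortex API:

```python
import time

def benchmark(generate, prompt, n_runs=3):
    """Return aggregate throughput in tokens/s over n_runs calls.

    `generate(prompt)` is a hypothetical engine call that returns
    the list of generated tokens for one request.
    """
    total_tokens = 0
    start = time.perf_counter()
    for _ in range(n_runs):
        total_tokens += len(generate(prompt))
    elapsed = time.perf_counter() - start
    return total_tokens / elapsed
```

For a fair comparison between engines, the prompt set, context length, and output length should be fixed across runs, since throughput varies strongly with context length.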

P2: enhancement

# Add Multi-GPU Support for LlamaCpp Engine

## Description

We need to implement multi-GPU support for our LlamaCpp wrapper engine to improve performance and allow users to utilize multiple GPUs...
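llama.cpp splits a model across GPUs by assigning layers (or rows) to each device in proportion to a user-supplied split (its `--tensor-split` option), and a wrapper engine needs the same bookkeeping. A small sketch of that proportional layer assignment — the function name is hypothetical:

```python
def assign_layers(n_layers, tensor_split):
    """Distribute transformer layers across GPUs in proportion to
    the given split weights, mirroring llama.cpp's --tensor-split idea.

    assign_layers(32, [1, 1])  -> 16 layers per GPU
    assign_layers(32, [3, 1])  -> 24 on GPU 0, 8 on GPU 1
    """
    total = sum(tensor_split)
    counts = [int(n_layers * w / total) for w in tensor_split]
    # Rounding down can leave layers unassigned; park the remainder
    # on the last GPU so every layer has a home.
    counts[-1] += n_layers - sum(counts)
    return counts
```

Uneven splits are useful when the GPUs have different VRAM sizes, e.g. weighting the split by free memory per device.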

We need to create unit tests for completed tickets. For now we will use [GTest](https://github.com/google/googletest) to write them. Unit tests should run locally and be added to the CI pipeline. When building in debug mode...
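The ticket targets GTest for C++; purely to illustrate the shape such a test takes (a named test case plus assertions, runnable both locally and in CI), here is a minimal sketch using Python's built-in `unittest`, with a hypothetical `add` function standing in for the unit under test:

```python
import unittest

def add(a, b):
    # Hypothetical function under test; in the real project this
    # would be a function from the ticket being closed.
    return a + b

class TestAdd(unittest.TestCase):
    def test_add_positive(self):
        self.assertEqual(add(2, 3), 5)

    def test_add_negative(self):
        self.assertEqual(add(-2, -3), -5)

if __name__ == "__main__":
    unittest.main(exit=False)
```

The GTest equivalent follows the same pattern (`TEST(SuiteName, CaseName)` with `EXPECT_EQ`), and either runner produces the pass/fail output a CI step can gate on.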

type: epic
category: tests

### Feature request

The current codebase only supports bf16/fp16 training, while we typically apply quantization (int8, int4, fp8, fp4) during model serving to reduce VRAM usage while still maintaining accuracy...
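The VRAM saving from weight quantization is easy to estimate up front: weight memory scales linearly with bits per weight. A back-of-the-envelope sketch (weights only; it ignores the KV cache, activations, and any quantization scales/zero-points):

```python
def weight_vram_gib(n_params, bits_per_weight):
    """Approximate weight memory in GiB for a model with n_params
    parameters stored at bits_per_weight bits each."""
    return n_params * bits_per_weight / 8 / 2**30
```

For example, a 7B-parameter model needs roughly 13 GiB of weight memory at fp16 but only about 3.3 GiB at int4, which is why serving-time quantization matters even when training stays in bf16/fp16.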