nguyenhoangthuan99
I have successfully converted the OpenPose 25-keypoint pose model (version 1.6.0) to TensorRT, but when I run inference in Python the post-processing is very slow. I used post-processing...
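A common fix is to move the heatmap peak extraction, usually the hottest part of OpenPose post-processing, out of per-pixel Python loops into native code. Below is a minimal C++ sketch of that step under stated assumptions: a single row-major H×W confidence map and a caller-chosen threshold, neither taken from the original post.

```cpp
#include <vector>

// One detected body-part candidate: pixel position and confidence score.
struct Peak { int x; int y; float score; };

// Scan a row-major H x W confidence map and keep every interior pixel that
// exceeds `threshold` and is strictly greater than its 4 direct neighbours.
// This non-maximum-suppression pass is what dominates runtime when written
// as per-pixel loops in Python.
std::vector<Peak> find_peaks(const float* map, int h, int w, float threshold) {
    std::vector<Peak> peaks;
    for (int y = 1; y < h - 1; ++y) {
        for (int x = 1; x < w - 1; ++x) {
            float v = map[y * w + x];
            if (v > threshold &&
                v > map[y * w + x - 1] && v > map[y * w + x + 1] &&
                v > map[(y - 1) * w + x] && v > map[(y + 1) * w + x]) {
                peaks.push_back({x, y, v});
            }
        }
    }
    return peaks;
}
```

The same logic can also stay in Python if vectorized (e.g. comparing the map against shifted copies of itself), but the native version avoids the interpreter overhead entirely.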
Currently the example servers for cortex.llamacpp and cortex.tensorrt-llm get the following results with an average context length of 400:

- cortex.llamacpp: 850 tokens/s
- cortex.tensorrt-llm: 1450 tokens/s

We need to benchmark...
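For reference, a throughput number like those above is just generated tokens divided by wall-clock seconds. The harness below is a self-contained sketch of that measurement; `run_generation` is a hypothetical stand-in for one request against the engine under test, not an existing cortex API.

```cpp
#include <chrono>
#include <cstdio>
#include <thread>

// Hypothetical stand-in for one request against the engine under test;
// here it just sleeps so the example runs standalone.
int run_generation(const char* /*prompt*/, int max_tokens) {
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
    return max_tokens;
}

// Time one request and report throughput the same way the numbers above do:
// tokens generated divided by elapsed wall-clock seconds.
double tokens_per_second(const char* prompt, int max_tokens) {
    auto start = std::chrono::steady_clock::now();
    int n_tokens = run_generation(prompt, max_tokens);
    auto end = std::chrono::steady_clock::now();
    return n_tokens / std::chrono::duration<double>(end - start).count();
}

int main() {
    std::printf("throughput: %.1f tokens/s\n", tokens_per_second("hello", 128));
}
```

A real benchmark would average over many requests at the target context length (400 here) rather than timing a single call.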
# Add Multi-GPU Support for LlamaCpp Engine

## Description

We need to implement multi-GPU support for our LlamaCpp wrapper engine to improve performance and allow users to utilize multiple GPUs...
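At the llama.cpp layer this is mostly a matter of passing the right model parameters. The sketch below shows the relevant knobs in the llama.cpp C API; the field and enum names match recent `llama.h` versions but have been renamed across releases, so check the header the engine actually vendors.

```cpp
#include "llama.h"

// Sketch: load a model with its layers distributed across all visible GPUs.
llama_model* load_multi_gpu(const char* model_path) {
    llama_model_params params = llama_model_default_params();
    params.n_gpu_layers = 999;                    // offload every layer to GPU
    params.split_mode   = LLAMA_SPLIT_MODE_LAYER; // spread layers across GPUs
    params.main_gpu     = 0;                      // GPU for small/intermediate tensors
    // params.tensor_split takes per-GPU proportions, e.g. for two GPUs:
    //   static const float split[] = {0.6f, 0.4f};
    //   params.tensor_split = split;
    return llama_load_model_from_file(model_path, params);
}
```

The wrapper engine would expose `split_mode`, `main_gpu`, and `tensor_split` as user-facing configuration rather than hard-coding them as above.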
We need to create unit tests for completed tickets. For now we will use [GTest](https://github.com/google/googletest) to write the unit tests. Unit tests can run locally and will be added to the CI pipeline. When building in debug mode...
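A minimal GTest file wired up the way the ticket describes might look like the sketch below. `Tokenize` is a toy stand-in for real engine code, not an existing API; linking against `gtest_main` supplies `main()`.

```cpp
#include <gtest/gtest.h>
#include <sstream>
#include <string>
#include <vector>

// Toy function standing in for the real engine code under test.
std::vector<std::string> Tokenize(const std::string& text) {
    std::istringstream in(text);
    std::vector<std::string> out;
    for (std::string tok; in >> tok;) out.push_back(tok);
    return out;
}

TEST(TokenizerTest, EmptyInputYieldsNoTokens) {
    EXPECT_TRUE(Tokenize("").empty());
}

TEST(TokenizerTest, SplitsOnWhitespace) {
    EXPECT_EQ(Tokenize("hello world").size(), 2u);
}
```

In CI the binary can run under `ctest` (CMake's `gtest_discover_tests`) or be invoked directly; either way a non-zero exit code fails the pipeline.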
### Feature request

The current codebase only supports bf16/fp16 training, while we typically apply quantization (int8, int4, fp8, fp4) during model serving to reduce VRAM usage while still maintaining accuracy...
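To make the VRAM argument concrete, here is a minimal per-tensor absmax int8 quantize/dequantize sketch, one of the simpler schemes among the int8/int4/fp8/fp4 options the request lists. Storing one byte per weight plus a shared float scale roughly halves weight memory versus fp16; int4 and the fp8/fp4 formats push the same idea further at some accuracy cost.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Per-tensor absmax quantization: scale = max|x| / 127, q = round(x / scale).
// Each weight shrinks from 2 bytes (fp16) to 1 byte (int8).
struct QuantizedTensor {
    std::vector<int8_t> q;
    float scale;
};

QuantizedTensor quantize_int8(const std::vector<float>& x) {
    float absmax = 0.0f;
    for (float v : x) absmax = std::max(absmax, std::fabs(v));
    QuantizedTensor out;
    out.scale = absmax > 0.0f ? absmax / 127.0f : 1.0f;
    out.q.reserve(x.size());
    for (float v : x)
        out.q.push_back(static_cast<int8_t>(std::lround(v / out.scale)));
    return out;
}

// Recover an approximate float value for element i.
float dequantize(const QuantizedTensor& t, std::size_t i) {
    return t.q[i] * t.scale;
}
```

Supporting this during training (rather than only at serving time) additionally requires keeping gradients and optimizer state in higher precision, which is the harder part of the request.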