biaochen

Results 6 issues of biaochen

select is used as the I/O multiplexing tool; can it be replaced with poll or epoll?

I want to use TF-TRT to optimize a TF2 model and then serve it with Triton, but I fail to serve the optimized TF-TRT model. The process is as follows: 1. following this...

### System Info

x86_64 V100
triton server image: nvcr.io/nvidia/tritonserver:23.12-trtllm-python-py3
tensorrtllm_backend: v0.7.1

### Who can help?

_No response_

### Information

- [X] The official example scripts
- [ ] My own...

bug
triaged

I've tested the speculative decoding feature using llama3 models: I converted the draft and target models to TRT engines and launched Triton server with the BLS model, but there seems to be no performance gain. Environment settings:...

### Checklist

- [ ] 1. I have searched related issues but cannot get the expected help.
- [ ] 2. The bug has not been fixed in the latest...

Hi Team, I'm testing the speculative decoding feature with trtllm, but I've run into some issues. My settings are as follows:

hardware: A100 80G
software: nvcr.io/nvidia/tritonserver:25.01-trtllm-python-py3
model: gemma-2-2b-it / gemma-2-27b-it

```
cd /llm/tmp/trtllm/v0.17/TensorRT-LLM/examples/gemma/
```

...

triaged
stale
waiting for feedback