heliqi
### PR types
Backend Serving

### Describe
1. OpenVINO and Option support `num_streams`.
2. Runtime (PaddleBackend and OpenVINOBackend) supports Clone; TRT and ORT will follow in the next PR.
3. Serving...
### PR types
Serving

The title of this pull request should be `[PR type] Description of this pull request`, e.g. `[Model] PP-Matting deployment support`...
Done:
1. Use the PaddlePaddle C API.
2. Use the same namespace `triton::backend::paddle`.
3. Support the config auto-complete feature and ValidateModelConfig (ValidateOutputs, ValidateInputs).
4. Triton versions from 21.10 to 22.x are supported.

Undone...
CUDA version: 12.8
flashinfer version: 0.5.2
CUDA ARCH (Compute Capability): 9.0

I hit an error when running the following test command:

```shell
python3 benchmarks/flashinfer_benchmark.py --routine BatchPrefillWithPagedKVCacheWrapper --page_size 64 --batch_size 1 --s_qo 2000...
```