Siddha Tiwari issues

Results: 7 issues of Siddha Tiwari

use_fp8_context_fmha broken outputs

### System Info CPU architecture: x86_64 Host RAM: 1TB GPU: 8xH100 SXM Container: Manually built container with TRT 9.3 Dockerfile.trt_llm_backend TensorRT-LLM version: 0.10.0.dev2024043000 Driver Version: 535.161.07 CUDA Version: 12.2 OS:...

bug

triaged

Chunked context incomplete outputs

### System Info CPU architecture: x86_64 Host RAM: 1TB GPU: 8xH100 SXM Container: Manually built container with TRT 9.3 Dockerfile.trt_llm_backend (nvcr.io/nvidia/tritonserver:24.03-trtllm-python-py3 doesn't work for TRT LLM main branch?) TRT LLM...

bug

triaged

High inference memory usage

If a piper http server comes under heavy load, GPU memory usage can spike up multiple GBs and remain high until the server is stopped. Sometimes requests can get OOM...

Allow unsigned (alg: none) JWT tokens

I'm using the firebase auth emulator for local development which produces unsigned tokens. I'm running the firebase auth emulator and hasura (v1.3.3) locally using docker. It seems that hasura views...

k/enhancement

a/authn

multi_block_mode enable runtime crash

bug

triaged

stale

int8 lower performance than fp16

### System Info CPU architecture: x86_64 Host RAM: 1TB GPU: 8xH100 SXM Container: Manually built container with TRT 9.3 Dockerfile.trt_llm_backend TRT LLM v0.9 main branch (https://github.com/NVIDIA/TensorRT-LLM/commit/850b6fa1e710d25769f2b560d897d2bd424a645e) Driver Version: 535.161.07 CUDA...

bug

triaged

stale

Inference server stalling

### System Info - tensorrtllm_backend built using Dockerfile.trt_llm_backend - main branch tensorrt llm (0.13.0.dev20240813000) - 8xH100 SXM - Driver Version: 535.129.03 - CUDA Version: 12.5 After roughly 30 seconds of...

bug