TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently...
### System Info

GPU: A10G. I have tried with an AWS g5.2xlarge instance and an AWS g5.12xlarge instance.

### Who can help?

@byshiue

### Information

- [X] The official example scripts...
### System Info

H20 * 1

### Who can help?

_No response_

### Information

- [ ] The official example scripts
- [ ] My own modified scripts

### Tasks...
### System Info

GPU: NVIDIA A100
Driver Version: 545.23.08
CUDA: 12.3

Versions:
- https://github.com/NVIDIA/TensorRT-LLM.git (71d8d4d)
- https://github.com/triton-inference-server/tensorrtllm_backend.git (bf5e900)

Model: zephyr-7b-beta

### Who can help?

@kaiyux @byshiue

### Information

- [X]...
### System Info

Llama 3 has been released:
- https://huggingface.co/collections/meta-llama/meta-llama-3-66214712577ca38149ebb2b6
- https://github.com/meta-llama/llama3

### Who can help?

@ncomly-nvidia

### Information

- [ ] The official example scripts
- [ ] My own modified scripts

### Tasks...
### System Info

CPU architecture: x86_64
Host RAM: 1 TB
GPU: 8x H100 SXM
Container: manually built with TRT 9.3 via Dockerfile.trt_llm_backend (nvcr.io/nvidia/tritonserver:24.03-trtllm-python-py3 doesn't work for the TRT-LLM main branch?)
TRT LLM...
```shell
python3 convert_checkpoint.py --model_dir /workspace/lk/model/Qwen/14B \
    --output_dir ./tllm_checkpoint_1gpu_gptq --dtype float16 \
    --use_weight_only --weight_only_precision int4_gptq --per_group
```

```
[TensorRT-LLM] TensorRT-LLM version: 0.10.0.dev2024042300
0.10.0.dev2024042300
Loading checkpoint shards: 100%|██████████| 8/8 [00:02
```
Update some dead links
### System Info

p4de (4x 80 GB A100 GPUs)

### Who can help?

@Tracin @byshiue

### Information

- [X] The official example scripts
- [ ] My own modified scripts

###...
Before the attention operation, the q, k, and v tensors are packed into one fused tensor `qkv`. I would like to perform some in-place operations on q and k only. Currently what I...
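One way to modify only the q and k sections of a fused tensor in place is to take slice views into it, so that writes through the views mutate the underlying buffer. The following is a minimal NumPy sketch of that idea; the shapes, the scaling operations, and the layout `[q | k | v]` along the last axis are illustrative assumptions, not taken from TensorRT-LLM internals.

```python
import numpy as np

# Hypothetical sizes for illustration only.
num_tokens, num_heads, head_dim = 4, 2, 8
hidden = num_heads * head_dim

# Fused qkv tensor, assumed laid out as [q | k | v] along the last axis.
qkv = np.ones((num_tokens, 3 * hidden), dtype=np.float32)

# NumPy basic slicing returns views, not copies: writing through
# q or k mutates qkv in place, while the v section stays untouched.
q = qkv[:, 0 * hidden:1 * hidden]
k = qkv[:, 1 * hidden:2 * hidden]
v = qkv[:, 2 * hidden:3 * hidden]

q *= 2.0   # in-place operation on the q section only
k += 1.0   # in-place operation on the k section only

assert qkv[0, 0] == 2.0           # q section was modified through the view
assert qkv[0, hidden] == 2.0      # k section was modified (1 + 1)
assert qkv[0, 2 * hidden] == 1.0  # v section is unchanged
```

The same pattern carries over to frameworks with view semantics (e.g. tensor slicing in PyTorch), as long as the operations applied are themselves in-place.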
```shell
python convert_checkpoint.py --model_dir /workspace/lk/model/Qwen/14B/ \
    --output_dir ./tllm_checkpoint_1gpu_fp16_wq --dtype float16 \
    --use_weight_only --weight_only_precision int8
```

```
[TensorRT-LLM] TensorRT-LLM version: 0.10.0.dev2024042300
0.10.0.dev2024042300
Loading checkpoint shards: 100%|██████████| 8/8 [00:02
```