TensorRT-LLM issues

QwenVL visual_encoder failure

### System Info
[TensorRT-LLM] TensorRT-LLM version: 0.9.0.dev2024020600
[02/16/2024-22:04:57] [TRT-LLM] [I] Loading engine from ./plan/visual_encoder/visual_encoder_fp16.plan
[02/16/2024-22:05:00] [TRT-LLM] [I] Creating session from engine ./plan/visual_encoder/visual_encoder_fp16.plan
[02/16/2024-22:05:00] [TRT] [I] Loaded engine size: 3714 MiB
[02/16/2024-22:05:00]...

peytontolbert

bug

moe router tp removed

As the title suggests, this PR removes TP (tensor parallelism) for the MoE router. Duplicating the router across GPUs removes an allreduce for each MoE layer. This small change leads to **4-18%...

megha95

Unnecessary assertion in cpp implementation of worldConfig.cpp

1

https://github.com/NVIDIA/TensorRT-LLM/blob/3d56a445e8ebf888e78be638faf6beec0a78f3c2/cpp/tensorrt_llm/runtime/worldConfig.cpp#L74 Hi, I've run into a small bug with the CPP implementation of the runtime code. I am running multi-node inference on Llama2 with pipeline parallelism 2 and tensor parallelism...

noahnisbet

Add weight-only quantization for T5 models

7

## Summary Add weight-only quantization for T5. I've added this to the path loading from binary weights. I do not think the HF weight loading currently works, so I have...

eycheung

benchmarking: docs reference steps that don't exist

1

### System Info
- System independent. This issue is re: docs
- In the benchmarking page there are multiple references to build.py scripts that don't exist as far as I...

julianmack

bug

Small update to benchmarking build docs to correct usage

To reflect the correct usage, as I understand it, when you have elevated privileges. The suggested change worked for me and the original didn't. Also: looking at the Makefile my...

julianmack

Cannot build Nougat model

### System Info
- RTX 4090
- x86_64 GNU/Linux
- main branch

### Who can help?
_No response_

### Information
- [X] The official example scripts
- [ ] My...

mtenenholtz

bug

Add batch manager static lib for Windows

4

### System Info
- CPU architecture: x64
- GPU: RTX 4090 24G
- CUDA 12.2

### Who can help?
@byshiue @nc

### Information
- [X] The official example scripts
-...

MustaphaU

bug

triaged

[Model Requests] Add support for CogVLM

https://github.com/THUDM/CogVLM CogVLM is one of the best models for describing images, much better than Qwen-VL in my experience. Making image captioning faster would be a huge gain. Being...

Curlypla

error during tritonserver

[TensorRT-LLM] TensorRT-LLM version: 0.9.0.dev2024020600
[TensorRT-LLM] TensorRT-LLM version: 0.9.0.dev2024020600
[TensorRT-LLM] TensorRT-LLM version: 0.9.0.dev2024020600
free(): invalid pointer
[95e079756bc2:03949] *** Process received signal ***
[95e079756bc2:03949] Signal: Aborted (6)
[95e079756bc2:03949] Signal code: (-6)
[95e079756bc2:03949] [ 0] /usr/lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7f754a216520]...

shashikr2
