Bhuvanesh Sridharan
## What this PR Does: This PR adds support for converting Hugging Face's distil-whisper model weights to compatible PyTorch `.pt` files, which can then be used to build TensorRT-LLM engines. ##...
From Section 3.2 in the paper: ``` When determining the relative distance and adding positional information to tokens, StreamingLLM focuses on positions within the cache rather than those in the...
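To make the quoted idea concrete, here is a minimal sketch of the difference between positions within the cache and positions in the original text. The function names (`evict`, `positions_in_cache`, `positions_in_text`) and the sink and window sizes are hypothetical choices for illustration; they are not taken from the StreamingLLM code or from this issue.

```python
def evict(token_indices, n_sink, window):
    """Keep the first n_sink tokens plus the most recent `window` tokens.

    Hypothetical helper: a simplified stand-in for a sink-plus-rolling-window cache.
    """
    if n_sink < 0 or window <= 0:
        raise ValueError("n_sink must be >= 0 and window must be > 0")
    if len(token_indices) <= n_sink + window:
        return list(token_indices)
    return list(token_indices[:n_sink]) + list(token_indices[-window:])


def positions_in_cache(cached_indices):
    """Positions assigned by slot in the cache, including the token being decoded."""
    return list(range(len(cached_indices) + 1))


def positions_in_text(cached_indices, current_index):
    """Positions each token had in the original text, for comparison."""
    return list(cached_indices) + [current_index]


# Nine tokens seen so far (text indices 0..8); tokens 4 and 5 get evicted.
kept = evict(list(range(9)), n_sink=4, window=3)
print(kept)                        # text indices still held in the cache
print(positions_in_cache(kept))    # contiguous positions, no gap
print(positions_in_text(kept, 9))  # original positions, with a gap after 3
```

Under these assumptions, the cache-based positions stay contiguous even though the cached tokens are no longer adjacent in the original text.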
### System Info - x86_64 - RAM: 30 GB - GPU: A10G, VRAM: 23 GB - Lib: TensorRT-LLM v0.9.0 - Container Used: nvcr.io/nvidia/tritonserver:24.05-trtllm-python-py3 - Model used: Mistral 7B ### Who can...
Closes #8481 ## TL;DR * Replaces regex parsing with `xml.etree.ElementTree`. * Supports nested Pydantic models, repeated tags → `List`, mixed data types. * Keeps all existing flat-structure behaviour (no breaking...
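As a rough illustration of the approach the TL;DR describes, the sketch below uses `xml.etree.ElementTree` to turn XML into nested dictionaries, collecting repeated sibling tags into a list. It is not the PR's implementation: `element_to_value` and `parse_xml` are hypothetical names, and the sample tags are invented for the example.

```python
import xml.etree.ElementTree as ET


def element_to_value(elem):
    """Convert an element to a string (leaf) or a dict (element with children)."""
    children = list(elem)
    if not children:
        return (elem.text or "").strip()

    result = {}
    for child in children:
        value = element_to_value(child)
        if child.tag not in result:
            result[child.tag] = value
        elif isinstance(result[child.tag], list):
            # Third or later occurrence of the same tag: extend the list.
            result[child.tag].append(value)
        else:
            # Second occurrence: promote the existing scalar to a list.
            result[child.tag] = [result[child.tag], value]
    return result


def parse_xml(text):
    """Parse an XML string into {root_tag: nested value}."""
    root = ET.fromstring(text)
    return {root.tag: element_to_value(root)}


sample = (
    "<person><name>Ada</name>"
    "<address><city>London</city></address>"
    "<tag>a</tag><tag>b</tag></person>"
)
print(parse_xml(sample))
```

A dictionary of this shape is the kind of input a Pydantic model with nested fields could then validate and coerce to typed values. One limitation of this sketch is that a tag appearing only once stays a scalar, so a schema-aware parser would need to consult the target field's type to decide when a single value should still become a `List`.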