TensorRT-LLM
                                
NVIDIA AMMO documentation
Is there any official documentation of NVIDIA AMMO toolkit? If so, where is it?
In particular, I'd be interested in documentation about:
- implemented features
- supported quantization techniques for each model type
- changelog between versions
@Tracin @juney-nvidia
@RalphMao do you have any comments on this ask? :)
Same question. How can I find the source code of this library? I want to write a custom quantization pipeline for encoder-decoder models like T5.
same question here.
x2
Hi folks! Are there updates on the docs?
+1
Hi all, thank you for your interest. The AMMO toolkit has been renamed to "TensorRT Model Optimizer" and the documentation is available at https://nvidia.github.io/TensorRT-Model-Optimizer/ . Examples related to Model Optimizer are available at https://github.com/NVIDIA/TensorRT-Model-Optimizer?tab=readme-ov-file
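For anyone who wants a feel for what a post-training quantization pass does before digging into the Model Optimizer docs, here is a minimal, dependency-free sketch of symmetric per-tensor INT8 weight quantization. This is a toy illustration of the general technique, not the Model Optimizer API; all names here are my own.

```python
def quantize_int8(weights):
    """Symmetric per-tensor INT8: w_q = clamp(round(w / scale), -127, 127),
    where scale maps the largest absolute weight onto 127."""
    amax = max(abs(w) for w in weights)
    scale = amax / 127.0 if amax > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate float weights from the int codes."""
    return [x * scale for x in q]

w = [0.5, -1.27, 0.0, 1.27]
q, scale = quantize_int8(w)
w_hat = dequantize_int8(q, scale)  # close to w, up to rounding error
```

Real toolkits refine this with per-channel scales, calibration over activation statistics, and formats such as INT4-AWQ or FP8, which is where the Model Optimizer documentation linked above comes in.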
> Same question. How can I find the source code of this library? I want to write a custom quantization pipeline for encoder-decoder models like T5.
The library is available on PyPI with the source open (rather than fully open source). You can access most of the files, but some files don't have approval for open-source release (yet).
Hi @dmitrymailk, I am also exploring ways to run a 4-bit quantized encoder-decoder model in TensorRT-LLM. Were you able to make any progress on that front?
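As background on the 4-bit question: 4-bit weights are commonly stored two signed nibbles per byte. Here is a toy sketch of that packing (my own illustration under assumed names and layout, not the storage format TensorRT-LLM actually uses):

```python
def pack_int4(q):
    """Pack signed 4-bit values (range -8..7) two per byte:
    low nibble first, high nibble second; pad odd lengths with 0."""
    if len(q) % 2:
        q = q + [0]
    return bytes(((hi & 0xF) << 4) | (lo & 0xF)
                 for lo, hi in zip(q[::2], q[1::2]))

def unpack_int4(packed):
    """Inverse of pack_int4: split each byte into two signed nibbles."""
    out = []
    for b in packed:
        for nib in (b & 0xF, b >> 4):
            out.append(nib - 16 if nib >= 8 else nib)  # sign-extend 4 bits
    return out

codes = [-3, 7, 1, -8]
assert unpack_int4(pack_int4(codes)) == codes
```

The packing halves the memory footprint versus INT8; production kernels additionally store per-group scales next to the packed weights.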
This issue is stale because it has been open 30 days with no activity. Remove the stale label or comment, or this will be closed in 15 days.