
TensorRT Quantization Breaks for `LlamaLinearScalingRotaryEmbedding`

Open Sanger2000 opened this issue 2 years ago • 0 comments

In nvidia-ammo, these lines in `ammo/torch/export/layer_utils.py` appear to fail unexpectedly for some Llama variants:

(screenshot of the relevant check in `ammo/torch/export/layer_utils.py`)

In particular, the DeepSeek models use `LlamaLinearScalingRotaryEmbedding`. Because the class name contains "Linear", the module is picked up by the `is_linear` check and treated as a dense layer. However, this module has no `.weight`, so `build_linear_config` fails.
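A minimal, self-contained sketch of the mismatch (the `is_linear` stand-in below is illustrative, not the actual nvidia-ammo implementation):

```python
# Hypothetical stand-in: rotary embeddings register buffers such as
# inv_freq, but carry no .weight parameter like a real dense layer.
class LlamaLinearScalingRotaryEmbedding:
    def __init__(self):
        self.inv_freq = [1.0, 0.5, 0.25]  # a buffer, not a weight

def is_linear(module) -> bool:
    # A purely name-based check matches the "Linear" substring
    # in LlamaLinearScalingRotaryEmbedding as well.
    return "Linear" in type(module).__name__

rope = LlamaLinearScalingRotaryEmbedding()
assert is_linear(rope)               # mis-classified as a dense layer
assert not hasattr(rope, "weight")   # so exporting its weight must fail
```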

There are several easy fixes for this (for example, checking whether "Rotary" appears in the class name and skipping that case). I'm happy to contribute a patch, but I don't think there is an OSS repo to submit it to.
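One possible shape for the name-based skip, sketched here as a standalone function (the function name and guard are illustrative, not the actual nvidia-ammo API):

```python
# Hypothetical guard: skip rotary-embedding classes and require a real
# weight tensor before treating a module as a dense/linear layer.
def is_linear(module) -> bool:
    name = type(module).__name__
    if "Rotary" in name:  # e.g. LlamaLinearScalingRotaryEmbedding
        return False
    return "Linear" in name and hasattr(module, "weight")

class Linear:
    """Minimal stand-in for a dense layer with a weight tensor."""
    def __init__(self):
        self.weight = [[1.0]]

class LlamaLinearScalingRotaryEmbedding:
    """Minimal stand-in for the rotary-embedding module."""

assert is_linear(Linear())                              # real dense layer
assert not is_linear(LlamaLinearScalingRotaryEmbedding())  # correctly skipped
```

The extra `hasattr(module, "weight")` check also covers any other weight-less module whose class name happens to contain "Linear".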

Sanger2000 — Feb 11 '24 07:02