gushiqiao comments

Results: 55 comments of gushiqiao

LLama3-8B-Instruct fail for TensorRT-LLM

@helloyongyang

Mixtral 8x7b failed on compile with tensorrt-llm

@helloyongyang

HumanEval benchmark producing unreasonably low scores

Is it caused by the prompt? https://github.com/ModelTC/llmc/blob/bc9367fb8088e9040cc3d20c8ce7e44c32d95e8c/llmc/eval/eval_code.py#L20C8-L20C9

Static Activation Quantization and Mixed-Precision Quantization Incompatibility

https://github.com/ModelTC/llmc/blob/b0bf39e96a0ce44f74ec9a42729c09f6cd6f893e/configs/quantization/methods/MixPrecision/rtn_w_a_static.yml#L37

Static Activation Quantization and Mixed-Precision Quantization Incompatibility

This setting is deployment-friendly. The previous code structure was somewhat messy, so for now we've opted for simplified support of 8-bit and 16-bit mixed precision. In theory, all methods...

BUG: Mixed-precision configuration not working with STATIC quantization

https://github.com/ModelTC/llmc/blob/b0bf39e96a0ce44f74ec9a42729c09f6cd6f893e/configs/quantization/methods/MixPrecision/rtn_w_a_static.yml#L37

Two Issues / Bugs with NaiveQuantKVCache Implementation

Thank you for sharing your proposed solution. I believe it is well thought out and valuable. Please feel free to submit a pull request, and I will be happy to...

Quantization information

https://github.com/ModelTC/llmc/blob/50b0da743e90c28b9df4360dbac74d93c6a1e504/llmc/compression/quantization/module_utils.py#L475

SpinQuant merge

Since SpinQuant requires training, it may involve significant changes to the code structure. Therefore, we do not plan to merge it into the main branch in the near term. Thank...

SpinQuant merge

I’ll try to allocate some time to support loading pre-trained rotation matrices directly as a feature.