
Cannot reproduce the results of Llama-3-8B

Open JeffreyWong20 opened this issue 8 months ago • 5 comments

Hi, thank you for the great work. I followed the homogeneous compression ratio approach in SVD-LLM and applied 20% compression to each layer. However, I obtained a perplexity of 14.0406 using 256 randomly sampled 2048-token sequences from WikiText-2, which is quite different from the reported 11.82. Would you mind sharing the code you used to reproduce the results in the table?
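Reproduction gaps like this often come down to how perplexity is aggregated over the sampled windows, so for clarity, here is a minimal sketch of the metric itself. The `perplexity` helper and the uniform-model example are illustrative, not SVD-LLM's evaluation code:

```python
import numpy as np

def perplexity(token_log_probs):
    """Perplexity is exp of the mean negative log-likelihood per token."""
    nll = -np.mean(token_log_probs)
    return float(np.exp(nll))

# With a real model, token_log_probs would hold the log-probability the
# causal LM assigns to each ground-truth token in a 2048-token window;
# those values are then pooled over all 256 sampled windows before the exp.
vocab_size = 8
uniform_log_probs = np.full(2048, -np.log(vocab_size))
print(perplexity(uniform_log_probs))  # a uniform model scores ppl == vocab size
```

Note that averaging per-window perplexities is not the same as exponentiating the pooled token-level NLL; mismatches there alone can move the reported number by a point or more.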

JeffreyWong20 · Apr 19 '25 13:04

Hi @JeffreyWong20, I wonder if you managed to reproduce their results. I don't know whether they applied fine-tuning for those numbers.

zhuhanqing · May 04 '25 00:05

No, I didn't manage to reproduce their results.

JeffreyWong20 · May 04 '25 10:05

@JeffreyWong20 Same here. I tested it and got perplexities of 14.84 on WikiText-2 and 80.84 on C4 for LLaMA-3.

When using LLaMA-3, I initially encountered the following issue:

Token indices sequence length is longer than the specified maximum sequence length for this model (299078 > 131072). Running this sequence through the model will result in indexing errors

After fixing the sequence length problem, I was able to obtain the results above.
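The warning above appears when the whole corpus is tokenized as one sequence longer than the model's context limit; a common workaround is to tokenize once without a length cap and slice the resulting id list into fixed 2048-token windows yourself, so no over-length sequence ever reaches the model. A sketch in plain Python, where `token_ids` stands in for the tokenizer output (this is not the repository's exact evaluation code):

```python
import random

def sample_windows(token_ids, n_windows=256, window_len=2048, seed=0):
    """Sample n_windows random contiguous windows of window_len tokens
    from one long token-id sequence."""
    rng = random.Random(seed)
    max_start = len(token_ids) - window_len
    starts = [rng.randrange(max_start + 1) for _ in range(n_windows)]
    return [token_ids[s:s + window_len] for s in starts]

# e.g. the 299078 WikiText-2 tokens mentioned in the warning above
windows = sample_windows(list(range(299078)))
print(len(windows), len(windows[0]))
```

Each window can then be fed to the model as an ordinary batch element, which keeps every input within the 131072-token limit.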

dellixx · Jun 01 '25 13:06

Hi @dellixx, may I ask how you ran this code on LLaMA-3? I upgraded the transformers version and modified the SVD_LlamaAttention class, but I obtained an extremely bad result, around 1358229.61 perplexity on WikiText-2. I have no idea where it went wrong.

deadlykitten4 · Jul 25 '25 13:07

Hi @deadlykitten4, when I was saving the model, I directly used the value of U@V to overwrite the original W and saved it in the Hugging Face format. I noticed you have been working on ResSVD recently; the concept of error propagation is a very interesting discovery. If you would be open to it, I would greatly appreciate the opportunity to exchange ideas and potentially collaborate on LLM compression.
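For reference, overwriting W with a rank-truncated product can be sketched with plain NumPy as below. This mirrors the idea described above, not the SVD-LLM source; the function name and the rank formula (which trims roughly `ratio` of the parameters when the factors are stored separately) are illustrative:

```python
import numpy as np

def truncated_svd_weight(W, ratio=0.2):
    """Return low-rank factors (U_r, V_r) whose product approximates W.
    The rank is chosen so the two factors hold about (1 - ratio) of the
    original m*n parameters."""
    m, n = W.shape
    rank = int((1 - ratio) * m * n / (m + n))
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    U_r = U[:, :rank] * S[:rank]  # fold singular values into U
    V_r = Vt[:rank, :]
    return U_r, V_r

W = np.random.default_rng(0).standard_normal((64, 64))
U_r, V_r = truncated_svd_weight(W)
W_hat = U_r @ V_r  # this product is the value that overwrites W on save
print(W_hat.shape, U_r.shape[1])
```

Overwriting W with `U_r @ V_r` preserves the checkpoint layout (handy for loading with vanilla transformers), but it also gives up the memory and compute savings of keeping the two thin factors as separate linear layers.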

dellixx · Jul 31 '25 11:07