Mengzhao Chen

Results: 7 issues by Mengzhao Chen

Hi, thank you for your great work. Today I want to run an ablation experiment on your work. I just modified the `momentum_growth` function, from `y, idx = torch.sort(torch.abs(grad).flatten(), descending=True)`...
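The quoted line ranks all gradient entries by absolute value, largest first. A minimal sketch of selecting the top-k coordinates this way (the helper name `topk_by_magnitude` is hypothetical, not from the repo):

```python
import torch

def topk_by_magnitude(grad: torch.Tensor, k: int) -> torch.Tensor:
    # Sort the flattened gradient by absolute value, descending,
    # as in the quoted momentum_growth line, then keep the first k indices.
    y, idx = torch.sort(torch.abs(grad).flatten(), descending=True)
    return idx[:k]

grad = torch.tensor([[0.1, -2.0], [0.5, 0.0]])
print(topk_by_magnitude(grad, 2).tolist())  # [1, 2]
```

For large tensors, `torch.topk(torch.abs(grad).flatten(), k)` avoids the full sort and returns the same indices.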

I train llama-7b with the following batch size settings:

```
--per_device_train_batch_size 4 \
--per_device_eval_batch_size 4 \
--gradient_accumulation_steps 4 \
```

When training, it consumes about 9 GB of GPU memory. However, when...
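For reference, the effective batch size implied by these flags can be computed as below (assuming a single-GPU run; `world_size` is an assumption that would scale for multi-GPU setups):

```python
per_device_train_batch_size = 4
gradient_accumulation_steps = 4
world_size = 1  # assumption: single GPU

# Effective batch size = per-device batch * accumulation steps * device count.
# Gradient accumulation trades memory for time: each optimizer step sees
# this many samples, but only per_device_train_batch_size reside in memory at once.
effective_batch_size = (per_device_train_batch_size
                        * gradient_accumulation_steps
                        * world_size)
print(effective_batch_size)  # 16
```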

Hello, thanks for your excellent work. I noticed that the code does not support the FLAN v2 dataset. I want to know how to support the FLAN v2 dataset and...

Hi, thanks for your interesting work and clear open-source code. I have been trying to test the W4A16 kernel with different quantization group sizes, and I have found that this...

Hi, thanks for your outstanding work. I have tested the quantized model using the W4A16 kernel on the WikiText2 dataset. Specifically, the WikiText2 validation set is split into non-overlapping segments...
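A minimal sketch of the non-overlapping segmentation described above (the segment length and the drop-the-trailing-remainder convention are assumptions, not necessarily the repo's exact evaluation code):

```python
def split_segments(tokens, seg_len):
    # Split a token sequence into non-overlapping windows of seg_len tokens,
    # dropping any incomplete trailing remainder (a common perplexity-eval convention).
    return [tokens[i:i + seg_len]
            for i in range(0, len(tokens) - seg_len + 1, seg_len)]

print(split_segments(list(range(10)), 4))  # [[0, 1, 2, 3], [4, 5, 6, 7]]
```

Perplexity is then averaged over the per-segment negative log-likelihoods, so the choice of segment length (and whether the remainder is kept) can shift the reported number.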

Hello, thanks for your outstanding work. I want to do a comprehensive comparison of recent quantization methods. Since the latest lm-eval can obtain higher accuracy than the ones reported...