Mengzhao Chen

Results: 7 issues by Mengzhao Chen

Hi, thank you for your great work. Today I want to run an ablation experiment on your work. I just modified the `momentum_growth` function, from `y, idx = torch.sort(torch.abs(grad).flatten(), descending=True)`...
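The quoted line ranks all gradient entries by absolute value, largest first. A minimal sketch of selecting the top-k coordinates this way (the helper name `topk_by_magnitude` is hypothetical, not from the repo):

```python
import torch

def topk_by_magnitude(grad: torch.Tensor, k: int) -> torch.Tensor:
    # Sort the flattened gradient by absolute value, descending,
    # as in the quoted momentum_growth line, then keep the first k indices.
    y, idx = torch.sort(torch.abs(grad).flatten(), descending=True)
    return idx[:k]

grad = torch.tensor([[0.1, -2.0], [0.5, 0.0]])
print(topk_by_magnitude(grad, 2).tolist())  # [1, 2]
```

For large tensors, `torch.topk(torch.abs(grad).flatten(), k)` avoids the full sort and returns the same indices.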

I train llama-7b with the following batch size settings:

```
--per_device_train_batch_size 4 \
--per_device_eval_batch_size 4 \
--gradient_accumulation_steps 4 \
```

When training, it consumes about 9 GB of GPU memory. However, when...
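For reference, the effective batch size implied by these flags can be computed as below (assuming a single-GPU run; `world_size` is an assumption that would scale for multi-GPU setups):

```python
per_device_train_batch_size = 4
gradient_accumulation_steps = 4
world_size = 1  # assumption: single GPU

# Effective batch size = per-device batch * accumulation steps * device count.
# Gradient accumulation trades memory for time: each optimizer step sees
# this many samples, but only per_device_train_batch_size reside in memory at once.
effective_batch_size = (per_device_train_batch_size
                        * gradient_accumulation_steps
                        * world_size)
print(effective_batch_size)  # 16
```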

Hello, thanks for your excellent work. I noticed that the code does not support the FLAN v2 dataset. I want to know how to support the FLAN v2 dataset and...

Hi, thanks for your interesting work and clear open-source code. I have been trying to test the W4A16 kernel with different quantization group sizes, and I have found that this...

Hi, thanks for your outstanding work. I have tested the quantized model using the W4A16 kernel on the WikiText2 dataset. Specifically, the WikiText2 validation set is split into non-overlapping segments...
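A minimal sketch of the non-overlapping segmentation described above (the segment length and the drop-the-trailing-remainder convention are assumptions, not necessarily the repo's exact evaluation code):

```python
def split_segments(tokens, seg_len):
    # Split a token sequence into non-overlapping windows of seg_len tokens,
    # dropping any incomplete trailing remainder (a common perplexity-eval convention).
    return [tokens[i:i + seg_len]
            for i in range(0, len(tokens) - seg_len + 1, seg_len)]

print(split_segments(list(range(10)), 4))  # [[0, 1, 2, 3], [4, 5, 6, 7]]
```

Perplexity is then averaged over the per-segment negative log-likelihoods, so the choice of segment length (and whether the remainder is kept) can shift the reported number.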

Hello, thanks for your outstanding work. I want to do a comprehensive comparison of recent quantization methods. Since the latest lm-eval can obtain higher accuracy than the ones reported...