Ma Xinyin

Results 58 comments of Ma Xinyin

I ran into the same problem as @xiepuzhao. Is this because the knowledge distillation step is not included in train_biencoder.py?

@leezythu In my previous experiment, I set gradient_accumulation_steps to 8. However, batch size is a particularly important hyper-parameter for this experiment, and if gradient accumulation is used, then the...
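A minimal sketch of what gradient_accumulation_steps=8 does, with a toy model and random data (all names here are illustrative; the project's actual training lives in train_biencoder.py):

```python
import torch

# Hypothetical toy setup, not the project's bi-encoder.
torch.manual_seed(0)
model = torch.nn.Linear(16, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = torch.nn.CrossEntropyLoss()

gradient_accumulation_steps = 8  # the value used in the comment above
micro_batches = [(torch.randn(4, 16), torch.randint(0, 2, (4,)))
                 for _ in range(16)]

num_updates = 0
optimizer.zero_grad()
for step, (x, y) in enumerate(micro_batches, start=1):
    # Scale the loss so the accumulated gradient averages over micro-batches.
    loss = loss_fn(model(x), y) / gradient_accumulation_steps
    loss.backward()  # gradients sum into .grad across micro-batches
    if step % gradient_accumulation_steps == 0:
        optimizer.step()  # one update per 8 micro-batches (effective batch 32)
        optimizer.zero_grad()
        num_updates += 1
```

Note that for bi-encoder training with in-batch negatives, accumulation only enlarges the averaged gradient, not the pool of negatives each example sees (negatives still come from the small micro-batch), so it is not a drop-in substitute for a genuinely larger batch.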

Hi Sebastian, Thanks for trying our project for pruning LLaMA. After pruning a model, it is imperative to perform post-training before using it for any further application. This is because...
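A minimal sketch of why post-training matters, using PyTorch's built-in unstructured magnitude pruning on a toy MLP (illustrative only; this is not LLM-Pruner's structured method, and LLM-Pruner operates on LLaMA, not this model):

```python
import torch
import torch.nn.utils.prune as prune

# Hypothetical toy model standing in for a real network.
torch.manual_seed(0)
model = torch.nn.Sequential(
    torch.nn.Linear(8, 8), torch.nn.ReLU(), torch.nn.Linear(8, 2)
)

# Zero out 50% of the first layer's weights by L1 magnitude.
prune.l1_unstructured(model[0], name="weight", amount=0.5)

sparsity = (model[0].weight == 0).float().mean().item()
# Immediately after pruning, the surviving weights are unchanged while half
# the connections are gone, so the model's function is damaged; a recovery
# (post-training) fine-tune of the remaining weights is needed to restore
# accuracy before any downstream use.
```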

Hi Sebastian, Thanks for your advice! We have modified the README to make it clear. As for the plan, I have so many deadlines in the coming weeks. So it...

Hi all, We found a major bug in our pruning code and are working to identify the cause and a fix. The repo will...

We have updated the code in https://github.com/horseee/LLM-Pruner. Please refer to the new repo.

Hi. It's on our waitlist, but it requires a large amount of time and resources to conduct post-training on the pruned model. Otherwise, the pruned model functions poorly. We...

We have updated the evaluation results in https://github.com/horseee/LLM-Pruner. Please refer to the new repo.

Paper name/title: DeepCache: Accelerating Diffusion Models for Free Paper link: https://arxiv.org/abs/2312.00858 Code link: https://github.com/horseee/DeepCache

Hi. The new code will be released in around one week 🤠