Edd

22 comments by Edd

I think there's no problem in the code. Gemma2, Llama3.2, and Qwen have huge vocab sizes, so the `embedding` and `lm_head` layers are very large. When doing `CPT`, ...
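A rough back-of-the-envelope sketch of why the vocabulary size dominates here: both `embedding` and `lm_head` are `(vocab_size, hidden_size)` matrices, so their parameter count scales linearly with the vocabulary. The vocab and hidden sizes below are approximate assumptions for illustration, not exact model configs.

```python
# Rough illustration: parameter count and fp16/bf16 memory footprint of
# the embedding + lm_head pair for models with large vocabularies.
hidden_size = 4096  # assumed hidden dimension, varies by model

vocab_sizes = {
    "Gemma2": 256_000,     # approximate
    "Llama3.2": 128_256,   # approximate
    "Qwen2": 151_936,      # approximate
}

for name, vocab in vocab_sizes.items():
    # embedding and lm_head are each a (vocab, hidden) matrix
    params = 2 * vocab * hidden_size
    gigabytes = params * 2 / 1024**3  # 2 bytes per param in fp16/bf16
    print(f"{name}: {params / 1e9:.2f}B params in embedding+lm_head "
          f"(~{gigabytes:.1f} GB)")
```

With a 256k vocabulary, the two layers alone hold roughly 2B parameters, which is why full fine-tuning of them during continued pre-training is so memory-hungry.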

I am not sure exactly why we need to save both the `original_module` and `modules_to_save`. I guess it's because when you're doing LoRA, you can't just push gradients to the same...
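A minimal sketch of the idea behind keeping both copies (this is my own simplified reimplementation, not the real PEFT `ModulesToSaveWrapper`): the frozen `original_module` preserves the base weights so the adapter can later be disabled or unloaded, while a deep copy stored under `modules_to_save` is the one that actually receives gradients and gets trained.

```python
import copy
import torch
import torch.nn as nn

class ModulesToSaveWrapperSketch(nn.Module):
    """Simplified sketch of a PEFT-style wrapper for fully fine-tuned
    layers (e.g. lm_head): keep a frozen original plus a trainable copy."""

    def __init__(self, module: nn.Module, adapter_name: str = "default"):
        super().__init__()
        # Frozen base weights: needed to restore/unload the adapter later.
        self.original_module = module
        for p in self.original_module.parameters():
            p.requires_grad = False
        # Trainable deep copy: this is what gradients flow into.
        self.modules_to_save = nn.ModuleDict(
            {adapter_name: copy.deepcopy(module)}
        )
        for p in self.modules_to_save[adapter_name].parameters():
            p.requires_grad = True
        self.active_adapter = adapter_name

    def forward(self, x):
        # Only the trainable copy is used in the forward pass.
        return self.modules_to_save[self.active_adapter](x)
```

So the answer to "why both?" in this sketch: gradients go to the copy, never to `original_module`, which stays byte-identical to the base checkpoint.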