Mengzhou Xia
> INFO:tensorflow:out/model.ckpt-0 is not in all_model_checkpoint_paths. Manually adding it.
> I0125 21:17:40.027305 139845646956352 checkpoint_management.py:95] out/model.ckpt-0 is not in all_model_checkpoint_paths. Manually adding it.
> slurmstepd: error: Job 247071...
Hi, this is great work, and thanks for releasing the code! I found that there is no dropout in the LLaMA models, and I wonder if it is a specific...
**Describe the bug** I adapted my training process from the Hugging Face `trainer.py`, so most of my trainer is similar to theirs. My model includes a language model and an...
Hi, thanks for open-sourcing this great work! I have some questions about how to calculate the posterior probability for experts. From [this line](https://github.com/hadasah/btm/blob/main/fairseq/fairseq_cli/ensemble_eval_lm.py#L104), it seems that the expert probabilities...
I am wondering where I can find a description of the hybrid pruning method, since it is not covered in the original movement pruning paper. Thanks so much!