Gaurav Parmar
@tfriedel @nldhuyen0047 @onepeachbiubiubiu @ClemensSchwarke Could you share your exact `accelerate config` file and the command you use for training? I can perhaps help debug if I can recreate the...
Yeah, I am slightly surprised by how big of a role mixed-precision training plays as well. I am open to any PR or suggestions about improving mixed-precision training.
This bug wasn't there in the model trained for the results in the paper. I think it got introduced when I was cleaning up the repo. -Gaurav
Hi @nldhuyen0047, could you try removing `--mixed_precision "bf16"` from the training command? -Gaurav
I looked into the training issue today. The culprit seems to be mixed-precision training. I think you will obtain much better results once you disable it. Cheers, Gaurav
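For anyone wondering how coarse bf16 is, here is a minimal pure-Python sketch of bfloat16 rounding. This is only a general illustration of the number format, not a diagnosis of the bug in this repo, and it does not reproduce how `accelerate` handles weights or autocasting. The helper name `to_bf16` and the toy accumulation loop are made up for the example, and NaN/inf inputs are not handled.

```python
import struct


def to_bf16(x: float) -> float:
    """Round x to the nearest bfloat16 value and return it as a Python float.

    Hypothetical helper for illustration only (finite inputs assumed).
    bfloat16 keeps the top 16 bits of the float32 bit pattern.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Round to nearest, ties to even, before dropping the low 16 bits.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    bits = ((bits + rounding_bias) >> 16) << 16
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]


# Near 1.0, neighbouring bf16 values are 2**-7 = 0.0078125 apart,
# so a small increment is rounded away entirely.
small_step = to_bf16(1.0 + 1e-3)
next_value = to_bf16(1.0 + 2**-7)

# Toy accumulation: the same small increment added 1000 times.
acc_bf16 = 1.0
acc_full = 1.0
for _ in range(1000):
    acc_bf16 = to_bf16(acc_bf16 + 1e-3)  # rounded to bf16 after every add
    acc_full = acc_full + 1e-3           # kept in full precision

print(small_step, next_value, acc_bf16, acc_full)
```

In this toy setting the bf16 accumulator never moves off 1.0 while the full-precision one ends up close to 2.0. Real mixed-precision setups differ in which tensors are actually kept in bf16, so treat this as background on the format only.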
In my experiments, I found that the model can converge even without the skip connections. Could you specify more details about your task, and what setting you are trying? -Gaurav
Thanks for pointing this out! I removed the `triton` dependency from the `environment.yaml` and `requirements.txt` files. Were you able to resolve your issues after removing this dependency? -Gaurav
Oooh, good catch! The docs should be updated now!
Hi, we used different data samples for the different tasks (based on the available datasets). You can typically see reasonable results within a few thousand steps. Monitoring the evaluation metrics...
We have a separate evaluation procedure for the paired and unpaired models. Which one were you trying to evaluate? -Gaurav