Gaurav Parmar

151 comments by Gaurav Parmar

@tfriedel @nldhuyen0047 @onepeachbiubiubiu @ClemensSchwarke Could you share your exact `accelerate config` file and the training command you use for training? I may be able to help debug if I can recreate the...

Yeah, I am also slightly surprised by how big a role mixed-precision training plays. I am open to any PRs or suggestions for improving mixed-precision training.

This bug wasn't there in the model trained for the results in the paper. I think it was introduced when I was cleaning up the repo. -Gaurav

Hi @nldhuyen0047, could you try removing `--mixed_precision "bf16"` from the training command? -Gaurav

I looked into the training issue today. The culprit seems to be mixed-precision training. I think you will obtain much better results once you disable mixed-precision training. Cheers, Gaurav
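
As an illustration only (the comments here do not say why mixed precision hurts in this repo), the sketch below shows one general way reduced precision can stall training: bf16 keeps only 7 explicit mantissa bits, so a small update added to a weight near 1.0 can round away entirely. The helper `to_bf16` and the toy update loop are hypothetical and are not taken from the repository; the helper emulates bf16 rounding in plain Python and does not handle NaN specially.

```python
import struct

def to_bf16(x: float) -> float:
    """Round x to the nearest bfloat16 value (ties to even), returned as a Python float.

    Hypothetical helper for illustration; NaN inputs are not handled specially.
    """
    # Reinterpret the float32 encoding of x as a 32-bit unsigned integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # bf16 keeps the top 16 bits; add a bias so truncation rounds to nearest even.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    bits = (bits + rounding_bias) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]

# Toy "training" loop: a weight stored in bf16 receives 100 updates of 0.001.
w = to_bf16(1.0)
for _ in range(100):
    w = to_bf16(w + 0.001)  # each update is smaller than half the bf16 spacing near 1.0

# The same loop with an unrounded Python float, for comparison.
w_full = 1.0
for _ in range(100):
    w_full = w_full + 0.001

print(w, w_full)
```

In this toy setting the bf16 weight never moves, while the full-precision weight ends near 1.1. Whether this effect is what happens in the training runs discussed here is an assumption; it is only meant to show why disabling mixed precision can change results.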

In my experiments, I found that the model can converge even without the skip connections. Could you specify more details about your task, and what setting you are trying? -Gaurav

Thanks for pointing this out! I removed the `triton` dependency from the `environment.yaml` and `requirements.txt` files. Were you able to resolve your issues after removing this dependency? -Gaurav

Oooh, good catch! The docs should be updated now!

Hi, we used different data samples for the different tasks (based on the available datasets). You can typically see reasonable results within a few thousand steps. Monitoring the evaluation metrics...

We have a separate evaluation procedure for the paired and unpaired models. Which one were you trying to evaluate? -Gaurav