poet
Evaluation of PoET in distributed training mode
Reported by @3bsamad in #10
Currently, when PoET is trained in distributed mode, evaluation appears to use only the data assigned to the first GPU, i.e. 1/n of the evaluation set when training on n GPUs. A possible solution would be to use Hugging Face Accelerate.
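A minimal, framework-free sketch of the problem follows. It assumes the evaluation set is split across ranks with a strided split (similar to PyTorch's `DistributedSampler`, minus its shuffling and padding), and all function names are hypothetical. Each rank computes sufficient statistics (number correct, number evaluated). Summing those across ranks recovers the full-dataset metric, while reporting rank 0's value alone only reflects its shard. In a real run, the summation would be done with `torch.distributed.all_reduce(..., op=ReduceOp.SUM)` or, with Accelerate, by gathering per-sample results via `accelerator.gather_for_metrics`.

```python
from typing import List, Tuple


def shard(indices: List[int], rank: int, world_size: int) -> List[int]:
    # Hypothetical helper: strided split, like DistributedSampler without shuffling/padding.
    return indices[rank::world_size]


def local_stats(correct: List[bool], shard_idx: List[int]) -> Tuple[int, int]:
    # Per-rank sufficient statistics: (number correct, number evaluated).
    return sum(correct[i] for i in shard_idx), len(shard_idx)


def rank0_only_accuracy(correct: List[bool], world_size: int) -> float:
    # What is reported if only rank 0's shard is evaluated (the behaviour in this issue).
    hits, n = local_stats(correct, shard(list(range(len(correct))), 0, world_size))
    return hits / n


def reduced_accuracy(correct: List[bool], world_size: int) -> float:
    # Stand-in for an all_reduce(SUM) of [hits, n] over all ranks.
    total_hits = total_n = 0
    for rank in range(world_size):
        hits, n = local_stats(correct, shard(list(range(len(correct))), rank, world_size))
        total_hits += hits
        total_n += n
    return total_hits / total_n


if __name__ == "__main__":
    # Per-sample correctness for a toy evaluation set of 8 samples.
    correct = [True, False, True, True, False, False, True, False]
    print(rank0_only_accuracy(correct, world_size=2))  # shard-only metric
    print(reduced_accuracy(correct, world_size=2))     # full-dataset metric
```

Summing counts, rather than averaging per-rank accuracies, keeps the result correct when shards have different sizes. Note that the real `DistributedSampler` pads the dataset with repeated samples when it does not divide evenly, so a naive gather can count a few samples twice. `gather_for_metrics` is intended to drop those duplicates.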