sparseml
Add support for total batch size arguments in transformers transfer learning
Created the NMTrainingArguments class, which inherits from HF's TrainingArguments. This class makes it possible to add arguments to the training script and to handle potential conflicts with other arguments. In particular, added "total_train_batch_size" and "total_eval_batch_size", and made sure these arguments cannot be used alongside "per_device_train_batch_size" and "per_device_eval_batch_size".
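For readers unfamiliar with the pattern, here is a minimal sketch of the conflict check described above. It is not the code from this PR: `BaseTrainingArgs` is a hypothetical stdlib-only stand-in for HF's `TrainingArguments`, `SketchTrainingArgs` is an illustrative name, and the sketch assumes the per-device fields default to `None` so that an explicit user value can be detected. The `per_device_size` helper, which splits a total evenly across devices, is likewise an assumption about how a total might be consumed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BaseTrainingArgs:
    """Hypothetical stand-in for HF's TrainingArguments (stdlib only)."""

    # Assumption: None means "not set by the user".
    per_device_train_batch_size: Optional[int] = None
    per_device_eval_batch_size: Optional[int] = None


@dataclass
class SketchTrainingArgs(BaseTrainingArgs):
    """Illustrative subclass; not the PR's actual NMTrainingArguments."""

    total_train_batch_size: Optional[int] = None
    total_eval_batch_size: Optional[int] = None

    def __post_init__(self):
        # Reject any mode where both the total and per-device sizes are given.
        for mode in ("train", "eval"):
            total = getattr(self, f"total_{mode}_batch_size")
            per_device = getattr(self, f"per_device_{mode}_batch_size")
            if total is not None and per_device is not None:
                raise ValueError(
                    f"total_{mode}_batch_size cannot be used together with "
                    f"per_device_{mode}_batch_size"
                )


def per_device_size(total: int, n_devices: int) -> int:
    """Split a total batch size evenly across devices (assumed behavior)."""
    if n_devices <= 0 or total % n_devices != 0:
        raise ValueError("total batch size must divide evenly across devices")
    return total // n_devices
```

Putting the check in `__post_init__` means a conflicting combination fails when the arguments are constructed, before any training starts.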
@alexm-nm given how many commits have happened since you set up this PR, I would suggest either starting from scratch on a new branch or resolving all the conflicts.
This PR is ready but needs to be merged by hand. Will create a new PR soon and delete this one when done.
Too much time has passed. Will re-implement if needed.