bigscience
Central place for the engineering/scaling WG: documentation, SLURM scripts and logs, compute environment and data.
The [config file](https://github.com/bigscience-workshop/bigscience/blob/b4a4f4651771cb78297abe5074aaf2de1f92d6ce/train/tr11-176B-ml/setup-test-n2.slurm) lists the sample count of the dataset as 220M and a global batch size of 2048, which equates to ~107K steps per epoch. The [main README](https://huggingface.co/bigscience/bloom/blob/main/README.md) says...
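The ~107K figure can be sanity-checked directly from the two numbers in the config. A minimal sketch (variable names are illustrative, the values are the ones quoted above):

```python
# Rough sanity check of the steps-per-epoch figure quoted above.
samples_in_dataset = 220_000_000  # ~220M samples listed in the config
global_batch_size = 2048          # global batch size from the same config

steps_per_epoch = samples_in_dataset // global_batch_size
print(steps_per_epoch)  # 107421, i.e. ~107K steps per epoch
```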
A clean-up to avoid keeping SLURM scripts in the Meg-DS repo. I will clean up the Meg-DS repo and the evaluation-results repo if we merge this.
It's more common and easier to follow to put the participial phrase after the noun, I think.
Notes re: learning rate: T0 & FLAN use Adafactor, which automatically adjusts the step size: "Finally, while the learning rate in Adam denotes a target absolute step size, we follow...
Add small arguments accepted by accelerate for better performance. In the previous script we were offloading to disk, which takes a lot of time. cc @Muennighoff
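For context, disk offload typically enters through the DeepSpeed ZeRO-3 config that accelerate passes through. A hedged sketch (not the actual PR diff; the keys are from DeepSpeed's documented config schema, the chosen values are illustrative):

```python
# Illustrative ZeRO-3 offload settings: offloading parameters to "nvme"
# (disk) is far slower than offloading to "cpu" or not offloading at all.
ds_config = {
    "zero_optimization": {
        "stage": 3,
        "offload_param": {
            "device": "cpu",   # "nvme" would spill parameters to disk
            "pin_memory": True,
        },
    },
}
print(ds_config["zero_optimization"]["offload_param"]["device"])  # cpu
```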
This PR updates the small-model SLURM scripts with the ones used to finish their training. We could also make these separate files, or mark somewhere that we continued with...