bigscience
Central place for the engineering/scaling WG: documentation, SLURM scripts and logs, compute environment and data.
The [config file](https://github.com/bigscience-workshop/bigscience/blob/b4a4f4651771cb78297abe5074aaf2de1f92d6ce/train/tr11-176B-ml/setup-test-n2.slurm) lists the sample count of the dataset as 220M and a global batch size of 2048, which equates to ~107K steps per epoch. The [main README](https://huggingface.co/bigscience/bloom/blob/main/README.md) says...
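The ~107K figure can be sanity-checked directly from the two numbers in the config. A minimal sketch (variable names are illustrative, the values are the ones quoted above):

```python
# Rough sanity check of the steps-per-epoch figure quoted above.
samples_in_dataset = 220_000_000  # ~220M samples listed in the config
global_batch_size = 2048          # global batch size from the same config

steps_per_epoch = samples_in_dataset // global_batch_size
print(steps_per_epoch)  # 107421, i.e. ~107K steps per epoch
```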
A clean-up to avoid keeping SLURM scripts in the Meg-DS repo. I will clean up the Meg-DS repo and the evaluation-results repo if we merge this.
It's more common and easier to follow to put the participial phrase after the noun, I think.
Notes re: learning rate: T0 & FLAN use Adafactor, which automatically adjusts the step size: "Finally, while the learning rate in Adam denotes a target absolute step size, we follow...
Add small arguments accepted by accelerate for better performance. In the previous script we were offloading to disk, which takes a lot of time. cc @Muennighoff
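For context, disk offload typically enters through the DeepSpeed ZeRO-3 config that accelerate passes through. A hedged sketch (not the actual PR diff; the keys are from DeepSpeed's documented config schema, the chosen values are illustrative):

```python
# Illustrative ZeRO-3 offload settings: offloading parameters to "nvme"
# (disk) is far slower than offloading to "cpu" or not offloading at all.
ds_config = {
    "zero_optimization": {
        "stage": 3,
        "offload_param": {
            "device": "cpu",   # "nvme" would spill parameters to disk
            "pin_memory": True,
        },
    },
}
print(ds_config["zero_optimization"]["offload_param"]["device"])  # cpu
```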
This PR updates the small-model SLURM scripts with the ones used to finish their training. We could also make these separate files, or mark somewhere that we continued with...