
Central place for the engineering/scaling WG: documentation, SLURM scripts and logs, compute environment and data.

Results: 28 bigscience issues, sorted by recently updated

What kind of machine is required just to run inference on the 176B model? https://huggingface.co/bigscience/bloom
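
For reference, a minimal sketch of what that inference could look like with 🤗 Transformers plus Accelerate, assuming enough combined GPU/CPU memory for the roughly 352 GB of bfloat16 weights (8×A100 80GB is the commonly cited setup). This is an illustration, not the project's official recipe:

```python
# A minimal sketch (not an official recipe): loading BLOOM for inference.
# device_map="auto" lets Accelerate spread the ~352 GB of bfloat16 weights
# across the available GPUs, spilling to CPU RAM if needed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom")
model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom",
    device_map="auto",            # requires the `accelerate` package
    torch_dtype=torch.bfloat16,
)

inputs = tokenizer("BLOOM is a 176B-parameter model that", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```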

### Description I am reading the chronicles-prequel, and the last table in that chapter indicates that the highest TFLOPS is achieved with ZeRO_STAGE=1. [Trying with ZeRO_STAGE=0/1](https://github.com/bigscience-workshop/bigscience/blob/master/train/tr11-176B-ml/chronicles-prequel.md#48-node-contenders) Zero_stage=1...
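
For context, ZeRO stage 1 shards only the optimizer states across data-parallel ranks, so it adds less communication overhead than stages 2/3, which is consistent with it winning on TFLOPS in that table. A hedged sketch of the relevant piece of a DeepSpeed config, written as a Python dict; only the `zero_optimization.stage` key is the point here, and the other values are illustrative placeholders:

```python
# Sketch of the ZeRO-relevant part of a DeepSpeed config (as a Python dict).
# Stage 0 disables ZeRO; stage 1 shards only the optimizer states across
# data-parallel ranks. Values besides "stage" are illustrative placeholders.
ds_config = {
    "train_micro_batch_size_per_gpu": 2,   # illustrative
    "gradient_accumulation_steps": 1,      # illustrative
    "zero_optimization": {
        "stage": 1,  # the knob compared in the chronicles: 0 vs. 1
    },
    "bf16": {"enabled": True},
}
```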

The 1.3B-Pile@300B model is quite strong: https://docs.google.com/spreadsheets/d/1CI8Q9RCblLRzUOPJ6ViqBmo284-8ojluQ-CmaEuhuv0/edit#gid=1295801165 lambada 0.6088, piqa 0.7160, hellaswag 0.5209; these are all better than GPT-Neo 1.3B. Could you share the model? Thank you.
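
For reference, zero-shot numbers like these are typically produced with EleutherAI's lm-evaluation-harness. A rough sketch of reproducing the GPT-Neo 1.3B side of the comparison (the API and task names vary across harness versions, so treat this as an assumption-laden illustration):

```python
# Sketch of reproducing such zero-shot scores with EleutherAI's
# lm-evaluation-harness (pip install lm-eval). The API and task names
# differ across harness versions; this follows the 0.4.x interface.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=EleutherAI/gpt-neo-1.3B",  # baseline; swap in the 1.3B-Pile model once shared
    tasks=["lambada_openai", "piqa", "hellaswag"],
)
print(results["results"])
```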

Not sure how to fill in `Copyright [yyyy] [name of copyright owner]`

I was super excited to hear about this project! I was wondering if the model is available anywhere? In the [chronicles of tr1-13B-base](https://github.com/bigscience-workshop/bigscience/blob/master/train/tr1-13B-base/chronicles.md) it says at the end: "All...

Hey, pinging @stas00. I'm a researcher from Tel Aviv University, and we're thinking about implementing QoS similar to what you have with the Jean Zay cluster. It would be really helpful...

This PR is for sorting out the tr10-104B config.

Hi @TevenLeScao, I think there are some confusing and broken links in the [mC4 data preprocessing](https://github.com/bigscience-workshop/bigscience/tree/master/data/mc4) section. Can you take a look? Both of the links here are broken: 1....