
Central place for the engineering/scaling WG: documentation, SLURM scripts and logs, compute environment and data.

Results: 28 bigscience issues, sorted by recently updated

What kind of machine is required just to run inference on the 176B model? https://huggingface.co/bigscience/bloom
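
For reference, a minimal sketch of what that inference could look like with 🤗 Transformers plus Accelerate, assuming enough combined GPU/CPU memory for the roughly 352 GB of bfloat16 weights (8×A100 80GB is the commonly cited setup). This is an illustration, not the project's official recipe:

```python
# A minimal sketch (not an official recipe): loading BLOOM for inference.
# device_map="auto" lets Accelerate spread the ~352 GB of bfloat16 weights
# across the available GPUs, spilling to CPU RAM if needed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom")
model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom",
    device_map="auto",            # requires the `accelerate` package
    torch_dtype=torch.bfloat16,
)

inputs = tokenizer("BLOOM is a 176B-parameter model that", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```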

### Description I am reading the chronicles-prequel, and the last table in that chapter indicates that the highest TFLOPS is achieved with ZeRO_STAGE=1. [Trying with ZeRO_STAGE=0/1](https://github.com/bigscience-workshop/bigscience/blob/master/train/tr11-176B-ml/chronicles-prequel.md#48-node-contenders) Zero_stage=1...
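
For context, ZeRO stage 1 shards only the optimizer states across data-parallel ranks, so it adds less communication overhead than stages 2/3, which is consistent with it winning on TFLOPS in that table. A hedged sketch of the relevant piece of a DeepSpeed config, written as a Python dict; only the `zero_optimization.stage` key is the point here, and the other values are illustrative placeholders:

```python
# Sketch of the ZeRO-relevant part of a DeepSpeed config (as a Python dict).
# Stage 0 disables ZeRO; stage 1 shards only the optimizer states across
# data-parallel ranks. Values besides "stage" are illustrative placeholders.
ds_config = {
    "train_micro_batch_size_per_gpu": 2,   # illustrative
    "gradient_accumulation_steps": 1,      # illustrative
    "zero_optimization": {
        "stage": 1,  # the knob compared in the chronicles: 0 vs. 1
    },
    "bf16": {"enabled": True},
}
```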

The 1.3B-Pile@300B model is quite strong: https://docs.google.com/spreadsheets/d/1CI8Q9RCblLRzUOPJ6ViqBmo284-8ojluQ-CmaEuhuv0/edit#gid=1295801165 lambada 0.6088, piqa 0.7160, hellaswag 0.5209; these are all better than GPT-Neo 1.3B. Could you share the model? Thank you.
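
For reference, zero-shot numbers like these are typically produced with EleutherAI's lm-evaluation-harness. A rough sketch of reproducing the GPT-Neo 1.3B side of the comparison (the API and task names vary across harness versions, so treat this as an assumption-laden illustration):

```python
# Sketch of reproducing such zero-shot scores with EleutherAI's
# lm-evaluation-harness (pip install lm-eval). The API and task names
# differ across harness versions; this follows the 0.4.x interface.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=EleutherAI/gpt-neo-1.3B",  # baseline; swap in the 1.3B-Pile model once shared
    tasks=["lambada_openai", "piqa", "hellaswag"],
)
print(results["results"])
```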

Not sure how to fill in `Copyright [yyyy] [name of copyright owner]`

I was super excited to hear about this project! I was wondering if the model is available anywhere? In the [chronicles of tr1-13B-base](https://github.com/bigscience-workshop/bigscience/blob/master/train/tr1-13B-base/chronicles.md) it says at the end: "All...

Hey, pinging @stas00. I'm a researcher from Tel Aviv University, and we're thinking about implementing QoS similar to what you have with the Jean Zay cluster. It would be really helpful...

This PR is for sorting out the tr10-104B config.

Hi @TevenLeScao, I think there are some confusing and broken links in the [mC4 data preprocessing](https://github.com/bigscience-workshop/bigscience/tree/master/data/mc4) section. Can you take a look? Both of the links here are broken: 1....