BigScience Workshop

17 repositories owned by BigScience Workshop

bigscience

949 stars, 99 forks

Central place for the engineering/scaling WG: documentation, SLURM scripts and logs, compute environment and data.

biomedical

419 stars, 111 forks

Tools for curating biomedical training data for large-scale language modeling

Megatron-DeepSpeed

1.2k stars, 204 forks

Ongoing research training transformer language models at scale, including: BERT & GPT-2

promptsource

2.5k stars, 337 forks

Toolkit for creating, sharing and using natural language prompts.

t-zero

444 stars, 51 forks

Reproduce results and replicate training for T0 (Multitask Prompted Training Enables Zero-Shot Task Generalization)

data-preparation

285 stars, 40 forks

Code used for sourcing and cleaning the BigScience ROOTS corpus

data_sourcing

31 stars, 6 forks

This directory gathers the tools developed by the Data Sourcing Working Group

data_tooling

74 stars, 49 forks

Tools for managing datasets for governance and training.

evaluation

41 stars, 24 forks

Code and Data for Evaluation WG