BigScience Workshop
bigscience
Central place for the engineering/scaling WG: documentation, SLURM scripts and logs, compute environment and data.
biomedical
Tools for curating biomedical training data for large-scale language modeling
Megatron-DeepSpeed
Ongoing research on training transformer language models at scale, including BERT & GPT-2
promptsource
Toolkit for creating, sharing and using natural language prompts.
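promptsource exposes its prompt templates through a small Python API. A minimal sketch of loading and applying a template (the ag_news dataset and the "classify_question_first" template name are used purely as illustrative choices):

```python
from datasets import load_dataset
from promptsource.templates import DatasetTemplates

# Load one example from a dataset (ag_news chosen only for illustration)
dataset = load_dataset("ag_news", split="train")
example = dataset[1]

# Fetch the prompt templates registered for this dataset
ag_news_prompts = DatasetTemplates("ag_news")

# Pick a template by name and apply it to the example;
# apply() returns the prompted input text and the target text
prompt = ag_news_prompts["classify_question_first"]
input_text, target_text = prompt.apply(example)
print(input_text)
print(target_text)
```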
t-zero
Reproduce results and replicate training of T0 (Multitask Prompted Training Enables Zero-Shot Task Generalization)
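The T0 checkpoints produced by this work are published on the Hugging Face Hub. A minimal sketch of zero-shot inference with the bigscience/T0pp checkpoint, assuming the transformers library (smaller variants such as bigscience/T0_3B also exist):

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load the T0pp checkpoint (very large; downloading it requires substantial disk and memory)
tokenizer = AutoTokenizer.from_pretrained("bigscience/T0pp")
model = AutoModelForSeq2SeqLM.from_pretrained("bigscience/T0pp")

# Zero-shot: pose the task directly as a natural-language prompt
inputs = tokenizer.encode(
    "Is this review positive or negative? "
    "Review: this is the best cast iron skillet you will ever buy",
    return_tensors="pt",
)
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```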
data-preparation
Code used for sourcing and cleaning the BigScience ROOTS corpus
data_sourcing
This directory gathers the tools developed by the Data Sourcing Working Group