Results 127 issues of Niklas

This is still too hacky to be merged 😁 cc @TevenLeScao Edit: Will make it less hacky on the other branch so merging this

xnli, xcopa should be merged first

XNLI should be merged first

These are long prompts used for xP3. We found including them slightly improves performance (prompt diversity) & better preserves long generation capabilities of the model.

APPS requires mapping the dataset to the below:
```python
import json
import random

def add_solution_apps(example):
    example["solution"] = random.choice(json.loads(example["solutions"]))
    return example
```
XLCost requires mapping the dataset to the below:
```python
def clean_code_xlcost(example):
    clean_lines =...
```
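As a rough illustration of the APPS mapping above, the sketch below runs it on a single record. The toy record is hypothetical and not taken from APPS; the only assumption carried over from the snippet is that the `solutions` field holds a JSON-encoded list of strings.

```python
import json
import random


def add_solution_apps(example):
    # Pick one reference solution at random from the JSON-encoded list.
    example["solution"] = random.choice(json.loads(example["solutions"]))
    return example


# Hypothetical record, for illustration only (not real APPS data).
toy_example = {"solutions": json.dumps(["print(1)", "print(2)"])}

random.seed(0)  # fix the seed so the random pick is reproducible
mapped = add_solution_apps(dict(toy_example))

# The chosen solution is always one of the decoded candidates.
print(mapped["solution"] in ["print(1)", "print(2)"])
```

With the Hugging Face `datasets` library, a function of this shape would typically be passed to `Dataset.map`, which applies it to each example in turn.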

Shouldn't the `.dockerignore` be in the `./services/` dir, since `sam` only grabs the dir with the Dockerfile? Thanks for the great repo & work!

### Feature request
IIURC if I'm running batched generation and one sample in the batch has hit the stopping criteria but others have not, there is no way to be...

- Does not yet support checkpointing
- `configs/olmo-small-ablation-lumi-deepspeed.yaml` is the same as `configs/olmo-small-ablation-lumi.yaml` except for `deepspeed: true` & `init_device: cpu`
- `scripts/lumi/olmo-small-ablation-on-lumi-test.sh` is the same as `scripts/lumi/olmo-small-ablation-on-lumi-test-deepspeed.sh` except for `export...

### Bug description
I'm training LLMs across multiple GPUs on a single node using `Nvidia/NeMo`. When launching via `python train.py` inside of an allocation I get much worse performance than...

help wanted
question
environment: slurm
ver: 2.0.x