Niklas issues

Results: 127 issues of Niklas

Add multiple evaluation compat

This is still too hacky to be merged 😁 cc @TevenLeScao. Edit: Will make it less hacky on the other branch, so merging this.

xP3 Long

These are long prompts used for xP3. We found that including them slightly improves performance (prompt diversity) and better preserves the model's long-generation capabilities.

APPS requires mapping the dataset to the below:

```python
import json
import random


def add_solution_apps(example):
    # Pick one of the JSON-encoded reference solutions at random.
    example["solution"] = random.choice(json.loads(example["solutions"]))
    return example
```

XLCost requires mapping the dataset to the below:

```python
def clean_code_xlcost(example):
    clean_lines =...
```
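To illustrate the APPS mapping described above, here is a minimal sketch that applies it to plain dictionaries. The toy `rows` records are hypothetical and stand in for real APPS rows, which carry more fields; the only assumption taken from the snippet is that `solutions` holds a JSON-encoded list of strings.

```python
import json
import random


def add_solution_apps(example):
    # Pick one of the JSON-encoded reference solutions at random.
    example["solution"] = random.choice(json.loads(example["solutions"]))
    return example


# Hypothetical toy records, not actual APPS data.
rows = [
    {
        "question": "Add two numbers",
        "solutions": json.dumps(["print(1 + 2)", "print(sum([1, 2]))"]),
    },
]

# Copy each row so the originals stay untouched, then apply the mapping.
mapped = [add_solution_apps(dict(row)) for row in rows]
print(mapped[0]["solution"])
```

With a Hugging Face `datasets` dataset, the same function would presumably be passed to `Dataset.map`. Because the choice is random, the selected solution can differ between runs unless the random seed is fixed.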

.dockerignore in wrong dir

Shouldn't the .dockerignore be in the ./services/ dir, as sam only grabs the dir with the Dockerfile? Thanks for the great repo & work!

StoppingCriteria for individual samples in batched input

### Feature request

IIURC, if I'm running batched generation and one sample in the batch has hit the stopping criteria but others have not, there is no way to be...
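The situation described in this request can be illustrated with a toy, pure-Python batched loop in which every sequence is checked against the stopping criterion individually. This is only a sketch of the idea, not the `transformers` API: `generate_batch`, `next_token`, `should_stop`, and `pad_id` are hypothetical names introduced for illustration.

```python
from typing import Callable, List


def generate_batch(
    prompts: List[List[int]],
    next_token: Callable[[List[int]], int],
    should_stop: Callable[[List[int]], bool],
    max_new_tokens: int = 8,
    pad_id: int = 0,
) -> List[List[int]]:
    """Toy batched loop where each sequence stops independently."""
    sequences = [list(p) for p in prompts]
    finished = [False] * len(sequences)
    for _ in range(max_new_tokens):
        for i, seq in enumerate(sequences):
            if finished[i]:
                # Finished samples only receive padding from here on.
                seq.append(pad_id)
                continue
            seq.append(next_token(seq))
            finished[i] = should_stop(seq)
        if all(finished):
            break
    return sequences


# Toy "model": the next token is the previous one plus 1.
# Toy criterion: stop once the last token is a multiple of 5.
result = generate_batch(
    prompts=[[1], [3]],
    next_token=lambda seq: seq[-1] + 1,
    should_stop=lambda seq: seq[-1] % 5 == 0,
)
print(result)  # [[1, 2, 3, 4, 5], [3, 4, 5, 0, 0]]
```

The second sample reaches its stopping condition two steps before the first and is padded from then on, while the first keeps generating until it also stops.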

DeepSpeed

- Does not yet support checkpointing
- `configs/olmo-small-ablation-lumi-deepspeed.yaml` is the same as `configs/olmo-small-ablation-lumi.yaml` except for `deepspeed: true` & `init_device: cpu`
- `scripts/lumi/olmo-small-ablation-on-lumi-test.sh` is the same as `scripts/lumi/olmo-small-ablation-on-lumi-test-deepspeed.sh` except for `export...

Why does running Lightning on SLURM with python perform worse than with srun?

### Bug description

I'm training LLMs across multiple GPUs on a single node using `Nvidia/NeMo`. When launching via `python train.py` inside of an allocation I get much worse performance than...

Labels: help wanted, question, environment: slurm, ver: 2.0.x

Niklas

WIP: MTEB

Add multiple evaluation compat

xwinostorymt

xcopa mt

xP3 Long

Codeparrot/githubpairs & co

.dockerignore in wrong dir

StoppingCriteria for individual samples in batched input

DeepSpeed

Why does running Lightning on SLURM with python perform worse than with srun?