Torsten Scholak

Results 114 comments of Torsten Scholak

Hi @ywen666, Thanks! You can try with a batch size of around 32, and that should work as well.

There were some changes recently to the parser that may have resulted in a performance regression. I suspect that this is the cause of the slowdown. When I have the time,...

You could help me out by telling me which input-output pairs take the longest to generate.

Thanks so much, this information will help me with the root cause analysis for the speed regression!

Looks great so far! `fast-llm type=GPTTrainer` is principled (because it taps into the override logic) but ugly (because spelling out `type=` is mandatory and because it's using class names as...
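For context, the "class names as type keys" concern could be sidestepped with short registered aliases that resolve to classes. This is a hypothetical sketch of that pattern, not Fast-LLM's actual override logic; the names `register`, `resolve`, and the `"gpt"` alias are illustrative assumptions.

```python
# Hypothetical type-key registry (illustration only, not Fast-LLM code).
_REGISTRY = {}

def register(name):
    """Class decorator mapping a short alias to the decorated class."""
    def deco(cls):
        _REGISTRY[name] = cls
        return cls
    return deco

@register("gpt")  # short alias instead of the raw class name "GPTTrainer"
class GPTTrainer:
    pass

def resolve(config):
    """Look up the class named by the config's `type` key."""
    return _REGISTRY[config["type"]]

print(resolve({"type": "gpt"}).__name__)  # GPTTrainer
```

With aliases, a CLI override like `type=gpt` stays stable even if the implementing class is renamed.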

> Can we please break down this PR? Otherwise it will make reviewing too difficult. Let's keep this one about the minimalistic `generate`, and move the rest to the next...

@jlamypoirier, btw, we need your guidance in determining the best way to distribute generation across ranks. Concretely, we are looking to implement this lm-eval-harness API:

```python
@abc.abstractmethod
def generate_until(self, requests)...
```
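One common way to distribute such request lists across ranks is round-robin sharding with an index-based merge. The sketch below is a toy illustration under that assumption, not Fast-LLM's or lm-eval-harness's actual distributed implementation; `shard_requests` and `merge_results` are hypothetical names, and "generation" is stubbed out as uppercasing.

```python
def shard_requests(requests, rank, world_size):
    """Round-robin shard: rank r keeps requests r, r + world_size, ..."""
    return [(i, req) for i, req in enumerate(requests) if i % world_size == rank]

def merge_results(per_rank_results, total):
    """Reassemble (original_index, output) pairs from all ranks into order."""
    merged = [None] * total
    for results in per_rank_results:
        for i, out in results:
            merged[i] = out
    return merged

# Toy demo: two simulated "ranks" process their shard, then results merge.
requests = ["foo", "bar", "baz"]
per_rank = [
    [(i, req.upper()) for i, req in shard_requests(requests, rank, 2)]
    for rank in range(2)
]
print(merge_results(per_rank, len(requests)))  # ['FOO', 'BAR', 'BAZ']
```

Carrying the original index with each shard keeps the merged outputs aligned with the input order regardless of how the per-rank batches finish.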

@oleksost can you help flesh this out? not sure what the intended scope of this is. it would depend on #151, wouldn't it?

Hi @bigximik, thanks for putting this together. I appreciate the careful thinking you've put in here! However, let's simplify significantly. The goal isn't to design a general, modular pipeline system....