Torsten Scholak

Results 114 comments of Torsten Scholak

Hi @ywen666, Thanks! You can try with a batch size of around 32, and that should work as well.

There were some changes recently to the parser that may have resulted in a performance regression. I suspect that this is the cause of the slowdown. When I have the time,...

You could help me out by telling me which input-output pairs take the longest to generate.

Thanks so much, this information will help me with the root cause analysis for the speed regression!

Looks great so far! `fast-llm type=GPTTrainer` is principled (because it taps into the override logic) but ugly (because spelling out `type=` is mandatory and because it's using class names as...
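For context, the "class names as type keys" concern could be sidestepped with short registered aliases that resolve to classes. This is a hypothetical sketch of that pattern, not Fast-LLM's actual override logic; the names `register`, `resolve`, and the `"gpt"` alias are illustrative assumptions.

```python
# Hypothetical type-key registry (illustration only, not Fast-LLM code).
_REGISTRY = {}

def register(name):
    """Class decorator mapping a short alias to the decorated class."""
    def deco(cls):
        _REGISTRY[name] = cls
        return cls
    return deco

@register("gpt")  # short alias instead of the raw class name "GPTTrainer"
class GPTTrainer:
    pass

def resolve(config):
    """Look up the class named by the config's `type` key."""
    return _REGISTRY[config["type"]]

print(resolve({"type": "gpt"}).__name__)  # GPTTrainer
```

With aliases, a CLI override like `type=gpt` stays stable even if the implementing class is renamed.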

> Can we please break down this PR? Otherwise it will make reviewing too difficult. Let's keep this one about the minimalistic `generate`, and move the rest to the next...

@jlamypoirier, btw, we need your guidance in determining the best way to distribute generation across ranks. Concretely, we are looking to implement this lm-eval-harness API:

```python
@abc.abstractmethod
def generate_until(self, requests)...
```
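One common way to distribute such request lists across ranks is round-robin sharding with an index-based merge. The sketch below is a toy illustration under that assumption, not Fast-LLM's or lm-eval-harness's actual distributed implementation; `shard_requests` and `merge_results` are hypothetical names, and "generation" is stubbed out as uppercasing.

```python
def shard_requests(requests, rank, world_size):
    """Round-robin shard: rank r keeps requests r, r + world_size, ..."""
    return [(i, req) for i, req in enumerate(requests) if i % world_size == rank]

def merge_results(per_rank_results, total):
    """Reassemble (original_index, output) pairs from all ranks into order."""
    merged = [None] * total
    for results in per_rank_results:
        for i, out in results:
            merged[i] = out
    return merged

# Toy demo: two simulated "ranks" process their shard, then results merge.
requests = ["foo", "bar", "baz"]
per_rank = [
    [(i, req.upper()) for i, req in shard_requests(requests, rank, 2)]
    for rank in range(2)
]
print(merge_results(per_rank, len(requests)))  # ['FOO', 'BAR', 'BAZ']
```

Carrying the original index with each shard keeps the merged outputs aligned with the input order regardless of how the per-rank batches finish.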

@oleksost can you help flesh this out? not sure what the intended scope of this is. it would depend on #151, wouldn't it?

Hi @bigximik, thanks for putting this together. I appreciate the careful thinking you've put in here! However, let's simplify significantly. The goal isn't to design a general, modular pipeline system....