Dirk Groeneveld comments

Results: 200 comments of Dirk Groeneveld

GQA into Mitchich65

@epwalsh, the fused CE loss, will it work on LUMI? It seems we have to be careful, in case that same numerical problem shows up. I guess, new approach is...

Why is OLMo not integrated into Transformers?

OLMo is now properly integrated with Transformers!

Olmo / OLMo consistency

Please file a PR! In the code, they should all be "OlmoSomething".

Olmo / OLMo consistency

👍🏻

Olmo / OLMo consistency

Ok, then we'll go with Hugging Face's suggestion. That means we rename everything to `OLMo*`, right?

Fix a bug w.r.t. how local tokenizers are handled

Can you add a note to the Changelog? Then we're good to go.

OLMo 7B finetuning w/ CPU offloading does not work

Coming late to this discussion. Are you loading optimizer state from somewhere? If you are not, you should warm up your learning rate from 0 over a number of steps.
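
To illustrate the warmup suggestion, here is a minimal sketch of a linear schedule that starts at 0 and reaches the target learning rate after a fixed number of steps. The function name `warmup_lr` and the parameters `peak_lr` and `warmup_steps` are hypothetical, chosen for illustration; they are not OLMo's actual config keys or scheduler.

```python
def warmup_lr(step: int, peak_lr: float, warmup_steps: int) -> float:
    """Linearly scale the learning rate from 0 up to peak_lr.

    All names here are illustrative assumptions, not OLMo's API.
    """
    if warmup_steps <= 0:
        # No warmup requested: use the peak learning rate right away.
        return peak_lr
    # Fraction of warmup completed, capped at 1.0 once warmup is over.
    return peak_lr * min(1.0, step / warmup_steps)


if __name__ == "__main__":
    # Print the schedule at a few steps for a 100-step warmup.
    for step in (0, 25, 50, 100, 200):
        print(step, warmup_lr(step, peak_lr=1e-4, warmup_steps=100))
```

With a fresh optimizer (no loaded state), the early steps then take small updates while the optimizer's moment estimates settle.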

Beaker Executor opens a ton of files

I do this on a Mac. I think Linux has a higher default open file limit, so it takes a lot more to hit the same problem.
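
For anyone wanting to compare machines, the per-process open file limit can be inspected with Python's standard `resource` module (Unix only). This is a rough sketch for checking the limit; the helper names `open_file_limits` and `raise_soft_limit` are made up for illustration and have nothing to do with the Beaker executor itself.

```python
import resource
from typing import Tuple


def open_file_limits() -> Tuple[int, int]:
    """Return the (soft, hard) limits on open file descriptors."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def raise_soft_limit(target: int) -> int:
    """Try to raise the soft limit to `target`; return the resulting soft limit.

    Hypothetical helper: the limit is never lowered, and never set
    above the hard limit.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return soft  # already unlimited
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)
    if target <= soft:
        return soft  # nothing to raise
    try:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    except (ValueError, OSError):
        # Some systems reject values above a kernel-level cap.
        pass
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]


if __name__ == "__main__":
    print("soft, hard =", open_file_limits())
```

Running this on both a Mac and a Linux box would show whether the default soft limits actually differ as suspected.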

Beaker Executor opens a ton of files

I realize (now) that `step_result()` is not the correct method to call. But the behavior is quite pathological.

the loss spike

What do you mean by "find the right parameter for init"? What's the parameter you are missing?