ocannl

Any lessons from Imbue for training-in-the-large?

Open: lukstafi opened this issue 1 year ago • 1 comment

https://imbue.com/research/70b-infrastructure/

"In the span of a few months, with a small team of researchers and engineers, we trained a 70B parameter model from scratch on our own infrastructure that outperformed zero-shot GPT-4o on reasoning-related tasks.

Today, we’re sharing an end-to-end guide for setting up the required infrastructure: from bringing up the initial cluster and installing the OS, to automatically recovering from errors encountered during training."

lukstafi, Jul 04 '24 09:07

Also, from llm.c: https://github.com/karpathy/llm.c/discussions/677

lukstafi, Jul 13 '24 16:07