LOMO
Is LOMO capable of pre-training an LLM from scratch as well?
Good question. We don't know yet how LOMO will perform in the pre-training stage. The major concern is that SGD is sensitive to the optimization settings. My guess is that the optimization process of pre-training from scratch is more difficult than that of fine-tuning or further pre-training.
In practice, a possible compromise is to use a powerful optimizer (e.g., Adam) for a warm-up phase and then switch to a cheaper optimizer (e.g., LOMO), as sketched below.
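Here is a minimal sketch of that two-phase idea in PyTorch, assuming a standard training loop. Plain `torch.optim.SGD` stands in for the cheaper optimizer, since LOMO's own interface is not shown here; the model, step counts, and loss are all dummy placeholders.

```python
import torch
from torch import nn

# Illustrative setup; model, WARMUP_STEPS, and the loss are placeholders.
model = nn.Linear(512, 512)
WARMUP_STEPS = 1_000
TOTAL_STEPS = 10_000

# Phase 1: a powerful optimizer (AdamW) for the warm-up.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

for step in range(TOTAL_STEPS):
    if step == WARMUP_STEPS:
        # Phase 2: hand off to a cheaper optimizer once warm-up is done.
        # Adam's moment buffers are simply discarded at the switch.
        optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

    inputs = torch.randn(8, 512)        # dummy batch
    loss = model(inputs).pow(2).mean()  # dummy loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

Note that the real LOMO fuses the parameter update into the backward pass to save memory, so an actual switch to LOMO would change the loop structure rather than just the `optimizer` object; the sketch above only illustrates the warm-up-then-switch schedule.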
I am doing further pre-training now and will reply later when the results come out.