ReplitLM
How to train the ReplitLM model
Training uses the Hugging Face transformers training pipeline. It runs on an A100-80GB machine, but the per-GPU batch size can be set to at most 2, and memory usage is extremely unbalanced across the cards, e.g. 60 GB+ on card 0 versus 30 GB+ on the others. Also, is there a recommended set of training hyperparameters? With the current training strategy, the loss value is very large and drops only slowly.
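For reference, here is a minimal sketch of a transformers Trainer setup with the usual memory-saving options (bf16, gradient checkpointing, gradient accumulation). The model ID, toy dataset, and hyperparameter values are illustrative assumptions, not the actual configuration used in this issue.

```python
# Minimal sketch, assuming the replit/replit-code-v1-3b checkpoint on the
# Hugging Face Hub; the toy dataset and hyperparameters are illustrative.
import torch
from torch.utils.data import Dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

model_name = "replit/replit-code-v1-3b"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,  # bf16 weights halve memory vs fp32
    trust_remote_code=True,      # the Replit checkpoint ships custom code
)
# Trade compute for activation memory (if the custom model class supports it).
model.gradient_checkpointing_enable()


class ToyDataset(Dataset):
    """Tiny stand-in dataset so the sketch runs; replace with real data."""

    def __init__(self):
        enc = tokenizer("def add(a, b):\n    return a + b\n",
                        return_tensors="pt")
        self.ids = enc["input_ids"][0]

    def __len__(self):
        return 64

    def __getitem__(self, i):
        return {"input_ids": self.ids, "labels": self.ids.clone()}


args = TrainingArguments(
    output_dir="replit-finetune",
    per_device_train_batch_size=2,   # the max reported to fit above
    gradient_accumulation_steps=16,  # recover a larger effective batch
    bf16=True,
    learning_rate=1e-5,              # a large, slowly-falling loss often
    lr_scheduler_type="cosine",      # points at the LR/schedule, so start small
    warmup_ratio=0.03,
    logging_steps=10,
)

Trainer(model=model, args=args, train_dataset=ToyDataset()).train()
```

One note on the card-0 imbalance: when a Trainer script is run on a multi-GPU machine without a distributed launcher, transformers falls back to `nn.DataParallel`, which gathers outputs on GPU 0 and commonly produces the 60 GB vs 30 GB pattern described above. Launching the same script with `torchrun --nproc_per_node=<N>` uses DDP instead and usually balances memory across cards.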
The training framework should be https://github.com/mosaicml/llm-foundry