ReplitLM
How to train the ReplitLM model
Training uses the Hugging Face transformers training pipeline. It runs on an A100-80GB machine, but the per-GPU batch size can be set to at most 2, and memory usage is extremely unbalanced across the cards, e.g. 60 GB+ on card 0 versus 30 GB+ on the others. Also, is there a recommended set of training hyperparameters? With the current training strategy, the loss value is very large and drops only slowly.
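For reference, here is a minimal sketch of a transformers Trainer setup with the usual memory-saving options (bf16, gradient checkpointing, gradient accumulation). The model ID, toy dataset, and hyperparameter values are illustrative assumptions, not the actual configuration used in this issue.

```python
# Minimal sketch, assuming the replit/replit-code-v1-3b checkpoint on the
# Hugging Face Hub; the toy dataset and hyperparameters are illustrative.
import torch
from torch.utils.data import Dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

model_name = "replit/replit-code-v1-3b"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,  # bf16 weights halve memory vs fp32
    trust_remote_code=True,      # the Replit checkpoint ships custom code
)
# Trade compute for activation memory (if the custom model class supports it).
model.gradient_checkpointing_enable()


class ToyDataset(Dataset):
    """Tiny stand-in dataset so the sketch runs; replace with real data."""

    def __init__(self):
        enc = tokenizer("def add(a, b):\n    return a + b\n",
                        return_tensors="pt")
        self.ids = enc["input_ids"][0]

    def __len__(self):
        return 64

    def __getitem__(self, i):
        return {"input_ids": self.ids, "labels": self.ids.clone()}


args = TrainingArguments(
    output_dir="replit-finetune",
    per_device_train_batch_size=2,   # the max reported to fit above
    gradient_accumulation_steps=16,  # recover a larger effective batch
    bf16=True,
    learning_rate=1e-5,              # a large, slowly-falling loss often
    lr_scheduler_type="cosine",      # points at the LR/schedule, so start small
    warmup_ratio=0.03,
    logging_steps=10,
)

Trainer(model=model, args=args, train_dataset=ToyDataset()).train()
```

One note on the card-0 imbalance: when a Trainer script is run on a multi-GPU machine without a distributed launcher, transformers falls back to `nn.DataParallel`, which gathers outputs on GPU 0 and commonly produces the 60 GB vs 30 GB pattern described above. Launching the same script with `torchrun --nproc_per_node=<N>` uses DDP instead and usually balances memory across cards.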
The training framework should be https://github.com/mosaicml/llm-foundry