starcoder: Training getting stuck


Opened by sankethgadadinni

I'm trying to train on an A100 GPU, but training gets stuck. I can't see any logs other than this warning:

UserWarning: MatMul8bitLt: inputs will be cast from torch.bfloat16 to float16 during quantization
  warnings.warn(f"MatMul8bitLt: inputs will be cast from {A.dtype} to float16 during quantization")

Jun 09 '23 11:06 sankethgadadinni

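Hi. I think this depends on the logging frequency: training may well be running, but nothing is printed until logging_steps steps have completed. Try reducing the logging_steps parameter and let me know if that solves your issue.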
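To illustrate why a large logging interval can look like a hang, here is a minimal simulation, assuming a logger that prints only when the step number is a multiple of logging_steps. The function logged_steps is hypothetical and is not part of the starcoder code or any library.

```python
def logged_steps(total_steps: int, logging_steps: int) -> list[int]:
    """Return the 1-based steps at which a step-based logger would print,
    assuming it logs whenever step % logging_steps == 0 (hypothetical model)."""
    if logging_steps <= 0:
        raise ValueError("logging_steps must be positive")
    return [step for step in range(1, total_steps + 1) if step % logging_steps == 0]


# With an interval larger than the number of completed steps, nothing is printed.
print(logged_steps(total_steps=40, logging_steps=100))  # → []
# A smaller interval produces output much earlier.
print(logged_steps(total_steps=40, logging_steps=10))   # → [10, 20, 30, 40]
```

If the script uses the Hugging Face Trainer, the corresponding setting would be TrainingArguments(..., logging_steps=1), optionally with logging_first_step=True, so that something is printed as soon as the first step finishes.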

Jun 28 '23 09:06 ArmelRandy

Same issue here: nothing is shown. Have you solved your problem? Thanks.

Jun 28 '23 12:06 zhuang-li

Yes, I solved it. The problem was the data format. I'm using this StarCoder training code to train LLaMA, and preprocessing the data properly fixed the error.
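The thread does not say what was wrong with the data. As a hypothetical first step, a check like the following flags records with missing or empty fields before they reach the tokenizer. The column names prompt and completion, and the helper find_bad_records, are assumptions for illustration, not necessarily what the StarCoder fine-tuning script expects.

```python
from typing import Iterable, Mapping

# Hypothetical column names: adjust to whatever your training script reads.
REQUIRED_COLUMNS = ("prompt", "completion")


def find_bad_records(
    records: Iterable[Mapping[str, object]],
    required: tuple[str, ...] = REQUIRED_COLUMNS,
) -> list[tuple[int, str]]:
    """Return (index, reason) for every record that lacks a required column
    or whose value is not a non-empty string."""
    problems = []
    for i, rec in enumerate(records):
        for col in required:
            if col not in rec:
                problems.append((i, f"missing column {col!r}"))
            elif not isinstance(rec[col], str) or not rec[col].strip():
                problems.append((i, f"empty or non-string {col!r}"))
    return problems


sample = [
    {"prompt": "def add(a, b):", "completion": "    return a + b"},  # well-formed
    {"prompt": "", "completion": "pass"},                           # empty prompt
    {"text": "wrong schema"},                                        # wrong columns
]
print(find_bad_records(sample))
```

Running this on the whole dataset before training makes schema problems visible immediately instead of after a silent wait.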

Aug 22 '23 07:08 zhuang-li


Could you please provide the code for the data preprocessing step as well as the correct data format?

Any details you can provide on the data handling would be greatly appreciated! Let me know if you need any other details from my end to help reproduce the problem. @zhuang-li

Aug 22 '23 07:08 tclxmeng-jia
