starcoder

Training getting stuck

Open sankethgadadinni opened this issue 1 year ago • 4 comments

I'm trying to train on an A100 GPU, but the training gets stuck. I can't see any logs other than this:

UserWarning: MatMul8bitLt: inputs will be cast from torch.bfloat16 to float16 during quantization
  warnings.warn(f"MatMul8bitLt: inputs will be cast from {A.dtype} to float16 during quantization")

sankethgadadinni avatar Jun 09 '23 11:06 sankethgadadinni

Hi. I think it depends on the logging frequency. Try to reduce the logging_steps parameter and tell me if it solves your issue.
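
To illustrate why a large logging interval can look like a hang, here is a minimal sketch. It assumes the training script logs on a step-based schedule in the style of the Hugging Face `Trainer` (a log line whenever the global step is a multiple of `logging_steps`, plus optionally the first step). The helper `steps_that_log` and the values below are hypothetical and only model that schedule; they are not part of any library.

```python
def steps_that_log(max_steps, logging_steps, logging_first_step=False):
    """Return the global steps at which a step-based logger would emit a line.

    Hypothetical helper: it only models the schedule described above.
    """
    steps = []
    for step in range(1, max_steps + 1):
        if (logging_first_step and step == 1) or step % logging_steps == 0:
            steps.append(step)
    return steps


# Illustrative settings (assumed values, not taken from the issue).
# With transformers installed, these names could be passed on as
# TrainingArguments(output_dir="out", **logging_kwargs).
logging_kwargs = {
    "logging_strategy": "steps",
    "logging_steps": 1,
    "logging_first_step": True,
}

# With a large interval, nothing is printed until step 500.
quiet = steps_that_log(max_steps=1000, logging_steps=500)

# With logging_steps=1, every step produces a log line.
chatty = steps_that_log(max_steps=5, logging_steps=1)

print(quiet)
print(chatty)
```

If each step is slow, for example with 8-bit quantization on a single GPU, a large interval can mean a long wait before the first log line appears, so lowering `logging_steps` is a cheap first check.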

ArmelRandy avatar Jun 28 '23 09:06 ArmelRandy

Same issue. It shows nothing. Have you solved your problem? Thanks.

zhuang-li avatar Jun 28 '23 12:06 zhuang-li

Yeah, I solved the issue. It was caused by the data format. I am using the code to train LLaMA for StarCoder, and I fixed the error by preprocessing the data properly.
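
The exact format that worked is not stated here. As a rough sketch of the kind of check that can catch malformed data before training starts, the following assumes a JSON Lines dataset in which every record is an object with non-empty string fields. The field names `prompt` and `completion` and the helper `find_bad_records` are hypothetical and should be adapted to whatever the training script actually expects.

```python
import json


def find_bad_records(lines, required_fields=("prompt", "completion")):
    """Return (line_number, reason) pairs for records that fail basic checks.

    Hypothetical helper: the required field names are assumptions.
    """
    problems = []
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            problems.append((line_no, "empty line"))
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((line_no, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(record, dict):
            problems.append((line_no, "record is not a JSON object"))
            continue
        for field in required_fields:
            value = record.get(field)
            # Each required field must be a non-empty string.
            if not isinstance(value, str) or not value.strip():
                problems.append((line_no, f"missing or empty field '{field}'"))
    return problems


# Made-up sample data: one valid record, one missing a field, one not JSON.
sample = [
    '{"prompt": "def add(a, b):", "completion": "    return a + b"}',
    '{"prompt": "def sub(a, b):"}',
    "not json",
]

problems = find_bad_records(sample)
for line_no, reason in problems:
    print(f"line {line_no}: {reason}")
```

Running a check like this over the whole file before launching training makes it easier to tell a data problem apart from a genuinely stalled job.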

zhuang-li avatar Aug 22 '23 07:08 zhuang-li

> Yeah. I solved the issue. It is because of the data format. I am using the code to train llama for StarCoder. Then I fixed the error by properly preprocessing data.

Could you please provide the code for the data preprocessing step as well as the correct data format?

Any details you can provide on the data handling would be greatly appreciated! Let me know if you need any other details from my end to help reproduce the problem. @zhuang-li

tclxmeng-jia avatar Aug 22 '23 07:08 tclxmeng-jia