starcoder
Training getting stuck
I'm trying to train on an A100 GPU, but the training gets stuck. The only log output I see is this:
UserWarning: MatMul8bitLt: inputs will be cast from torch.bfloat16 to float16 during quantization
  warnings.warn(f"MatMul8bitLt: inputs will be cast from {A.dtype} to float16 during quantization")
Hi. I think it depends on the logging frequency. Try to reduce the logging_steps parameter and tell me if it solves your issue.
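For reference, here is a minimal sketch of what lowering the logging frequency could look like. It assumes the training script builds a Hugging Face `transformers.TrainingArguments` object; the `output_dir` value is a placeholder, and the actual script may expose these settings under different command-line flag names.

```python
# Keyword arguments intended for transformers.TrainingArguments.
# Assumption: the training script uses the Hugging Face Trainer.
logging_kwargs = {
    "output_dir": "./checkpoints",  # placeholder path, not from this thread
    "logging_strategy": "steps",    # log every N update steps
    "logging_steps": 1,             # log after every step while debugging
    "logging_first_step": True,     # also log the very first step
}

# Usage (requires `transformers` to be installed):
#   from transformers import TrainingArguments
#   args = TrainingArguments(**logging_kwargs)
print(logging_kwargs)
```

Note that `logging_steps` counts optimizer update steps, so with a large gradient accumulation setting the first log line can still take a while to appear.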
Same issue here. It shows nothing. Have you solved your problem? Thanks.
Yeah, I solved the issue. It was caused by the data format: I was using code written for training LLaMA to train StarCoder. I fixed the error by preprocessing the data properly.
Could you please provide the code for the data preprocessing step as well as the correct data format?
Any details you can provide on the data handling would be greatly appreciated! Let me know if you need any other details from my end to help reproduce the problem. @zhuang-li
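The thread does not include the actual preprocessing code or the data format that fixed the problem. As a rough sketch of the kind of check that can catch a format mismatch before training starts, the snippet below validates JSON Lines records. The field names `prompt` and `completion` and the helper `find_bad_records` are hypothetical and are not taken from the StarCoder training code; substitute whatever fields your script actually reads.

```python
import json


def find_bad_records(lines, required_keys=("prompt", "completion")):
    """Return (line_number, reason) pairs for records that fail basic checks.

    The required keys are hypothetical; adjust them to your dataset.
    """
    problems = []
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((lineno, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(record, dict):
            problems.append((lineno, "record is not a JSON object"))
            continue
        for key in required_keys:
            value = record.get(key)
            # Each required field must be a non-empty string.
            if not isinstance(value, str) or not value.strip():
                problems.append((lineno, f"missing or empty field '{key}'"))
    return problems


# Made-up sample data for illustration only.
sample = [
    '{"prompt": "def add(a, b):", "completion": "    return a + b"}',
    '{"prompt": "def sub(a, b):"}',
    "not json",
]
for lineno, reason in find_bad_records(sample):
    print(f"line {lineno}: {reason}")
```

Running a check like this on the training file before launching the job makes it easier to tell a silent data problem apart from a logging problem.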