
[BUG]: can not save model in pipeline training mode

Open fangxintao opened this issue 9 months ago • 1 comments

Is there an existing issue for this bug?

  • [x] I have searched the existing issues

The bug has not been fixed in the latest main branch

  • [ ] I have checked the latest main branch

Do you feel comfortable sharing a concise (minimal) script that reproduces the error? :)

Yes, I will share a minimal reproducible script.

🐛 Describe the bug

I use HybridParallelPlugin with pp_size set to 2 and tp_size set to 1. Training runs successfully, but when I call booster.save_model() to save the model, I get AttributeError: 'Tensor' object has no attribute '_unpad_detach'. I don't know how to solve this problem. How should I deal with it?

Environment

No response

fangxintao avatar Mar 25 '25 08:03 fangxintao

Hey @fangxintao, it looks like something might be wrong with padding. Would it be possible to provide a minimal reproduction or the entire traceback?

botbw avatar Aug 20 '25 01:08 botbw
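
For the traceback requested above, one possible way to capture it in full is sketched below using only the Python standard library. The helper `run_and_capture_traceback` and the stand-in `failing_save` are hypothetical names for illustration and are not part of ColossalAI; in a real run, the callable passed in would wrap the actual booster.save_model() call from the training script.

```python
import traceback


def run_and_capture_traceback(save_fn, *args, **kwargs):
    """Call save_fn and return (result, traceback_text).

    traceback_text is None on success; on failure it holds the full
    formatted traceback so it can be pasted into the issue.
    """
    try:
        return save_fn(*args, **kwargs), None
    except Exception:
        return None, traceback.format_exc()


def failing_save():
    # Hypothetical stand-in that raises the error reported in this issue.
    # Replace it with a function that wraps the real save call.
    raise AttributeError("'Tensor' object has no attribute '_unpad_detach'")


result, tb_text = run_and_capture_traceback(failing_save)
if tb_text is not None:
    # Print the complete traceback, including file names and line numbers.
    print(tb_text)
```

In a multi-process pipeline-parallel run, each rank may fail at a different point, so it can help to record which rank printed each traceback.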