TokenFusion

No PyTorch models (.pth) saved during training?

Open Rifahaziz opened this issue 1 year ago • 5 comments

Hi, first of all, this is excellent work! I am trying to test some data on the segmentation part. When I train with the pretrained SegFormer weights, it saves data.pkl as the checkpoint. It does not save any .pth file. How am I supposed to evaluate without a .pth file (PyTorch model) generated during training? So my question is: which path does 'path_to_pth' refer to in "python main.py --gpu 0 --resume path_to_pth --evaluate", given that no ".pth" file is generated during training? Am I supposed to convert data.pkl to .pth? Also, what was the total training time on a single GPU with the SegFormer B3 weights? Thank you!

Rifahaziz, Feb 27 '23 15:02

Hi, you don't need to convert data.pkl to .pth. You can follow https://wandb.ai/wandb/common-ml-errors/reports/How-to-Save-and-Load-Models-in-PyTorch--VmlldzozMjg0MTE to save a .pth file.
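For readers who want the gist without opening the link, here is a minimal sketch of the usual state_dict save/load pattern in PyTorch. The helper names `save_checkpoint` and `load_checkpoint` and the `nn.Linear` stand-in model are hypothetical and are not taken from the TokenFusion code, so treat this as an illustration and not as the repository's actual checkpoint logic.

```python
import os
import tempfile

import torch
import torch.nn as nn


def save_checkpoint(model: nn.Module, path: str) -> None:
    """Save only the learnable parameters (the state_dict) to `path`."""
    torch.save(model.state_dict(), path)


def load_checkpoint(model: nn.Module, path: str) -> nn.Module:
    """Load a saved state_dict into an already constructed model."""
    state_dict = torch.load(path, map_location="cpu")
    model.load_state_dict(state_dict)
    model.eval()  # switch to evaluation mode before running inference
    return model


if __name__ == "__main__":
    # Hypothetical stand-in; the real segmentation network would go here.
    model = nn.Linear(4, 2)
    path = os.path.join(tempfile.mkdtemp(), "model.pth")
    save_checkpoint(model, path)

    restored = load_checkpoint(nn.Linear(4, 2), path)
    print("weights match:", torch.equal(model.weight, restored.weight))
```

The model has to be constructed with the same architecture before `load_state_dict` is called, because the file stores only tensors keyed by parameter name.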

The total training time on a single GPU with the SegFormer B3 weights is about 1 to 2 days.

yikaiw, Feb 28 '23 07:02

Which path should I reference? Or do I need to retrain the model and set a path to save to? Looking forward to your answer. Thanks!

CE-AI, Mar 24 '23 05:03

It was the model-best.pth.tar that worked for me after retraining. Originally it was not working, maybe because I stopped the training midway. I hope this helps.

Rifahaziz, Mar 24 '23 06:03

I will try it later. Thanks!

CE-AI, Mar 24 '23 11:03

model-best.pth.tar can't be unzipped correctly. I may have the same problem as you, and I never stopped the training midway, so I suppose it's a bug.
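One possible explanation, assuming the checkpoint was written with `torch.save`: since PyTorch 1.6 that function writes a zip-based container holding a `data.pkl` entry, and the `.pth.tar` suffix is only a naming convention, so tar tools will not open it and the file is meant to be read with `torch.load` and not unpacked by hand. The sketch below uses only the Python standard library to check which container format a file actually has; `describe_checkpoint` is a hypothetical helper name.

```python
import os
import tarfile
import tempfile
import zipfile


def describe_checkpoint(path: str) -> str:
    """Report which container format a checkpoint file actually uses."""
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as archive:
            names = archive.namelist()
        # Files written by torch.save (zip format) contain a data.pkl entry.
        has_pickle = any(name.endswith("data.pkl") for name in names)
        return "zip (torch.save format)" if has_pickle else "zip"
    if tarfile.is_tarfile(path):
        return "tar"
    return "unknown"


if __name__ == "__main__":
    # Build a tiny zip that mimics the layout of a torch.save file.
    demo = os.path.join(tempfile.mkdtemp(), "model-best.pth.tar")
    with zipfile.ZipFile(demo, "w") as archive:
        archive.writestr("archive/data.pkl", b"placeholder")
    print(describe_checkpoint(demo))
```

If the result is the zip format, passing the file path directly to `torch.load` (or to `--resume`) is the thing to try; unpacking it only exposes the internal `data.pkl`, which is not usable on its own.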

HarrisCheNN, Mar 26 '23 05:03