
Fails when training from scratch

Open cshizhe opened this issue 2 years ago • 5 comments

Hi, I tried to train the model from scratch without initializing from hais_ckpt. But it failed due to empty proposals_idx in softgroup_ops.hierarchical_aggregation. Is it possible to train from scratch?

cshizhe avatar Apr 02 '22 21:04 cshizhe

Hi. It is because the semantic and offset branches are not learned yet, leading to empty proposals. To train from scratch, you should train the semantic and offset branches first, then train the rest of the network. There are two ways to do that.

1) Set prepare_epochs in the config file to a value greater than 0. For example, with num_epochs=500 and prepare_epochs=100, the semantic and offset branches are trained for the first 100 epochs, then the whole network is trained for the remaining 400 epochs. (Note that you also need to empty pretrained_module and fixed_module in the config file.)

2) Set semantic_only=True and train the network. Then validate it and keep the best semantic checkpoint. Finally, load the model from that checkpoint and fix the backbone before training the ensuing branches.
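For method 1, the relevant config fields (names taken from this thread; exact keys may differ between SoftGroup versions) would look roughly like:

```yaml
# Sketch of the config changes for method 1.
# Field names are from this discussion; check your actual config file.
epochs: 500
prepare_epochs: 100   # train semantic/offset branches only for the first 100 epochs
pretrained_module: [] # empty: no initialization from hais_ckpt
fixed_module: []      # empty: nothing frozen
```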
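The "load the best semantic checkpoint and fix the backbone" step of method 2 can be sketched in PyTorch as follows (module and checkpoint names here are placeholders, not SoftGroup's actual classes):

```python
import torch
import torch.nn as nn

# Stand-in for the real network: a point-wise backbone plus an instance branch.
class TinyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Linear(6, 32)        # placeholder for the U-Net backbone
        self.instance_head = nn.Linear(32, 20)  # placeholder for top-down refinement

model = TinyModel()

# Load the best semantic-only checkpoint, keeping only backbone weights.
# In practice: state = torch.load('best_semantic.pth', map_location='cpu')
state = {'backbone.weight': model.backbone.weight.data.clone(),
         'backbone.bias': model.backbone.bias.data.clone()}  # stand-in checkpoint
missing, unexpected = model.load_state_dict(state, strict=False)

# Fix (freeze) the backbone so only the ensuing branches are trained.
for p in model.backbone.parameters():
    p.requires_grad_(False)

trainable = [n for n, p in model.named_parameters() if p.requires_grad]
print(trainable)  # only instance_head parameters remain trainable
```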

thangvubk avatar Apr 03 '22 02:04 thangvubk

Thank you. I tried the first approach. The best performance I got on scannetv2 is mAP/mAP50/mAP25 = 0.435/0.642/0.772. Could you reproduce the results when training from scratch on scannetv2? BTW, I used spconv 2.1.

cshizhe avatar Apr 06 '22 02:04 cshizhe

I will try to reproduce the results from scratch when I have time. I think the second approach would be better and more stable. If you have time, could you try it and let us know?

thangvubk avatar Apr 06 '22 03:04 thangvubk

Hi @thangvubk . I have started training from scratch on my custom dataset with method 2. Now I want to visualize intermediate results from the backbone I'm training (the point-wise prediction network). Is there any way to get such results?
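For reference, one generic way to inspect intermediate point-wise predictions is to dump per-point labels next to the coordinates and open the file in a viewer such as CloudCompare or MeshLab. A minimal NumPy sketch (array names are assumptions, not SoftGroup's API):

```python
import numpy as np

# Stand-ins for what the point-wise branch produces at validation time:
# per-point coordinates and semantic logits.
xyz = np.random.rand(1000, 3).astype(np.float32)  # (N, 3) point coordinates
semantic_scores = np.random.rand(1000, 20)        # (N, num_classes) logits

labels = semantic_scores.argmax(axis=1)           # predicted class per point

# Save "x y z label" rows; most point-cloud viewers can load this ASCII format.
out = np.hstack([xyz, labels[:, None].astype(np.float32)])
np.savetxt('semantic_pred.txt', out, fmt='%.4f %.4f %.4f %d')
```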

jayes97 avatar Apr 29 '22 05:04 jayes97

Hi @thangvubk . I have tried the second way. If I want to train the top-down refinement network based on the pretrained model, should I initialize the optimizer from scratch? There is an error when loading the optimizer params:

```
raise ValueError("loaded state dict contains a parameter group "
ValueError: loaded state dict contains a parameter group that doesn't match the size of optimizer's group
```
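That ValueError typically appears when the checkpoint's optimizer state was saved over a different set of trainable parameters (e.g. before the backbone was fixed). A common workaround is to load only the model weights and build a fresh optimizer instead of restoring the saved optimizer state. A sketch with placeholder modules, not SoftGroup's actual code:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 8), nn.Linear(8, 2))

# Pretend the checkpoint was saved before the first layer was frozen:
ckpt = {'net': model.state_dict(),
        'optimizer': torch.optim.Adam(model.parameters()).state_dict()}

# Now freeze the first layer (as in method 2) ...
for p in model[0].parameters():
    p.requires_grad_(False)

# ... load only the model weights ...
model.load_state_dict(ckpt['net'])

# ... and build a fresh optimizer over the current trainable params,
# instead of calling optimizer.load_state_dict(ckpt['optimizer']),
# whose param groups no longer match the current trainable set.
optimizer = torch.optim.Adam(p for p in model.parameters() if p.requires_grad)
print(len(optimizer.param_groups[0]['params']))  # 2: weight and bias of layer 2
```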

RongkunYang avatar May 01 '22 08:05 RongkunYang