
What should I do after train_ofa_net

detectRecog opened this issue 4 years ago · 5 comments

I ran train_ofa_net.py and there are three folders under 'exp/': 'kernel2kernel_depth', 'kernel_depth2kernel_depth_width', and 'normal2kernel'. What should I do next? After training, each exp subfolder contains 'checkpoint', 'logs', 'net.config', 'net_info.txt', and 'run.config'. Does anybody know what I should do with these?

I cannot find any relation between the training results under 'exp/' and 'eval_ofa_net.py'. Please help this poor kid. \doge

detectRecog avatar Jul 20 '21 03:07 detectRecog

As far as I can tell, the folders correspond to the different stages of the progressive shrinking algorithm; for example, 'kernel2kernel_depth' is the training step from elastic kernel to elastic kernel plus elastic depth. In the 'checkpoint' folder you can find the trained models; model_best.pth.tar should be the final model for that step. When you want to evaluate a model you trained yourself, you have to load it in the eval_ofa_net.py script. For that you can just replace

ofa_network = ofa_net(args.net, pretrained=True)

with something to load your own network. Maybe something like this would work:

ofa_network = OFAMobileNetV3(
    ks_list=[3, 5, 7],
    expand_ratio_list=[3, 4, 6],
    depth_list=[2, 3, 4],
)
init = torch.load(
    'exp/kernel_depth2kernel_depth_width/phase2/checkpoint/model_best.pth.tar',
    map_location='cpu',
)['state_dict']
ofa_network.load_state_dict(init)
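One thing to watch for when loading such a checkpoint: if training used nn.DataParallel, every key in the saved state_dict is prefixed with "module.", and load_state_dict on a plain model will fail with key mismatches. A small generic helper (my own sketch, not code from the once-for-all repo) can strip the prefix first:

```python
# Hypothetical helper, not from the once-for-all repo: checkpoints saved from a
# model wrapped in nn.DataParallel prefix every key with "module.". Strip the
# prefix so the keys match a plain (unwrapped) model before load_state_dict.
def strip_module_prefix(state_dict):
    prefix = "module."
    return {
        (key[len(prefix):] if key.startswith(prefix) else key): value
        for key, value in state_dict.items()
    }

# Small self-contained demo with dummy values standing in for tensors:
sd = {"module.conv.weight": 1, "module.conv.bias": 2, "classifier.weight": 3}
print(strip_module_prefix(sd))
```

If the keys already match, the helper is a no-op, so it is safe to apply unconditionally before `ofa_network.load_state_dict(...)`.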

Bixiii avatar Jul 22 '21 16:07 Bixiii


You're so kind. Thank you very much for your reply; I've been waiting for someone to save me every day. Does this mean I should train the different stages sequentially, resuming from the best checkpoint of the previous stage? Currently, I train the different stages in parallel, which is why I struggled to find the relation between the checkpoints of the different stages.

@Bixiii

detectRecog avatar Jul 23 '21 03:07 detectRecog


Do you have any ideas about the details of the latency predictor model? How is that network built? Thanks for your reply!

Jon-drugstore avatar Jul 27 '21 06:07 Jon-drugstore


> Do you have any ideas for the details of the latency predictor model? How to build the network?

In my understanding, once-for-all/ofa/nas/efficiency_predictor/latency_lookup_table.py describes how they estimate latency. For ResNet50, they simply count FLOPs as a proxy for latency.
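As a toy illustration of the FLOPs-as-proxy idea (my own sketch, not the repo's lookup-table code): count the multiply-accumulates of each layer and sum them, so a cheaper candidate architecture gets a lower efficiency score.

```python
# Toy sketch of a FLOPs-based efficiency proxy (not the repo's implementation).
# For a conv layer, FLOPs ~= (multiply-accumulates per output element)
#                          * (number of output elements).
def conv_flops(c_in, c_out, kernel, h_out, w_out):
    return c_in * kernel * kernel * c_out * h_out * w_out

# Sum over a hypothetical two-layer net:
layers = [
    dict(c_in=3,  c_out=64,  kernel=3, h_out=112, w_out=112),
    dict(c_in=64, c_out=128, kernel=3, h_out=56,  w_out=56),
]
total = sum(conv_flops(**layer) for layer in layers)
print(total)  # 252887040
```

A measured lookup table generalizes this: instead of counting operations, each (op type, input shape, output shape) entry stores a latency measured on the target device, and the predictor sums the entries for a candidate architecture.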

pyjhzwh avatar Sep 07 '21 15:09 pyjhzwh


> Does this mean I should train for different stages sequentially, resuming from the best checkpoint of the previous stage? Currently, I train different stages in parallel.

I guess so. From task 'kernel' to 'depth', the depth list gains more choices; from 'depth' to 'expand', the expand ratio list gains more choices. So I guess we should run task 'kernel' first, then 'depth', and finally 'expand'.
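One way to picture that ordering (my guess at the per-stage search spaces, extrapolated from the lists used in the loading snippet above; the exact per-phase lists in train_ofa_net.py may differ):

```python
# Hedged sketch: how the elastic search space could grow per stage.
# The concrete lists are an assumption; only the ordering
# kernel -> depth -> expand is taken from the discussion above.
stages = {
    "kernel": dict(ks_list=[3, 5, 7], depth_list=[4],       expand_ratio_list=[6]),
    "depth":  dict(ks_list=[3, 5, 7], depth_list=[2, 3, 4], expand_ratio_list=[6]),
    "expand": dict(ks_list=[3, 5, 7], depth_list=[2, 3, 4], expand_ratio_list=[3, 4, 6]),
}

# Each later stage only enlarges the search space, never shrinks it,
# which is why each stage warm-starts from the previous stage's checkpoint:
for earlier, later in [("kernel", "depth"), ("depth", "expand")]:
    for key in stages[earlier]:
        assert set(stages[earlier][key]) <= set(stages[later][key])
print("search space grows monotonically across stages")
```

Training the stages in parallel breaks this warm-starting, which would explain why the checkpoints then appear unrelated.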

pyjhzwh avatar Sep 07 '21 15:09 pyjhzwh