Random accuracy for a trained model when using load_state_dict
**Environment**
- OS: Ubuntu 4.18
- Hardware (GPU, or instance type): V100
**To reproduce**
Steps to reproduce the behavior:
- Take a composer trained ImageNet model (I am using Resnet-50) and compute accuracy. The accuracy is 76.46% (which is close to the state-of-the-art).
- Try to compute accuracy by loading the same model using the following code:
```python
import torch
from torchvision.models import resnet
from composer.models import ComposerClassifier

model_fn = getattr(resnet, 'resnet50')
model = model_fn(num_classes=1000, groups=1, width_per_group=64)
composer_model = ComposerClassifier(module=model)
load_model = torch.load(MODEL_PATH)
composer_model.load_state_dict(load_model['state']['model'])
```
- The accuracy is 0.1% (which is basically random).
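The accuracy computation in the steps above can be sketched as a standard top-1 evaluation loop. This is a sketch, not the reporter's actual script; `val_loader` is a hypothetical ImageNet validation `DataLoader` yielding `(images, labels)` batches, and the model is called directly on image tensors:

```python
import torch

def compute_accuracy(model, val_loader, device='cuda'):
    """Top-1 accuracy over a validation loader (illustrative sketch)."""
    model = model.to(device)
    model.eval()  # the updates below discuss the effect of removing this line
    correct, total = 0, 0
    with torch.no_grad():
        for images, labels in val_loader:
            logits = model(images.to(device))
            preds = logits.argmax(dim=1)
            correct += (preds == labels.to(device)).sum().item()
            total += labels.size(0)
    return correct / total
```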
**Expected behavior**
The accuracy in both cases should be the same.
Update: I realized that if I remove `model.eval()` from the code, it works fine.
So my guess is that batch norm is not being handled correctly in eval mode.
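The train/eval discrepancy described above is consistent with how batch norm works: in train mode it normalizes each batch with that batch's own statistics (while updating running averages), whereas in eval mode it normalizes with the stored running statistics. If a checkpoint's running stats are missing or mismatched, eval-mode outputs are wrong even when the weights are correct. A toy pure-Python sketch of that mechanism (illustrative only, not Composer or PyTorch code):

```python
class ToyBatchNorm:
    """Toy 1-feature batch norm illustrating the two modes."""
    def __init__(self, momentum=0.1, eps=1e-5):
        self.running_mean, self.running_var = 0.0, 1.0
        self.momentum, self.eps = momentum, eps
        self.training = True

    def __call__(self, xs):
        if self.training:
            # Train mode: normalize with the current batch's statistics
            # and fold them into the running averages.
            mean = sum(xs) / len(xs)
            var = sum((x - mean) ** 2 for x in xs) / len(xs)
            self.running_mean += self.momentum * (mean - self.running_mean)
            self.running_var += self.momentum * (var - self.running_var)
        else:
            # Eval mode: use only the stored running statistics. If these
            # were never loaded from the checkpoint (or don't match the
            # data), the normalized output is badly off.
            mean, var = self.running_mean, self.running_var
        return [(x - mean) / (var + self.eps) ** 0.5 for x in xs]

bn = ToyBatchNorm()
batch = [10.0, 12.0, 14.0]
train_out = bn(batch)   # centered near 0: the batch's own stats are used
bn.training = False
eval_out = bn(batch)    # far from centered: running stats have barely moved
```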
Update: Although the accuracy numbers are much higher without `model.eval()`, they are still 4-5% below the original 76.46%. So for some reason composer-trained models do not show good results using standard PyTorch APIs.
Thanks @singlasahil14 for filing this issue, and for using composer! Just to update from the community Slack channel: I was able to use your checkpoint and eval script, with some slight modifications, to reproduce the numbers using standard PyTorch APIs:
```
100%|████████████████████████████████████████████████████████████████| 391/391 [00:25<00:00, 15.63it/s]
38223 50000 0.76446
```
It's possible the issue is how the dataloader / dataset is being created (e.g. the model expects that input images were normalized by the ImageNet statistics, resized to 256, and then center-cropped to 224). If you can share your dataloader creation code as well, I can help debug!
Posting comment here in case others have this question:
Thanks for using Composer for your work and bringing this issue to our attention! For context, `pil_image_collate` produces a tensor of images with values ranging between 0-255, as opposed to the typical 0-1 range produced by `ToTensor`. We noticed the normalization values in our example training script should have been scaled by 255 to account for the range of values from `pil_image_collate`. We made a PR to fix this: https://github.com/mosaicml/composer/pull/1641.
If you use the example script with the new normalization, your eval script should have much better accuracy without needing to use `pil_image_collate`. We are verifying results on our end as well.
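The scaling described above is a simple identity: normalizing a 0-255 pixel value with mean and std multiplied by 255 gives the same result as normalizing the corresponding 0-1 value with the usual ImageNet statistics. A quick check for one channel (the red-channel stats; the pixel value is arbitrary):

```python
# ImageNet red-channel mean/std, stated for inputs in [0, 1].
mean, std = 0.485, 0.229
pixel = 128  # raw 0-255 value, as produced by pil_image_collate

# Path A: ToTensor-style input in [0, 1], standard normalization.
a = (pixel / 255 - mean) / std
# Path B: 0-255 input, with mean and std scaled by 255 (the PR's fix).
b = (pixel - mean * 255) / (std * 255)

assert abs(a - b) < 1e-12  # identical up to floating-point error
```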