Random accuracy for a trained model when using load_state_dict
**Environment**
- OS: Ubuntu 4.18
- Hardware (GPU, or instance type): V100
**To reproduce**
Steps to reproduce the behavior:
- Take a composer trained ImageNet model (I am using Resnet-50) and compute accuracy. The accuracy is 76.46% (which is close to the state-of-the-art).
- Try to compute accuracy by loading the same model using the following code:
```python
import torch
from torchvision.models import resnet
from composer.models import ComposerClassifier

model_fn = getattr(resnet, 'resnet50')
model = model_fn(num_classes=1000, groups=1, width_per_group=64)
composer_model = ComposerClassifier(module=model)
load_model = torch.load(MODEL_PATH)
composer_model.load_state_dict(load_model['state']['model'])
```
- The accuracy is 0.1% (which is basically random).
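The accuracy computation in the steps above can be sketched as a standard top-1 evaluation loop. This is a sketch, not the reporter's actual script; `val_loader` is a hypothetical ImageNet validation `DataLoader` yielding `(images, labels)` batches, and the model is called directly on image tensors:

```python
import torch

def compute_accuracy(model, val_loader, device='cuda'):
    """Top-1 accuracy over a validation loader (illustrative sketch)."""
    model = model.to(device)
    model.eval()  # the updates below discuss the effect of removing this line
    correct, total = 0, 0
    with torch.no_grad():
        for images, labels in val_loader:
            logits = model(images.to(device))
            preds = logits.argmax(dim=1)
            correct += (preds == labels.to(device)).sum().item()
            total += labels.size(0)
    return correct / total
```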
**Expected behavior**
The accuracy in both cases should be the same.
Update: I realized that if I remove `model.eval()` from the code, it works fine.
So my guess is that batch norm is not being handled correctly in eval mode.
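The train/eval discrepancy described above is consistent with how batch norm works: in train mode it normalizes each batch with that batch's own statistics (while updating running averages), whereas in eval mode it normalizes with the stored running statistics. If a checkpoint's running stats are missing or mismatched, eval-mode outputs are wrong even when the weights are correct. A toy pure-Python sketch of that mechanism (illustrative only, not Composer or PyTorch code):

```python
class ToyBatchNorm:
    """Toy 1-feature batch norm illustrating the two modes."""
    def __init__(self, momentum=0.1, eps=1e-5):
        self.running_mean, self.running_var = 0.0, 1.0
        self.momentum, self.eps = momentum, eps
        self.training = True

    def __call__(self, xs):
        if self.training:
            # Train mode: normalize with the current batch's statistics
            # and fold them into the running averages.
            mean = sum(xs) / len(xs)
            var = sum((x - mean) ** 2 for x in xs) / len(xs)
            self.running_mean += self.momentum * (mean - self.running_mean)
            self.running_var += self.momentum * (var - self.running_var)
        else:
            # Eval mode: use only the stored running statistics. If these
            # were never loaded from the checkpoint (or don't match the
            # data), the normalized output is badly off.
            mean, var = self.running_mean, self.running_var
        return [(x - mean) / (var + self.eps) ** 0.5 for x in xs]

bn = ToyBatchNorm()
batch = [10.0, 12.0, 14.0]
train_out = bn(batch)   # centered near 0: the batch's own stats are used
bn.training = False
eval_out = bn(batch)    # far from centered: running stats have barely moved
```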
Update: Although the accuracy numbers are much higher without `model.eval()`, they are still 4-5% below the original 76.46%. So for some reason composer-trained models do not show good results using standard PyTorch APIs.
Thanks @singlasahil14 for filing this issue, and for using composer! Just to update from the community Slack channel: I was able to use your checkpoint and eval script, with some slight modifications, to reproduce the numbers using standard PyTorch APIs:
```
100%|████████████████████████████████████████████████████████████████| 391/391 [00:25<00:00, 15.63it/s]
38223 50000 0.76446
```
It's possible the issue is how the dataloader / dataset is being created (e.g. the model expects that input images were normalized by the ImageNet statistics, resized to 256, and then center-cropped to 224). If you can share your dataloader creation code as well, I can help debug!
Posting comment here in case others have this question:
Thanks for using Composer for your work and bringing this issue to our attention! For context, `pil_image_collate` produces a tensor of images with values ranging between 0-255, as opposed to the typical 0-1 range produced by `ToTensor`. We noticed the normalization values in our example training script should have been scaled by 255 to account for the range of values from `pil_image_collate`. We made a PR to fix this: https://github.com/mosaicml/composer/pull/1641.
If you use the example script with the new normalization, your eval script should have much better accuracy without needing to use `pil_image_collate`. We are verifying results on our end as well.
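The scaling described above is a simple identity: normalizing a 0-255 pixel value with mean and std multiplied by 255 gives the same result as normalizing the corresponding 0-1 value with the usual ImageNet statistics. A quick check for one channel (the red-channel stats; the pixel value is arbitrary):

```python
# ImageNet red-channel mean/std, stated for inputs in [0, 1].
mean, std = 0.485, 0.229
pixel = 128  # raw 0-255 value, as produced by pil_image_collate

# Path A: ToTensor-style input in [0, 1], standard normalization.
a = (pixel / 255 - mean) / std
# Path B: 0-255 input, with mean and std scaled by 255 (the PR's fix).
b = (pixel - mean * 255) / (std * 255)

assert abs(a - b) < 1e-12  # identical up to floating-point error
```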