
How to get FP16 weights for optimization level O2?

SM1991CODES opened this issue 2 years ago · 3 comments

Describe the Bug

I followed the steps and trained using the O2 optimization level, then saved the model with torch.save(). However, the size of the saved weights is the same as with FP32 training. Ideally, FP16 weights should be about half the size, shouldn't they?

How may I get that?

Also, what is the correct way to run inference with such a model to see the inference speed improvement?

Minimal Steps/Code to Reproduce the Bug

Expected Behavior

Environment

SM1991CODES · Sep 23, 2022

PyTorch stores and saves the weights in the default dtype, which is FP32, regardless of whether AMP was used or not. You may try casting the model to half precision:

import torch
import torch.nn as nn

device = torch.device("cuda")
model = nn.Sequential(
    nn.Linear(1000, 1000)).to(device).float()

torch.save(model, 'model_float.pt')  # 4MB
torch.save(model.half(), 'model_half.pt')  # 2MB
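
Note that model.half() casts the module in place, so after the second torch.save the in-memory model is FP16 as well. If you only want a smaller checkpoint while keeping the FP32 model in memory, one option (a sketch of my own, not part of the answer above) is to cast a copy of the state_dict instead:

# Cast only the floating-point tensors of the state_dict to FP16;
# the live model keeps its FP32 parameters.
half_state = {k: (v.half() if v.is_floating_point() else v)
              for k, v in model.state_dict().items()}
torch.save(half_state, 'model_half_state.pt')  # roughly 2MB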

Aidyn-A · Oct 6, 2022

Hello, thanks for the reply. So are you suggesting that I first train with PyTorch AMP (weights in FP32) and then save the model after casting it all to FP16 as you have shown? That seems a bit risky to me, intuitively. Have you tried this already? Is performance (accuracy) affected?

Please let me know.

Best Regards

SM1991CODES · Oct 7, 2022

Yes, I agree that saving the model in FP16 is risky, so I do not recommend doing it. The accuracy will definitely be worse. I would recommend saving the model in the default dtype and using native AMP for inference:

with torch.cuda.amp.autocast():
    output = model(data)

This will improve the computational performance.
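
For completeness, a minimal end-to-end inference sketch (reusing the model and device from the earlier example; the input data here is just a random placeholder) could look like:

model.eval()
data = torch.randn(32, 1000, device=device)  # placeholder input

# Disable autograd and let autocast pick FP16 kernels where eligible.
with torch.no_grad(), torch.cuda.amp.autocast():
    output = model(data)

print(output.dtype)  # linear layers under autocast typically produce float16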

FYI, apex AMP is no longer supported or maintained and will be deprecated soon.
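
For reference, the native AMP training pattern that replaces apex O1/O2 combines autocast with GradScaler. Below is a minimal, self-contained sketch; the model, optimizer, and random data are placeholders of my own, not from this issue:

import torch
import torch.nn as nn

device = torch.device("cuda")
model = nn.Sequential(nn.Linear(1000, 1000), nn.ReLU(), nn.Linear(1000, 10)).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler()

for step in range(10):
    data = torch.randn(32, 1000, device=device)       # placeholder batch
    target = torch.randint(0, 10, (32,), device=device)

    optimizer.zero_grad()
    with torch.cuda.amp.autocast():                    # forward/loss in mixed precision
        loss = criterion(model(data), target)
    scaler.scale(loss).backward()                      # scale loss to avoid FP16 gradient underflow
    scaler.step(optimizer)                             # unscales gradients, then steps
    scaler.update()                                    # adjust the loss scale for the next step

Note that the weights themselves stay in FP32 here, which is why the saved checkpoint keeps its FP32 size, as discussed above.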

Aidyn-A · Oct 7, 2022