apex
How to get FP16 weights for optimization level O2?
Describe the Bug
I followed the steps and trained using the O2 optimization level. I saved the model using torch.save(). However, the size of the saved weights is the same as with FP32 training. Ideally, FP16 weights should be about 2x smaller, shouldn't they?
How can I get that?
Also, what is the correct way to run inference with this model to see the inference speed improvement?
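(For reference, the flow being described is roughly the sketch below. This is an assumption about the setup rather than the actual code: the model, data, and training loop are placeholders; only amp.initialize with opt_level="O2" and amp.scale_loss are apex's documented API.)

import torch
import torch.nn as nn
from apex import amp

device = torch.device("cuda")
model = nn.Linear(1000, 1000).to(device)        # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

# Set up apex mixed precision at optimization level O2
model, optimizer = amp.initialize(model, optimizer, opt_level="O2")

data = torch.randn(64, 1000, device=device)     # placeholder batch
target = torch.randn(64, 1000, device=device)

for _ in range(10):                             # placeholder training loop
    optimizer.zero_grad()
    # cast the output to FP32 so the loss is computed in full precision
    loss = nn.functional.mse_loss(model(data).float(), target)
    with amp.scale_loss(loss, optimizer) as scaled_loss:
        scaled_loss.backward()
    optimizer.step()

# Saving as described in the question; the reported observation is that the
# resulting file is the same size as a checkpoint from FP32 training
torch.save(model.state_dict(), 'model_o2.pt')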
PyTorch stores and saves the weights in the default dtype, which is FP32, regardless of whether AMP was used or not. You may try to cast the model to half:
import torch
import torch.nn as nn

device = torch.device("cuda")
model = nn.Sequential(
    nn.Linear(1000, 1000)).to(device).float()

# ~1M FP32 parameters -> roughly 4 MB on disk
torch.save(model, 'model_float.pt')
# nn.Module.half() casts parameters and buffers to FP16 in place -> roughly 2 MB
torch.save(model.half(), 'model_half.pt')
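As a quick sanity check (using the files saved in the snippet above), the half-precision checkpoint loads back with FP16 parameters:

loaded = torch.load('model_half.pt')
print(next(loaded.parameters()).dtype)  # torch.float16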
Hello, thanks for the reply. So are you suggesting that I first train using PyTorch AMP in FP32 and finally save the model after casting it all to FP16, as you have shown? It seems a bit risky to me, intuitively. Have you tried this already? Is performance (accuracy) affected?
Please let me know.
Best Regards
Yes, I agree that saving the model in FP16 is risky, so I do not recommend doing it; the accuracy will definitely be worse. I would recommend saving the model in the default dtype and using native AMP for inference:
with torch.cuda.amp.autocast():
    output = model(data)
This will improve the computational performance.
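Putting this together, a minimal inference sketch (the checkpoint name and input shape reuse the toy example above and are placeholders; only autocast, no_grad, and eval are standard PyTorch API):

import torch

device = torch.device("cuda")

# Load the FP32 checkpoint saved earlier and switch to evaluation mode
model = torch.load('model_float.pt').to(device)
model.eval()

data = torch.randn(64, 1000, device=device)  # placeholder input batch

with torch.no_grad(), torch.cuda.amp.autocast():
    output = model(data)

print(output.dtype)  # torch.float16: the linear layer ran in half precision under autocast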
FYI, apex AMP is no longer supported or maintained and will be deprecated soon.