mmpretrain
Why is FP16 inference slower than FP32 for Swin Transformer on an NVIDIA GPU?
(It is recommended to use the English "General question" template so that your question can help more people.)
First, confirm the following:
- I have searched the related issues but did not find the help I needed.
- I have read the related documentation but still do not know how to solve the problem.
Describe the problem you encountered
I used the config `configs/swin_transformer/swin_base_224_b16x64_300e_imagenet.py` with the checkpoint `swin_base_224_b16x64_300e_imagenet_20210616_190742-93230b0d.pth`, and modified `image_demo.py` as below:
```python
from argparse import ArgumentParser
import time

from mmcls.apis import inference_model, init_model, show_result_pyplot
from mmcv.runner.fp16_utils import wrap_fp16_model


def main():
    parser = ArgumentParser()
    parser.add_argument('img', help='Image file')
    parser.add_argument('config', help='Config file')
    parser.add_argument('checkpoint', help='Checkpoint file')
    parser.add_argument(
        '--device', default='cuda:0', help='Device used for inference')
    args = parser.parse_args()

    # build the model from a config file and a checkpoint file
    model = init_model(args.config, args.checkpoint, device=args.device)
    wrap_fp16_model(model)

    # warm up, then test a single image repeatedly
    warms = 10
    for i in range(warms):
        result = inference_model(model, args.img)
    start = time.time()
    for i in range(1000):
        result = inference_model(model, args.img)
    end = time.time()
    print('latency is {}'.format(end - start))
    # show the results
    # show_result_pyplot(model, args.img, result)


if __name__ == '__main__':
    main()
```
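A side note on the timing itself (my own suggestion, not part of the original script): CUDA kernels launch asynchronously, so it is safer to call `torch.cuda.synchronize()` before starting and before stopping the clock, otherwise the wall-clock number may not line up with the GPU work. A minimal sketch, assuming the same `model` and `args` as above:

```python
import time

import torch
from mmcls.apis import inference_model

torch.cuda.synchronize()   # make sure the warm-up work has finished
start = time.time()
for _ in range(1000):
    result = inference_model(model, args.img)
torch.cuda.synchronize()   # wait for all queued GPU work to complete
end = time.time()
print('total time for 1000 runs: {:.2f} s'.format(end - start))
```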
In `mmcls/apis/inference.py`, I added one line (the marked one below):
```python
if next(model.parameters()).is_cuda:
    data['img'] = data['img'].half()  # <-- the line I added
    # scatter to specified GPU
    data = scatter(data, [device])[0]
```
Below is the result when running with FP16:

```
(open-mmlab) :~/mmclassification/demo$ python image_demo.py demo_224.jpg '../configs/swin_transformer/swin_base_224_b16x64_300e_imagenet.py' '../checkpoints/swin_base_224_b16x64_300e_imagenet_20210616_190742-93230b0d.pth' --device cuda
load checkpoint from local path: ../checkpoints/swin_base_224_b16x64_300e_imagenet_20210616_190742-93230b0d.pth
latency is 38.851526737213135
```
For FP32, I did not change any code in `mmcls/apis/inference.py`; in `image_demo.py` I just commented out `wrap_fp16_model(model)`.
Here is the log for FP32:

```
load checkpoint from local path: ../checkpoints/swin_base_224_b16x64_300e_imagenet_20210616_190742-93230b0d.pth
latency is 35.35148549079895
```
So FP32 (about 35.4 ms per image over the 1000 runs) comes out faster than FP16 (about 38.9 ms per image). Is anything wrong here? What is the right way to run FP16 inference on this model?
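For what it is worth, a generic PyTorch alternative I have tried elsewhere (my own sketch, not the mmcls-recommended path) is to leave the weights in FP32 and run the forward pass under `torch.cuda.amp.autocast`, which casts the matmuls to FP16 without having to touch the input pipeline:

```python
import torch

# Hypothetical sketch: `model` and `data` stand in for the objects that
# init_model and the mmcls test pipeline produce inside inference_model.
model.eval()
with torch.no_grad(), torch.cuda.amp.autocast():
    scores = model(return_loss=False, **data)
```

Whether this (or `wrap_fp16_model`) actually beats FP32 still depends on the GPU and on which kernels get selected.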
Related information

- Output of `pip list | grep "mmcv\|mmcls\|^torch"`: [Fill in here]
- If you modified the config file or used a new one, please paste it here: [Fill in here]
- If the problem occurred during training, please paste the full training log and error message: [Fill in here]
- If you made other related modifications to the code under the `mmcls` folder, please describe them here: in `mmcls/apis/inference.py` I added one line, as shown in the snippet above (casting `data['img']` to half before the scatter call).
@Ezra-Yu It is an A10 GPU.
Please try to set

```python
torch.backends.cuda.matmul.allow_tf32 = False
torch.backends.cudnn.allow_tf32 = False
```

at the beginning of the script and test FP32 again.
@mzr1996 Thanks! I will definitely try it.
@mzr1996 I put the two lines at the beginning of the script, but I don't see any performance change for FP32. Here is where I placed them; could you also help review whether I ran the FP16 code above correctly? Here is the partial code:

```python
from argparse import ArgumentParser
import time

import torch
from mmcls.apis import inference_model, init_model, show_result_pyplot
from mmcv.runner.fp16_utils import wrap_fp16_model

torch.backends.cuda.matmul.allow_tf32 = False
torch.backends.cudnn.allow_tf32 = False
```
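One more sanity check on my side (not something asked in the thread): depending on the mmcv/PyTorch version, `wrap_fp16_model` may rely on native AMP rather than converting the weights themselves, so printing the dtypes right before the forward pass shows what precision the model actually runs in:

```python
# Hypothetical diagnostic, placed in mmcls/apis/inference.py just before the forward pass.
print('param dtype:', next(model.parameters()).dtype)  # float16 only if the weights were converted
print('input dtype:', data['img'].dtype)               # float16 after the manual .half() cast
```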
This issue will be closed as it is inactive, feel free to re-open it if necessary.