
fix resnet50 example

Open · zhangjun opened this issue 3 years ago · 1 comment

description

When we run python benchmark_ait.py --batch-size=1 for the first time, it raises an exception with the message 'OSError: ./tmp/resnet50_1/test.so: cannot open shared object file: No such file or directory', because the compiled module does not exist yet.

This PR will fix the exceptions.
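The failure comes from loading a module path that only exists after compilation. Below is a minimal sketch of a guard. It assumes the per-batch layout seen in the traceback (./tmp/resnet50_1/test.so) and uses a hypothetical compile_fn callback as a stand-in for whatever builds the module. It is an illustration, not the PR's actual diff.

```python
import os
from typing import Callable


def module_path(workdir: str, model_name: str, batch_size: int) -> str:
    # Mirrors the layout in the traceback: <workdir>/<model>_<batch>/test.so
    return os.path.join(workdir, f"{model_name}_{batch_size}", "test.so")


def ensure_module(
    workdir: str,
    model_name: str,
    batch_size: int,
    compile_fn: Callable[[str], None],  # hypothetical: builds the .so at the given path
) -> str:
    """Return the .so path, calling compile_fn first if the file is missing."""
    path = module_path(workdir, model_name, batch_size)
    if not os.path.isfile(path):
        # First run: nothing has been compiled for this batch size yet.
        compile_fn(path)
    if not os.path.isfile(path):
        # Fail with a clear message instead of a dlopen error later on.
        raise FileNotFoundError(f"compilation did not produce {path}")
    return path
```

The returned path would then be passed to Model(...) as in benchmark_ait.py. Later runs with the same batch size reuse the existing file and skip compilation.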

how to reproduce

python benchmark_ait.py --batch-size=1
INFO:aitemplate.testing.detect_target:Set target to CUDA
INFO:timm.models.helpers:Loading pretrained weights from url (https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-rsb-weights/resnet50_a1_0-14fe96d1.pth)
/zhangjun/mydev/gpu/AITemplate/examples/01_resnet-50/weight_utils.py:62: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  conv_w = torch.tensor(conv_w)
/zhangjun/mydev/gpu/AITemplate/examples/01_resnet-50/weight_utils.py:63: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  bn_rm = torch.tensor(bn_rm)
/zhangjun/mydev/gpu/AITemplate/examples/01_resnet-50/weight_utils.py:64: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  bn_rv = torch.tensor(bn_rv)
/zhangjun/mydev/gpu/AITemplate/examples/01_resnet-50/weight_utils.py:65: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  bn_w = torch.tensor(bn_w)
/zhangjun/mydev/gpu/AITemplate/examples/01_resnet-50/weight_utils.py:66: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  bn_b = torch.tensor(bn_b)
Traceback (most recent call last):
  File "benchmark_ait.py", line 133, in <module>
    main()
  File "/opt/python/cp38-cp38/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/opt/python/cp38-cp38/lib/python3.8/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/opt/python/cp38-cp38/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/python/cp38-cp38/lib/python3.8/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "benchmark_ait.py", line 129, in main
    benchmark("resnet50", batch_size, graph_mode=use_graph)
  File "benchmark_ait.py", line 76, in benchmark
    mod = Model(os.path.join("./tmp", model_name, "test.so"))
  File "/opt/python/cp38-cp38/lib/python3.8/site-packages/aitemplate/compiler/model.py", line 212, in __init__
    self.DLL = self._DLLWrapper(lib_path, num_runtimes)
  File "/opt/python/cp38-cp38/lib/python3.8/site-packages/aitemplate/compiler/model.py", line 169, in __init__
    self.DLL = ctypes.cdll.LoadLibrary(lib_path)
  File "/opt/python/cp38-cp38/lib/python3.8/ctypes/__init__.py", line 447, in LoadLibrary
    return self._dlltype(name)
  File "/opt/python/cp38-cp38/lib/python3.8/ctypes/__init__.py", line 369, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: ./tmp/resnet50_1/test.so: cannot open shared object file: No such file or directory
Exception ignored in: <function Model.__del__ at 0x7fb0c1ec5f70>
Traceback (most recent call last):
  File "/opt/python/cp38-cp38/lib/python3.8/site-packages/aitemplate/compiler/model.py", line 255, in __del__
    self.close()
  File "/opt/python/cp38-cp38/lib/python3.8/site-packages/aitemplate/compiler/model.py", line 259, in close
    for ptr in list(self._allocated_ait_data):
AttributeError: 'Model' object has no attribute '_allocated_ait_data'
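The second traceback is a side effect of the first. __init__ raised inside ctypes.cdll.LoadLibrary before _allocated_ait_data was assigned, so __del__ called close(), which hit an AttributeError. The sketch below shows the usual defensive pattern using a hypothetical SafeModel class, not AITemplate's Model. It assigns every attribute close() reads before the call that can raise, and it makes close() tolerate a partially built object.

```python
import ctypes
from typing import Optional, Set


class SafeModel:
    """Hypothetical wrapper whose __del__ is safe even if __init__ fails."""

    def __init__(self, lib_path: str) -> None:
        # Set everything close() reads *before* the call that may raise.
        self._allocated_ait_data: Set[int] = set()
        self.DLL: Optional[ctypes.CDLL] = None
        # Raises OSError if lib_path cannot be opened, as in the report.
        self.DLL = ctypes.cdll.LoadLibrary(lib_path)

    def close(self) -> None:
        # getattr guards against objects whose __init__ never ran this far.
        for ptr in list(getattr(self, "_allocated_ait_data", ())):
            self._allocated_ait_data.discard(ptr)  # real code would also free ptr
        self.DLL = None

    def __del__(self) -> None:
        self.close()
```

With this ordering, a missing .so still surfaces as the OSError, but no secondary "Exception ignored in __del__" message follows it.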

zhangjun · Oct 11 '22 07:10

This change will break the AMD MI250 benchmark. A better approach is to re-enable hint-based dynamic shape support at compile time in this version; then at benchmark time we can use a single .so file for any batch size. Until hint-based dynamic batch support is fixed for conv, the way to fix the issue you hit is to add a click argument such as compile_module=False, so that the MI250 benchmark script does not break.

antinucleon · Oct 11 '22 07:10
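The maintainer's proposal keeps the current default and makes compilation opt-in. Below is a minimal sketch of that gating logic. The compile_fn and load_fn callbacks are hypothetical stand-ins for the AITemplate compile step and Model(...). In benchmark_ait.py, the flag would come from a click option such as @click.option("--compile-module", is_flag=True, default=False), which click passes to the command as compile_module. The flag name is an assumption based on the comment.

```python
import os
from typing import Callable, TypeVar

M = TypeVar("M")


def prepare_module(
    so_path: str,
    compile_module: bool,               # hypothetical flag; False keeps today's behavior
    compile_fn: Callable[[str], None],  # hypothetical: builds the .so at so_path
    load_fn: Callable[[str], M],        # hypothetical: e.g. wraps Model(so_path)
) -> M:
    """Compile only when asked to, then load the module."""
    if compile_module:
        # Opt-in path: build the .so before benchmarking.
        compile_fn(so_path)
    elif not os.path.isfile(so_path):
        # Default load-only path: report the missing file clearly.
        raise FileNotFoundError(
            f"{so_path} not found; compile it first or pass --compile-module"
        )
    return load_fn(so_path)
```

Because the default is False, existing invocations behave as before. Only runs that pass the flag pay the compilation cost.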