
YOLOv5 with PyTorch 2.0

Open SkalskiP opened this issue 2 years ago • 17 comments

Search before asking

  • [X] I have searched the YOLOv5 issues and discussions and found no similar questions.

Question

Did any of you try to run YOLOv5 on PyTorch 2.0? Is it faster, as they promised?

Additional

No response

SkalskiP avatar Dec 03 '22 14:12 SkalskiP

👋 Hello @SkalskiP, thank you for your interest in YOLOv5 🚀! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.

If this is a 🐛 Bug Report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we can not help you.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset images, training logs, screenshots, and a public link to online W&B logging if available.

For business inquiries or professional support requests please visit https://ultralytics.com or email [email protected].

Requirements

Python>=3.7.0 with all requirements.txt dependencies installed, including PyTorch>=1.7. To get started:

git clone https://github.com/ultralytics/yolov5  # clone
cd yolov5
pip install -r requirements.txt  # install

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Status

YOLOv5 CI

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training, validation, inference, export and benchmarks on MacOS, Windows, and Ubuntu every 24 hours and on every commit.

github-actions[bot] avatar Dec 03 '22 14:12 github-actions[bot]

@SkalskiP not yet. I'll try it by adding an extra line to our notebook, i.e. in Setup cell:

!git clone https://github.com/ultralytics/yolov5  # clone
%cd yolov5
%pip install -qr requirements.txt  # install
%pip install numpy --pre torch[dynamo] --force-reinstall --extra-index-url https://download.pytorch.org/whl/nightly/cu117

import torch
import utils
display = utils.notebook_init()  # checks

glenn-jocher avatar Dec 03 '22 19:12 glenn-jocher

@SkalskiP strange, I get a torchvision incompatibility error when running the above code in Colab. Minimum reproducible example:

!git clone https://github.com/ultralytics/yolov5  # clone
%cd yolov5
%pip install -qr requirements.txt  # install
%pip install numpy --pre torch[dynamo] --force-reinstall --extra-index-url https://download.pytorch.org/whl/nightly/cu117

!python detect.py

EDIT: Also strange: on install, torch shows as 1.14 instead of 2.0:

YOLOv5 🚀 v7.0-21-ga1b6e79 Python-3.8.15 torch-1.14.0.dev20221203+cu117 CUDA:0 (Tesla T4, 15110MiB)
Setup complete ✅ (2 CPUs, 12.7 GB RAM, 27.8/78.2 GB disk)

EDIT2: If I use a single pip command the install works better, but inference fails:

!git clone https://github.com/ultralytics/yolov5  # clone
%cd yolov5
%pip install -r requirements.txt --pre torch[dynamo] torchvision --force-reinstall --extra-index-url https://download.pytorch.org/whl/nightly/cu117

# (restart runtime)

%cd yolov5

import torch
import utils
display = utils.notebook_init()  # checks

!python detect.py

Error:

detect: weights=yolov5s.pt, source=data/images, data=data/coco128.yaml, imgsz=[640, 640], conf_thres=0.25, iou_thres=0.45, max_det=1000, device=, view_img=False, save_txt=False, save_conf=False, save_crop=False, nosave=False, classes=None, agnostic_nms=False, augment=False, visualize=False, update=False, project=runs/detect, name=exp, exist_ok=False, line_thickness=3, hide_labels=False, hide_conf=False, half=False, dnn=False, vid_stride=1
YOLOv5 🚀 v7.0-21-ga1b6e79 Python-3.8.15 torch-1.14.0.dev20221203+cu117 CUDA:0 (Tesla T4, 15110MiB)

Downloading https://github.com/ultralytics/yolov5/releases/download/v7.0/yolov5s.pt to yolov5s.pt...
100% 14.1M/14.1M [00:01<00:00, 10.0MB/s]

Fusing layers... 
YOLOv5s summary: 213 layers, 7225885 parameters, 0 gradients
Traceback (most recent call last):
  File "detect.py", line 259, in <module>
    main(opt)
  File "detect.py", line 254, in main
    run(**vars(opt))
  File "/usr/local/lib/python3.8/dist-packages/torch/autograd/grad_mode.py", line 34, in decorate_context
    return func(*args, **kwargs)
  File "detect.py", line 130, in run
    pred = non_max_suppression(pred, conf_thres, iou_thres, classes, agnostic_nms, max_det=max_det)
  File "/content/yolov5/utils/general.py", line 981, in non_max_suppression
    i = torchvision.ops.nms(boxes, scores, iou_thres)  # NMS
  File "/usr/local/lib/python3.8/dist-packages/torchvision/ops/boxes.py", line 41, in nms
    return torch.ops.torchvision.nms(boxes, scores, iou_threshold)
  File "/usr/local/lib/python3.8/dist-packages/torch/_ops.py", line 500, in __call__
    return self._op(*args, **kwargs or {})
NotImplementedError: Could not run 'torchvision::nms' with arguments from the 'CUDA' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'torchvision::nms' is only available for these backends: [CPU, QuantizedCPU, BackendSelect, Python, FuncTorchDynamicLayerBackMode, Functionalize, Named, Conjugate, Negative, ZeroTensor, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradMPS, AutogradXPU, AutogradHPU, AutogradLazy, Tracer, AutocastCPU, AutocastCUDA, FuncTorchBatched, FuncTorchVmapMode, Batched, VmapMode, FuncTorchGradWrapper, PythonTLSSnapshot, FuncTorchDynamicLayerFrontMode, PythonDispatcher].

glenn-jocher avatar Dec 03 '22 20:12 glenn-jocher

@AyushExel @Laughing-q the most interesting part seems to be this claim of 80% speedup with AMP on torch 2.0: https://pytorch.org/get-started/pytorch-2.0/

[Screenshot: PyTorch 2.0 speedup figures from pytorch.org]
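
For anyone who wants to try reproducing that comparison, here is a minimal sketch of torch.compile combined with the standard AMP autocast/GradScaler pattern; the toy model, batch shapes, and optimizer are placeholders, not YOLOv5's training code:

import torch
import torch.nn as nn

device = 'cuda'  # the AMP speedups in the linked benchmarks are GPU-only
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10)).to(device)
model = torch.compile(model)  # PyTorch 2.0+ (the nightly builds above report 1.14/2.0)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()
loss_fn = nn.CrossEntropyLoss()

for step in range(10):
    x = torch.randn(64, 512, device=device)
    y = torch.randint(0, 10, (64,), device=device)
    with torch.cuda.amp.autocast():        # mixed-precision forward pass
        loss = loss_fn(model(x), y)
    optimizer.zero_grad(set_to_none=True)
    scaler.scale(loss).backward()          # scaled backward to avoid fp16 underflow
    scaler.step(optimizer)
    scaler.update()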

glenn-jocher avatar Dec 03 '22 20:12 glenn-jocher

The whole thing got me super interested, I’ll be taking a look at it today.

SkalskiP avatar Dec 03 '22 21:12 SkalskiP

@glenn-jocher there is torch.compile(model) that claims to speed up training by 30% or more without any code change.
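
As a rough sketch of the "no code change" claim (using a stand-in nn.Module, not YOLOv5 itself), the wrap is a single added line and the call signature stays the same:

import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.SiLU()).eval()
compiled = torch.compile(model)  # the single added line; PyTorch 2.0+

x = torch.randn(1, 3, 32, 32)
with torch.no_grad():
    out_eager = model(x)
    out_compiled = compiled(x)   # first call compiles, later calls reuse the cached graph
print((out_eager - out_compiled).abs().max())  # outputs should agree up to float tolerance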

AyushExel avatar Dec 03 '22 22:12 AyushExel

Hi, @glenn-jocher and @AyushExel 👋🏻! I spent ~3/4 h trying to run YOLOv5 on PyTorch 2.0 on different machines, on CPU and CUDA. Failed every time.

I managed to set up the environment with the latest PyTorch, but didn't notice any speedups without torch.compile(model), and when I try to use it I fail with:

File "/usr/local/lib/python3.8/dist-packages/tornado/ioloop.py", line 690, in <lambda>
    lambda f: self._run_callback(functools.partial(callback, future))
  File "/usr/local/lib/python3.8/dist-packages/tornado/ioloop.py", line 743, in _run_callback
    ret = callback()
  File "/usr/local/lib/python3.8/dist-packages/tornado/gen.py", line 787, in inner
    self.run()
  File "/usr/local/lib/python3.8/dist-packages/tornado/gen.py", line 748, in run
    yielded = self.gen.send(value)
  File "/usr/local/lib/python3.8/dist-packages/ipykernel/kernelbase.py", line 365, in process_one
    yield gen.maybe_future(dispatch(*args))
  File "/usr/local/lib/python3.8/dist-packages/tornado/gen.py", line 209, in wrapper
    yielded = next(result)
  File "/usr/local/lib/python3.8/dist-packages/ipykernel/kernelbase.py", line 268, in dispatch_shell
    yield gen.maybe_future(handler(stream, idents, msg))
  File "/usr/local/lib/python3.8/dist-packages/tornado/gen.py", line 209, in wrapper
    yielded = next(result)
  File "/usr/local/lib/python3.8/dist-packages/ipykernel/kernelbase.py", line 543, in execute_request
    self.do_execute(
  File "/usr/local/lib/python3.8/dist-packages/tornado/gen.py", line 209, in wrapper
    yielded = next(result)
  File "/usr/local/lib/python3.8/dist-packages/ipykernel/ipkernel.py", line 306, in do_execute
    res = shell.run_cell(code, store_history=store_history, silent=silent)
  File "/usr/local/lib/python3.8/dist-packages/ipykernel/zmqshell.py", line 536, in run_cell
    return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/IPython/core/interactiveshell.py", line 2854, in run_cell
    result = self._run_cell(
  File "/usr/local/lib/python3.8/dist-packages/IPython/core/interactiveshell.py", line 2881, in _run_cell
    return runner(coro)
  File "/usr/local/lib/python3.8/dist-packages/IPython/core/async_helpers.py", line 68, in _pseudo_sync_runner
    coro.send(None)
  File "/usr/local/lib/python3.8/dist-packages/IPython/core/interactiveshell.py", line 3057, in run_cell_async
    has_raised = await self.run_ast_nodes(code_ast.body, cell_name,
  File "/usr/local/lib/python3.8/dist-packages/IPython/core/interactiveshell.py", line 3249, in run_ast_nodes
    if (await self.run_code(code, result,  async_=asy)):
  File "/usr/local/lib/python3.8/dist-packages/IPython/core/interactiveshell.py", line 3326, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-33-d0001f8fb468>", line 4, in <module>
    model(image)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1480, in _call_impl
    return forward_call(*args, **kwargs)
  File "/content/yolov5/models/yolo.py", line 209, in forward
    return self._forward_once(x, profile, visualize)  # single-scale inference, train
  File "/content/yolov5/models/yolo.py", line 121, in _forward_once
    x = m(x)  # run
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1480, in _call_impl
    return forward_call(*args, **kwargs)
  File "/content/yolov5/models/common.py", line 228, in forward
    def forward(self, x):
  File "/content/yolov5/models/common.py", line 229, in forward
    x = self.cv1(x)
  File "/content/yolov5/models/common.py", line 57, in forward
    return self.act(self.bn(self.conv(x)))

==========
[2022-12-04 01:22:58,785] torch._dynamo.convert_frame: [WARNING] torch._dynamo hit config.cache_size_limit (64)
   function: 'forward_fuse' (/content/yolov5/models/common.py:59)
   reasons:  ['___check_obj_id(self, 140048282396896)']
to diagnose recompilation issues, see https://github.com/pytorch/torchdynamo/blob/main/TROUBLESHOOTING.md.

The link in the description doesn't work.

Here is my colab: https://colab.research.google.com/drive/1uUTLumfMEe0x95Qqjankjy0JSwawcigL?usp=sharing

SkalskiP avatar Dec 04 '22 01:12 SkalskiP

I am very interested in this issue and will keep an eye on it. I hope Mr. Glenn resolves it soon. @glenn-jocher

Zengyf-CVer avatar Dec 04 '22 07:12 Zengyf-CVer

@SkalskiP I came up with a much simpler reproduction example, but for some reason my install command installs the CPU version of torchvision alongside a CUDA version of torch. In a Colab notebook I did this, which only works correctly if I send the model to CPU first, i.e. model.cpu():

%pip install gitpython ipython scipy seaborn --pre torch[dynamo] torchvision --force-reinstall --extra-index-url https://download.pytorch.org/whl/nightly/cu117
# (RESTART COLAB RUNTIME)

import torch
model = torch.hub.load('ultralytics/yolov5', 'yolov5s')
results = model('https://ultralytics.com/images/zidane.jpg')
results.print()  # shows speeds

My environment shows torchvision on CPU :(

!wget https://raw.githubusercontent.com/pytorch/pytorch/master/torch/utils/collect_env.py
!python collect_env.py


Collecting environment information...
PyTorch version: 1.14.0.dev20221204+cu117
Is debug build: False
CUDA used to build PyTorch: 11.7
ROCM used to build PyTorch: N/A

OS: Ubuntu 18.04.6 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: 6.0.0-1ubuntu2 (tags/RELEASE_600/final)
CMake version: version 3.25.0
Libc version: glibc-2.27

Python version: 3.8.15 (default, Oct 12 2022, 19:14:39)  [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-5.10.133+-x86_64-with-glibc2.27
Is CUDA available: True
CUDA runtime version: 11.2.152
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: Tesla T4
Nvidia driver version: 460.32.03
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.1.1
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.1.1
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.1.1
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.1.1
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.1.1
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.1.1
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.1.1
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] numpy==1.24.0rc2
[pip3] torch==1.14.0.dev20221204+cu117
[pip3] torchaudio==0.12.1+cu113
[pip3] torchsummary==1.5.1
[pip3] torchtext==0.13.1
[pip3] torchtriton==2.0.0+0d7e753227
[pip3] torchvision==0.15.0.dev20221204+cpu
[conda] Could not collect
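
A quick way to check for this kind of torch/torchvision mismatch (a sanity-check sketch, not YOLOv5 code) is to compare the build tags and run NMS on CPU, which still works even with a +cpu torchvision wheel:

import torch
import torchvision

print(torch.__version__)        # a '+cu117' suffix means a CUDA build
print(torchvision.__version__)  # a '+cpu' suffix means no CUDA kernels, so GPU NMS raises NotImplementedError
print(torch.cuda.is_available())

boxes = torch.tensor([[0., 0., 10., 10.], [1., 1., 11., 11.]])
scores = torch.tensor([0.9, 0.8])
print(torchvision.ops.nms(boxes, scores, iou_threshold=0.5))  # CPU NMS works regardless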

glenn-jocher avatar Dec 04 '22 21:12 glenn-jocher

Use the function with the mode argument:

model = torch.compile(model, mode="reduce-overhead")

or:

model = torch.compile(model, mode="max-autotune")

or:

model = torch.compile(model, mode="reduce-overhead", fullgraph=True, backend='eager')  # fastest
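
For readers trying these options, here is a small self-contained sketch of the mode argument on a stand-in model; the tiny network and input shape are placeholders, and the comments describe each mode's documented intent rather than YOLOv5 benchmarks:

import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.SiLU(), nn.Conv2d(16, 3, 3, padding=1))

default_c = torch.compile(model)                            # balanced compile time vs. runtime speed
low_latency = torch.compile(model, mode="reduce-overhead")  # targets small-batch / low-latency inference
autotuned = torch.compile(model, mode="max-autotune")       # searches hardest for fast kernels, slowest to compile

x = torch.randn(1, 3, 64, 64)
_ = default_c(x)  # first call triggers compilation; subsequent calls run the compiled graph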

NeuralAIM avatar Dec 28 '22 18:12 NeuralAIM

@NeuralAIM did you manage to actually run it with YOLOv5 model?

SkalskiP avatar Dec 28 '22 19:12 SkalskiP

@NeuralAIM did you manage to actually run it with YOLOv5 model?

Yes 💯

NeuralAIM avatar Dec 30 '22 09:12 NeuralAIM

@NeuralAIM did you manage to actually run it with YOLOv5 model?

Yes 💯

Hi, where should I add the model = torch.compile(model) code? Before training?

2catycm avatar Jan 05 '23 19:01 2catycm

Any news on this? The possible performance gain by using PyTorch 2 would be interesting :)

zavinator avatar Mar 24 '23 15:03 zavinator

Hi @zavinator 👋🏻! The torch.compile(model) function is used to optimize and speed up training but isn't directly linked to PyTorch 2.0. Nonetheless, for performance enhancements, you can call this function before training, ensuring your model is sent to the correct device (cuda/cpu). Let me know if you need further help!
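
For example (a minimal placeholder loop, not YOLOv5's train.py), the ordering would be device placement first, then torch.compile, then the otherwise unchanged training loop:

import torch
import torch.nn as nn

device = 'cuda' if torch.cuda.is_available() else 'cpu'

model = nn.Linear(64, 2).to(device)  # 1. move the model to its device first
model = torch.compile(model)         # 2. then compile (PyTorch 2.0+)

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
for step in range(5):                # 3. train as usual; the loop itself is unchanged
    x = torch.randn(16, 64, device=device)
    loss = model(x).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()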

glenn-jocher avatar Nov 15 '23 16:11 glenn-jocher

I was under the impression that the compile function is exclusive to PyTorch 2.0+. But it's interesting to learn that it can be used before training – I hadn't considered that. Does this mean that training the model becomes faster? I've only tried it on the final model and didn't observe a significant speedup. Unfortunately, it seems that torch.compile doesn't work on Windows (https://github.com/pytorch/pytorch/issues/90768).

zavinator avatar Nov 15 '23 19:11 zavinator

@zavinator yes, the torch.compile(model) function can be beneficial before training, potentially speeding up the process. However, the impact on training speed can vary depending on various factors. It's worth noting that, as you mentioned, there are platform-specific limitations to be aware of - such as the issue with Windows that you referenced. Your understanding and observations are spot on!

glenn-jocher avatar Nov 16 '23 16:11 glenn-jocher