YOLOv5 with PyTorch 2.0
Search before asking
- [X] I have searched the YOLOv5 issues and discussions and found no similar questions.
Question
Have any of you tried to run YOLOv5 on PyTorch 2.0? Is it as fast as they promised?
Additional
No response
👋 Hello @SkalskiP, thank you for your interest in YOLOv5 🚀! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.
If this is a 🐛 Bug Report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we cannot help you.
If this is a custom training ❓ Question, please provide as much information as possible, including dataset images, training logs, screenshots, and a public link to online W&B logging if available.
For business inquiries or professional support requests please visit https://ultralytics.com or email [email protected].
Requirements
Python>=3.7.0 with all requirements.txt dependencies installed, including PyTorch>=1.7. To get started:
git clone https://github.com/ultralytics/yolov5 # clone
cd yolov5
pip install -r requirements.txt # install
Environments
YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):
- Notebooks with free GPU:
- Google Cloud Deep Learning VM. See GCP Quickstart Guide
- Amazon Deep Learning AMI. See AWS Quickstart Guide
- Docker Image. See Docker Quickstart Guide
Status
If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training, validation, inference, export and benchmarks on macOS, Windows, and Ubuntu every 24 hours and on every commit.
@SkalskiP not yet. I'll try it by adding an extra line to our notebook, i.e. in Setup cell:
!git clone https://github.com/ultralytics/yolov5 # clone
%cd yolov5
%pip install -qr requirements.txt # install
%pip install numpy --pre torch[dynamo] --force-reinstall --extra-index-url https://download.pytorch.org/whl/nightly/cu117
import torch
import utils
display = utils.notebook_init() # checks
@SkalskiP strange, I get a torchvision incompatibility error when running the above code in Colab. Minimum reproducible example:
!git clone https://github.com/ultralytics/yolov5 # clone
%cd yolov5
%pip install -qr requirements.txt # install
%pip install numpy --pre torch[dynamo] --force-reinstall --extra-index-url https://download.pytorch.org/whl/nightly/cu117
!python detect.py
EDIT: Also strange: on install the torch version shows as 1.14 instead of 2.0:
YOLOv5 🚀 v7.0-21-ga1b6e79 Python-3.8.15 torch-1.14.0.dev20221203+cu117 CUDA:0 (Tesla T4, 15110MiB)
Setup complete ✅ (2 CPUs, 12.7 GB RAM, 27.8/78.2 GB disk)
EDIT2: If I use a single pip command the install works better, but inference fails:
!git clone https://github.com/ultralytics/yolov5 # clone
%cd yolov5
%pip install -r requirements.txt --pre torch[dynamo] torchvision --force-reinstall --extra-index-url https://download.pytorch.org/whl/nightly/cu117
# (restart runtime)
%cd yolov5
import torch
import utils
display = utils.notebook_init() # checks
!python detect.py
Error:
detect: weights=yolov5s.pt, source=data/images, data=data/coco128.yaml, imgsz=[640, 640], conf_thres=0.25, iou_thres=0.45, max_det=1000, device=, view_img=False, save_txt=False, save_conf=False, save_crop=False, nosave=False, classes=None, agnostic_nms=False, augment=False, visualize=False, update=False, project=runs/detect, name=exp, exist_ok=False, line_thickness=3, hide_labels=False, hide_conf=False, half=False, dnn=False, vid_stride=1
YOLOv5 🚀 v7.0-21-ga1b6e79 Python-3.8.15 torch-1.14.0.dev20221203+cu117 CUDA:0 (Tesla T4, 15110MiB)
Downloading https://github.com/ultralytics/yolov5/releases/download/v7.0/yolov5s.pt to yolov5s.pt...
100% 14.1M/14.1M [00:01<00:00, 10.0MB/s]
Fusing layers...
YOLOv5s summary: 213 layers, 7225885 parameters, 0 gradients
Traceback (most recent call last):
File "detect.py", line 259, in <module>
main(opt)
File "detect.py", line 254, in main
run(**vars(opt))
File "/usr/local/lib/python3.8/dist-packages/torch/autograd/grad_mode.py", line 34, in decorate_context
return func(*args, **kwargs)
File "detect.py", line 130, in run
pred = non_max_suppression(pred, conf_thres, iou_thres, classes, agnostic_nms, max_det=max_det)
File "/content/yolov5/utils/general.py", line 981, in non_max_suppression
i = torchvision.ops.nms(boxes, scores, iou_thres) # NMS
File "/usr/local/lib/python3.8/dist-packages/torchvision/ops/boxes.py", line 41, in nms
return torch.ops.torchvision.nms(boxes, scores, iou_threshold)
File "/usr/local/lib/python3.8/dist-packages/torch/_ops.py", line 500, in __call__
return self._op(*args, **kwargs or {})
NotImplementedError: Could not run 'torchvision::nms' with arguments from the 'CUDA' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'torchvision::nms' is only available for these backends: [CPU, QuantizedCPU, BackendSelect, Python, FuncTorchDynamicLayerBackMode, Functionalize, Named, Conjugate, Negative, ZeroTensor, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradMPS, AutogradXPU, AutogradHPU, AutogradLazy, Tracer, AutocastCPU, AutocastCUDA, FuncTorchBatched, FuncTorchVmapMode, Batched, VmapMode, FuncTorchGradWrapper, PythonTLSSnapshot, FuncTorchDynamicLayerFrontMode, PythonDispatcher].
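For reference, this NotImplementedError is the classic signature of a CPU-only torchvision wheel installed next to a CUDA build of torch (the available-backends list above includes CPU but not CUDA). A minimal check to confirm the mismatch, assuming both packages import cleanly:
import torch
import torchvision
print(torch.__version__, torch.version.cuda)  # expect a +cu117 build and '11.7' here
print(torchvision.__version__)  # a +cpu suffix means CUDA ops like torchvision::nms are missing
print(torch.cuda.is_available())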
@AyushExel @Laughing-q the most interesting part seems to be this claim of 80% speedup with AMP on torch 2.0: https://pytorch.org/get-started/pytorch-2.0/
The whole thing got me super interested; I'll be taking a look at it today.
@glenn-jocher there is torch.compile(model), which claims to speed up training by 30% or more without any code changes.
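For anyone skimming, a minimal sketch of that one-line opt-in as applied to YOLOv5 (assuming a PyTorch 2.0 nightly; as the rest of this thread shows, it did not yet run cleanly on YOLOv5 at the time):
import torch
model = torch.hub.load('ultralytics/yolov5', 'yolov5s')  # standard hub load
model = torch.compile(model)  # the claimed one-line opt-in, no other code changes
results = model('https://ultralytics.com/images/zidane.jpg')
results.print()  # shows speeds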
Hi, @glenn-jocher and @AyushExel 👋🏻! I spent ~3-4 hours trying to run YOLOv5 on PyTorch 2.0 on different machines, on CPU and CUDA, and failed every time.
I managed to set up the environment with the latest PyTorch but didn't notice any speedups without torch.compile(model), and when I try to use it I fail with:
File "/usr/local/lib/python3.8/dist-packages/tornado/ioloop.py", line 690, in <lambda>
lambda f: self._run_callback(functools.partial(callback, future))
File "/usr/local/lib/python3.8/dist-packages/tornado/ioloop.py", line 743, in _run_callback
ret = callback()
File "/usr/local/lib/python3.8/dist-packages/tornado/gen.py", line 787, in inner
self.run()
File "/usr/local/lib/python3.8/dist-packages/tornado/gen.py", line 748, in run
yielded = self.gen.send(value)
File "/usr/local/lib/python3.8/dist-packages/ipykernel/kernelbase.py", line 365, in process_one
yield gen.maybe_future(dispatch(*args))
File "/usr/local/lib/python3.8/dist-packages/tornado/gen.py", line 209, in wrapper
yielded = next(result)
File "/usr/local/lib/python3.8/dist-packages/ipykernel/kernelbase.py", line 268, in dispatch_shell
yield gen.maybe_future(handler(stream, idents, msg))
File "/usr/local/lib/python3.8/dist-packages/tornado/gen.py", line 209, in wrapper
yielded = next(result)
File "/usr/local/lib/python3.8/dist-packages/ipykernel/kernelbase.py", line 543, in execute_request
self.do_execute(
File "/usr/local/lib/python3.8/dist-packages/tornado/gen.py", line 209, in wrapper
yielded = next(result)
File "/usr/local/lib/python3.8/dist-packages/ipykernel/ipkernel.py", line 306, in do_execute
res = shell.run_cell(code, store_history=store_history, silent=silent)
File "/usr/local/lib/python3.8/dist-packages/ipykernel/zmqshell.py", line 536, in run_cell
return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/IPython/core/interactiveshell.py", line 2854, in run_cell
result = self._run_cell(
File "/usr/local/lib/python3.8/dist-packages/IPython/core/interactiveshell.py", line 2881, in _run_cell
return runner(coro)
File "/usr/local/lib/python3.8/dist-packages/IPython/core/async_helpers.py", line 68, in _pseudo_sync_runner
coro.send(None)
File "/usr/local/lib/python3.8/dist-packages/IPython/core/interactiveshell.py", line 3057, in run_cell_async
has_raised = await self.run_ast_nodes(code_ast.body, cell_name,
File "/usr/local/lib/python3.8/dist-packages/IPython/core/interactiveshell.py", line 3249, in run_ast_nodes
if (await self.run_code(code, result, async_=asy)):
File "/usr/local/lib/python3.8/dist-packages/IPython/core/interactiveshell.py", line 3326, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-33-d0001f8fb468>", line 4, in <module>
model(image)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1480, in _call_impl
return forward_call(*args, **kwargs)
File "/content/yolov5/models/yolo.py", line 209, in forward
return self._forward_once(x, profile, visualize) # single-scale inference, train
File "/content/yolov5/models/yolo.py", line 121, in _forward_once
x = m(x) # run
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1480, in _call_impl
return forward_call(*args, **kwargs)
File "/content/yolov5/models/common.py", line 228, in forward
def forward(self, x):
File "/content/yolov5/models/common.py", line 229, in forward
x = self.cv1(x)
File "/content/yolov5/models/common.py", line 57, in forward
return self.act(self.bn(self.conv(x)))
==========
[2022-12-04 01:22:58,785] torch._dynamo.convert_frame: [WARNING] torch._dynamo hit config.cache_size_limit (64)
function: 'forward_fuse' (/content/yolov5/models/common.py:59)
reasons: ['___check_obj_id(self, 140048282396896)']
to diagnose recompilation issues, see https://github.com/pytorch/torchdynamo/blob/main/TROUBLESHOOTING.md.
The link in the description doesn't work.
Here is my colab: https://colab.research.google.com/drive/1uUTLumfMEe0x95Qqjankjy0JSwawcigL?usp=sharing
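For anyone hitting the same cache_size_limit warning: torch._dynamo exposes that limit as a config value, so one possible workaround (an assumption, not verified in this thread) is to raise it. Note this only hides the symptom of excessive recompilation; see the linked TROUBLESHOOTING guide for root-cause diagnosis:
import torch._dynamo
torch._dynamo.config.cache_size_limit = 128  # default is 64; allows more recompiles per function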
I'm very interested in this issue and will keep an eye on it. Hope you can resolve it soon, @glenn-jocher.
@SkalskiP I came up with a much simpler reproducible example, but for some reason my install command is installing the CPU version of torchvision with a CUDA version of torch. In a Colab notebook I did this, which only works correctly if I send the model to CPU first, i.e. model.cpu():
%pip install gitpython ipython scipy seaborn --pre torch[dynamo] torchvision --force-reinstall --extra-index-url https://download.pytorch.org/whl/nightly/cu117
# (RESTART COLAB RUNTIME)
import torch
model = torch.hub.load('ultralytics/yolov5', 'yolov5s')
results = model('https://ultralytics.com/images/zidane.jpg')
results.print() # shows speeds
My environment shows torchvision on CPU :(
!wget https://raw.githubusercontent.com/pytorch/pytorch/master/torch/utils/collect_env.py
!python collect_env.py
Collecting environment information...
PyTorch version: 1.14.0.dev20221204+cu117
Is debug build: False
CUDA used to build PyTorch: 11.7
ROCM used to build PyTorch: N/A
OS: Ubuntu 18.04.6 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: 6.0.0-1ubuntu2 (tags/RELEASE_600/final)
CMake version: version 3.25.0
Libc version: glibc-2.27
Python version: 3.8.15 (default, Oct 12 2022, 19:14:39) [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-5.10.133+-x86_64-with-glibc2.27
Is CUDA available: True
CUDA runtime version: 11.2.152
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: Tesla T4
Nvidia driver version: 460.32.03
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.1.1
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.1.1
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.1.1
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.1.1
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.1.1
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.1.1
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.1.1
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
Versions of relevant libraries:
[pip3] numpy==1.24.0rc2
[pip3] torch==1.14.0.dev20221204+cu117
[pip3] torchaudio==0.12.1+cu113
[pip3] torchsummary==1.5.1
[pip3] torchtext==0.13.1
[pip3] torchtriton==2.0.0+0d7e753227
[pip3] torchvision==0.15.0.dev20221204+cpu
[conda] Could not collect
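That torchvision==0.15.0.dev20221204+cpu line is the smoking gun for the earlier torchvision::nms failure. One possible fix (an assumption, not verified in this thread): use --index-url instead of --extra-index-url, so pip cannot resolve torchvision from PyPI's CPU-only wheels:
%pip install --pre torch torchvision --force-reinstall --index-url https://download.pytorch.org/whl/nightly/cu117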
Use the function with the mode argument:
model = torch.compile(model, mode="reduce-overhead")
Or:
model = torch.compile(model, mode="max-autotune")
Or, fastest:
model = torch.compile(model, mode="reduce-overhead", fullgraph=True, backend='eager')
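For context, per the PyTorch 2.0 docs: mode="reduce-overhead" targets per-call Python and kernel-launch overhead (useful for small batches, via CUDA graphs), while mode="max-autotune" spends longer compiling to search for faster kernels. Note that backend='eager' is a debugging backend that runs the captured graphs with ordinary eager kernels, so it avoids compile time but shouldn't itself produce faster kernels.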
@NeuralAIM did you manage to actually run it with a YOLOv5 model?
Yes 💯
Hi, where should I add the model = torch.compile(model) code? Before training?
Any news on this? The possible performance gain by using PyTorch 2 would be interesting :)
Hi @zavinator 👋🏻! The torch.compile(model) function is the headline optimization API introduced in PyTorch 2.0 and can speed up both training and inference. For performance enhancements, call it before training starts, ensuring your model is already on the correct device (cuda/cpu). Let me know if you need further help!
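A minimal sketch of that ordering (the model class, dataloader, and loss helper below are hypothetical, purely for illustration):
import torch

model = MyModel().cuda()  # hypothetical model; move it to the target device first
model = torch.compile(model)  # compile once, before the training loop
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for images, targets in dataloader:  # hypothetical dataloader
    preds = model(images.cuda())  # the first call triggers compilation
    loss = compute_loss(preds, targets.cuda())  # hypothetical loss function
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()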
Right, my understanding is that the compile function is exclusive to PyTorch 2.0+. It's interesting that it can be used before training; I hadn't considered that. Does this mean that training the model becomes faster? I've only tried it on the final model and didn't observe a significant speedup. Unfortunately, it seems that torch.compile doesn't work on Windows (https://github.com/pytorch/pytorch/issues/90768).
@zavinator yes, the torch.compile(model) function can be beneficial before training, potentially speeding up the process. However, the impact on training speed can vary depending on various factors. It's worth noting that, as you mentioned, there are platform-specific limitations to be aware of - such as the issue with Windows that you referenced. Your understanding and observations are spot on!