
yolov8-pose cuda error

Open • MrJoratos opened this issue 7 months ago • 4 comments

The error is as follows:

albumentations: Blur(p=0.01, blur_limit=(3, 7)), MedianBlur(p=0.01, blur_limit=(3, 7)), ToGray(p=0.01), CLAHE(p=0.01, clip_limit=(1, 4.0), tile_grid_size=(8, 8))
val: Scanning /media/hitcrt/6a071232-a52f-4f53-89ca-fdde738abfd8/assignment10_19/data_original/4kp_data/labeled/rgb/南航.cache... 704 images, 1
Plotting labels to runs/pose/step_0_finetune11/labels.jpg...
optimizer: AdamW(lr=0.000476, momentum=0.9) with parameter groups 63 weight(decay=0.0), 83 weight(decay=0.0005), 82 bias(decay=0.0)
Image sizes 928 train, 928 val
Using 8 dataloader workers
Logging results to runs/pose/step_0_finetune11
Starting training for 10 epochs...
Closing dataloader mosaic
albumentations: Blur(p=0.01, blur_limit=(3, 7)), MedianBlur(p=0.01, blur_limit=(3, 7)), ToGray(p=0.01), CLAHE(p=0.01, clip_limit=(1, 4.0), tile_grid_size=(8, 8))

  Epoch    GPU_mem   box_loss  pose_loss  kobj_loss   cls_loss   dfl_loss  Instances       Size

0%| | 0/1329 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "torch-Pruning.py", line 403, in <module>
    prune(args)
  File "torch-Pruning.py", line 359, in prune
    model.train_v2(pruning=True, **pruning_cfg)
  File "torch-Pruning.py", line 267, in train_v2
    self.trainer.train()
  File "/media/hitcrt/6a071232-a52f-4f53-89ca-fdde738abfd8/224d2d601bc345007a991aa1b40b8bde.jpeg{824E0F58-7501-AA9A-975F-E71FEA341EF3}.pngultralytics-8.0.132/ultralytics-8.0.132/ultralytics/yolo/engine/trainer.py", line 192, in train
    self._do_train(world_size)
  File "/media/hitcrt/6a071232-a52f-4f53-89ca-fdde738abfd8/224d2d601bc345007a991aa1b40b8bde.jpeg{824E0F58-7501-AA9A-975F-E71FEA341EF3}.pngultralytics-8.0.132/ultralytics-8.0.132/ultralytics/yolo/engine/trainer.py", line 332, in _do_train
    self.loss, self.loss_items = self.model(batch)
  File "/home/hitcrt/anaconda3/envs/py381/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/media/hitcrt/6a071232-a52f-4f53-89ca-fdde738abfd8/224d2d601bc345007a991aa1b40b8bde.jpeg{824E0F58-7501-AA9A-975F-E71FEA341EF3}.pngultralytics-8.0.132/ultralytics-8.0.132/ultralytics/nn/tasks.py", line 44, in forward
    return self.loss(x, *args, **kwargs)
  File "/media/hitcrt/6a071232-a52f-4f53-89ca-fdde738abfd8/224d2d601bc345007a991aa1b40b8bde.jpeg{824E0F58-7501-AA9A-975F-E71FEA341EF3}.pngultralytics-8.0.132/ultralytics-8.0.132/ultralytics/nn/tasks.py", line 215, in loss
    return self.criterion(preds, batch)
  File "/media/hitcrt/6a071232-a52f-4f53-89ca-fdde738abfd8/224d2d601bc345007a991aa1b40b8bde.jpeg{824E0F58-7501-AA9A-975F-E71FEA341EF3}.pngultralytics-8.0.132/ultralytics-8.0.132/ultralytics/utils/loss.py", line 335, in __call__
    pred_bboxes = self.bbox_decode(anchor_points, pred_distri)  # xyxy, (b, h*w, 4)
  File "/media/hitcrt/6a071232-a52f-4f53-89ca-fdde738abfd8/224d2d601bc345007a991aa1b40b8bde.jpeg{824E0F58-7501-AA9A-975F-E71FEA341EF3}.pngultralytics-8.0.132/ultralytics-8.0.132/ultralytics/utils/loss.py", line 150, in bbox_decode
    pred_dist = pred_dist.view(b, a, 4, c // 4).softmax(3).matmul(self.proj.type(pred_dist.dtype))
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument mat2 in method wrapper_mm)
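A possible explanation, judging only from the traceback and not confirmed anywhere in this thread: ultralytics caches the loss criterion on the task model the first time the loss is computed (the `self.criterion(preds, batch)` call in tasks.py above), and the criterion's `proj` tensor is created on whatever device the model happens to be on at that moment. If the criterion is built while the pruned model is still on the CPU, `proj` stays on cpu after the model itself moves to cuda:0, which matches the mat2 device mismatch in `bbox_decode`. A minimal sketch of a workaround under that assumption (the variable names and weight path are placeholders, not taken from the script above):

```python
# Sketch of a possible workaround, assuming the cached loss criterion is the culprit.
import torch
from ultralytics import YOLO

yolo = YOLO("yolov8n-pose.pt")   # placeholder weights; use the pruned checkpoint here

# ... Torch-Pruning steps happen here, possibly with the model on the CPU ...

device = torch.device("cuda:0")
yolo.model.to(device)            # move the (pruned) task model to the GPU first

# Drop any criterion that was built while the model was on the CPU so that the
# next loss call rebuilds it, together with its `proj` tensor, on cuda:0.
if hasattr(yolo.model, "criterion"):
    del yolo.model.criterion

# then continue with the usual fine-tuning call, e.g. model.train_v2(pruning=True, **pruning_cfg)
```

Equivalently, making sure the model is already on cuda:0 before the first validation or loss call should avoid a stale CPU `proj` in the first place.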

MrJoratos · Dec 13 '23 15:12

Also, the YOLOv8 I am using is an old version (ultralytics 8.0.132).

MrJoratos · Dec 13 '23 15:12

Validation runs normally for the first two steps (although no GPU memory is occupied by the Python process), but as soon as training starts, this error appears abruptly.

MrJoratos · Dec 13 '23 16:12

@MrJoratos Hi, have you solved the problem?

J0eky · Dec 18 '23 09:12

I also encountered this problem. My task is detection. When I use the official yolov8n model and dataset, I can prune and post-train normally, but when I use my own trained model and dataset, the pruning itself works fine, yet post-training fails with the error: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! I don't understand the logic behind this.

Reaidu · Dec 30 '23 02:12
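For anyone hitting the same error, a quick check like the one below may confirm whether the cached criterion's `proj` tensor is the one left behind on the CPU. This is a hypothetical snippet: `yolo` is again the ultralytics YOLO wrapper, and `yolo.model.criterion` only exists once the loss has been computed at least once.

```python
# Debugging sketch: compare the model's device with the cached criterion's proj device.
import torch

model_device = next(yolo.model.parameters()).device
print("model device:", model_device)

if hasattr(yolo.model, "criterion"):
    proj = yolo.model.criterion.proj   # the tensor used in bbox_decode (see the traceback above)
    print("criterion.proj device:", proj.device)
    if proj.device != model_device:
        # force the criterion to be rebuilt on the model's current device
        del yolo.model.criterion
else:
    print("criterion not built yet; it will be created on", model_device)
```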