
yolov8-pose cuda error

Open • MrJoratos opened this issue 7 months ago • 4 comments

The error is as follows:

albumentations: Blur(p=0.01, blur_limit=(3, 7)), MedianBlur(p=0.01, blur_limit=(3, 7)), ToGray(p=0.01), CLAHE(p=0.01, clip_limit=(1, 4.0), tile_grid_size=(8, 8))
val: Scanning /media/hitcrt/6a071232-a52f-4f53-89ca-fdde738abfd8/assignment10_19/data_original/4kp_data/labeled/rgb/南航.cache... 704 images, 1
Plotting labels to runs/pose/step_0_finetune11/labels.jpg...
optimizer: AdamW(lr=0.000476, momentum=0.9) with parameter groups 63 weight(decay=0.0), 83 weight(decay=0.0005), 82 bias(decay=0.0)
Image sizes 928 train, 928 val
Using 8 dataloader workers
Logging results to runs/pose/step_0_finetune11
Starting training for 10 epochs...
Closing dataloader mosaic
albumentations: Blur(p=0.01, blur_limit=(3, 7)), MedianBlur(p=0.01, blur_limit=(3, 7)), ToGray(p=0.01), CLAHE(p=0.01, clip_limit=(1, 4.0), tile_grid_size=(8, 8))

  Epoch    GPU_mem   box_loss  pose_loss  kobj_loss   cls_loss   dfl_loss  Instances       Size

0%| | 0/1329 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "torch-Pruning.py", line 403, in <module>
    prune(args)
  File "torch-Pruning.py", line 359, in prune
    model.train_v2(pruning=True, **pruning_cfg)
  File "torch-Pruning.py", line 267, in train_v2
    self.trainer.train()
  File "/media/hitcrt/6a071232-a52f-4f53-89ca-fdde738abfd8/224d2d601bc345007a991aa1b40b8bde.jpeg{824E0F58-7501-AA9A-975F-E71FEA341EF3}.pngultralytics-8.0.132/ultralytics-8.0.132/ultralytics/yolo/engine/trainer.py", line 192, in train
    self._do_train(world_size)
  File "/media/hitcrt/6a071232-a52f-4f53-89ca-fdde738abfd8/224d2d601bc345007a991aa1b40b8bde.jpeg{824E0F58-7501-AA9A-975F-E71FEA341EF3}.pngultralytics-8.0.132/ultralytics-8.0.132/ultralytics/yolo/engine/trainer.py", line 332, in _do_train
    self.loss, self.loss_items = self.model(batch)
  File "/home/hitcrt/anaconda3/envs/py381/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/media/hitcrt/6a071232-a52f-4f53-89ca-fdde738abfd8/224d2d601bc345007a991aa1b40b8bde.jpeg{824E0F58-7501-AA9A-975F-E71FEA341EF3}.pngultralytics-8.0.132/ultralytics-8.0.132/ultralytics/nn/tasks.py", line 44, in forward
    return self.loss(x, *args, **kwargs)
  File "/media/hitcrt/6a071232-a52f-4f53-89ca-fdde738abfd8/224d2d601bc345007a991aa1b40b8bde.jpeg{824E0F58-7501-AA9A-975F-E71FEA341EF3}.pngultralytics-8.0.132/ultralytics-8.0.132/ultralytics/nn/tasks.py", line 215, in loss
    return self.criterion(preds, batch)
  File "/media/hitcrt/6a071232-a52f-4f53-89ca-fdde738abfd8/224d2d601bc345007a991aa1b40b8bde.jpeg{824E0F58-7501-AA9A-975F-E71FEA341EF3}.pngultralytics-8.0.132/ultralytics-8.0.132/ultralytics/utils/loss.py", line 335, in __call__
    pred_bboxes = self.bbox_decode(anchor_points, pred_distri)  # xyxy, (b, h*w, 4)
  File "/media/hitcrt/6a071232-a52f-4f53-89ca-fdde738abfd8/224d2d601bc345007a991aa1b40b8bde.jpeg{824E0F58-7501-AA9A-975F-E71FEA341EF3}.pngultralytics-8.0.132/ultralytics-8.0.132/ultralytics/utils/loss.py", line 150, in bbox_decode
    pred_dist = pred_dist.view(b, a, 4, c // 4).softmax(3).matmul(self.proj.type(pred_dist.dtype))
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument mat2 in method wrapper_mm)
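A possible explanation, judging only from the traceback and not confirmed anywhere in this thread: ultralytics caches the loss criterion on the task model the first time the loss is computed (the `self.criterion(preds, batch)` call in tasks.py above), and the criterion's `proj` tensor is created on whatever device the model happens to be on at that moment. If the criterion is built while the pruned model is still on the CPU, `proj` stays on cpu after the model itself moves to cuda:0, which matches the mat2 device mismatch in `bbox_decode`. A minimal sketch of a workaround under that assumption (the variable names and weight path are placeholders, not taken from the script above):

```python
# Sketch of a possible workaround, assuming the cached loss criterion is the culprit.
import torch
from ultralytics import YOLO

yolo = YOLO("yolov8n-pose.pt")   # placeholder weights; use the pruned checkpoint here

# ... Torch-Pruning steps happen here, possibly with the model on the CPU ...

device = torch.device("cuda:0")
yolo.model.to(device)            # move the (pruned) task model to the GPU first

# Drop any criterion that was built while the model was on the CPU so that the
# next loss call rebuilds it, together with its `proj` tensor, on cuda:0.
if hasattr(yolo.model, "criterion"):
    del yolo.model.criterion

# then continue with the usual fine-tuning call, e.g. model.train_v2(pruning=True, **pruning_cfg)
```

Equivalently, making sure the model is already on cuda:0 before the first validation or loss call should avoid a stale CPU `proj` in the first place.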

MrJoratos · Dec 13 '23 15:12

Also, the YOLOv8 I am using is an old version (ultralytics 8.0.132).

MrJoratos · Dec 13 '23 15:12

Validation runs normally for the first two steps (although no GPU memory is occupied by the Python process), but as soon as training starts, this error appears abruptly.

MrJoratos · Dec 13 '23 16:12

@MrJoratos Hi, have you solved the problem?

J0eky · Dec 18 '23 09:12

I also encountered this problem. My task is detection. When I use the official yolov8n model and dataset, I can prune and post-train normally, but when I use my own trained model and dataset, the pruning itself works fine, yet post-training fails with the error: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! I don't understand the logic behind this.

Reaidu · Dec 30 '23 02:12
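For anyone hitting the same error, a quick check like the one below may confirm whether the cached criterion's `proj` tensor is the one left behind on the CPU. This is a hypothetical snippet: `yolo` is again the ultralytics YOLO wrapper, and `yolo.model.criterion` only exists once the loss has been computed at least once.

```python
# Debugging sketch: compare the model's device with the cached criterion's proj device.
import torch

model_device = next(yolo.model.parameters()).device
print("model device:", model_device)

if hasattr(yolo.model, "criterion"):
    proj = yolo.model.criterion.proj   # the tensor used in bbox_decode (see the traceback above)
    print("criterion.proj device:", proj.device)
    if proj.device != model_device:
        # force the criterion to be rebuilt on the model's current device
        del yolo.model.criterion
else:
    print("criterion not built yet; it will be created on", model_device)
```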