MeanAveragePrecision (segm): 'tuple' object has no attribute 'cpu'
The problem arises when we instantiate the MeanAveragePrecision class with iou_type="segm", call the update(...) method, and finally the cpu() method. I personally have no reason to call the cpu() method myself, but PyTorch Lightning does: at the end of training it tries to place every inner module on the CPU (the full traceback at the bottom of this issue shows this). This triggers the self._apply(...) method in this package's metric.py, which raises the following error: AttributeError: 'tuple' object has no attribute 'cpu'.
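For reference, the failure can be reproduced with a few lines (a minimal sketch; the import path and tensor shapes are my assumptions, not taken from the original report):

```python
import torch
from torchmetrics.detection.mean_ap import MeanAveragePrecision

metric = MeanAveragePrecision(iou_type="segm")
preds = [{
    "masks": torch.randint(0, 2, (1, 10, 10), dtype=torch.bool),
    "scores": torch.tensor([0.9]),
    "labels": torch.tensor([0]),
}]
target = [{
    "masks": torch.randint(0, 2, (1, 10, 10), dtype=torch.bool),
    "labels": torch.tensor([0]),
}]
metric.update(preds, target)
metric.cpu()  # AttributeError: 'tuple' object has no attribute 'cpu'
```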
The reason this happens is that every time we update the metric, the following method is called:
def _get_safe_item_values(self, item: Dict[str, Any]) -> Union[Tensor, Tuple]:
    if self.iou_type == "bbox":
        boxes = _fix_empty_tensors(item["boxes"])
        if boxes.numel() > 0:
            boxes = box_convert(boxes, in_fmt=self.box_format, out_fmt="xyxy")
        return boxes
    elif self.iou_type == "segm":
        masks = []
        for i in item["masks"].cpu().numpy():
            rle = mask_utils.encode(np.asfortranarray(i))
            masks.append((tuple(rle["size"]), rle["counts"]))
        return tuple(masks)
    else:
        raise Exception(f"IOU type {self.iou_type} is not supported")
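To see what the segm branch actually stores, here is a standalone sketch of the same encoding step (mask shape and values are illustrative):

```python
import numpy as np
from pycocotools import mask as mask_utils

mask = np.zeros((4, 4), dtype=np.uint8)
mask[1:3, 1:3] = 1  # a small square mask

rle = mask_utils.encode(np.asfortranarray(mask))
entry = (tuple(rle["size"]), rle["counts"])
print(entry)  # ((4, 4), b'...') -- a plain Python tuple, no Tensor left
```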
This changes the mask type from Tensor to Tuple; the result is then appended to the self.detections list in the code below:
for item in preds:
    detections = self._get_safe_item_values(item)
    self.detections.append(detections)
    self.detection_labels.append(item["labels"])
    self.detection_scores.append(item["scores"])
Finally, when the _apply(...) method is called, it tries to move every element of the self.detections list to the CPU. Because every element is a tuple, and a tuple does not implement the cpu() method, this raises the error mentioned above.
current_val = getattr(this, key)
if isinstance(current_val, Tensor):
    setattr(this, key, fn(current_val))
elif isinstance(current_val, Sequence):
    setattr(this, key, [fn(cur_v) for cur_v in current_val])
The fn(...) function is defined here:
return self._apply(lambda t: t.cpu())
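The failure mode can then be reproduced in isolation (a sketch with made-up values standing in for the RLE-encoded masks):

```python
fn = lambda t: t.cpu()

# for iou_type="segm", self.detections holds tuples, not Tensors:
current_val = [(((4, 4), b"counts"),)]
[fn(cur_v) for cur_v in current_val]  # AttributeError: 'tuple' object has no attribute 'cpu'
```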
Full error traceback:
File "/home/alessandror/Projects/ml-package/examples/train.py", line 16, in main
trainer.fit(model, data, ckpt_path=ckpt_path)
File "/home/alessandror/.miniconda3/envs/ml-package/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 740, in fit
self._call_and_handle_interrupt(
File "/home/alessandror/.miniconda3/envs/ml-package/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 685, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
File "/home/alessandror/.miniconda3/envs/ml-package/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 777, in _fit_impl
self._run(model, ckpt_path=ckpt_path)
File "/home/alessandror/.miniconda3/envs/ml-package/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1202, in _run
self._post_dispatch()
File "/home/alessandror/.miniconda3/envs/ml-package/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1267, in _post_dispatch
self.accelerator.teardown()
File "/home/alessandror/.miniconda3/envs/ml-package/lib/python3.9/site-packages/pytorch_lightning/accelerators/gpu.py", line 79, in teardown
super().teardown()
File "/home/alessandror/.miniconda3/envs/ml-package/lib/python3.9/site-packages/pytorch_lightning/accelerators/accelerator.py", line 190, in teardown
self.training_type_plugin.teardown()
File "/home/alessandror/.miniconda3/envs/ml-package/lib/python3.9/site-packages/pytorch_lightning/plugins/training_type/single_device.py", line 86, in teardown
self.lightning_module.cpu()
File "/home/alessandror/.miniconda3/envs/ml-package/lib/python3.9/site-packages/pytorch_lightning/core/mixins/device_dtype_mixin.py", line 137, in cpu
return super().cpu()
File "/home/alessandror/.miniconda3/envs/ml-package/lib/python3.9/site-packages/torch/nn/modules/module.py", line 711, in cpu
return self._apply(lambda t: t.cpu())
File "/home/alessandror/.miniconda3/envs/ml-package/lib/python3.9/site-packages/torch/nn/modules/module.py", line 570, in _apply
module._apply(fn)
File "/home/alessandror/.miniconda3/envs/ml-package/lib/python3.9/site-packages/torchmetrics/metric.py", line 673, in _apply
setattr(this, key, [fn(cur_v) for cur_v in current_val])
File "/home/alessandror/.miniconda3/envs/ml-package/lib/python3.9/site-packages/torchmetrics/metric.py", line 673, in <listcomp>
setattr(this, key, [fn(cur_v) for cur_v in current_val])
File "/home/alessandror/.miniconda3/envs/ml-package/lib/python3.9/site-packages/torch/nn/modules/module.py", line 711, in <lambda>
return self._apply(lambda t: t.cpu())
AttributeError: 'tuple' object has no attribute 'cpu'
Hi! Thanks for your contribution, great first issue!
Hi @AleRiccardi, thanks for raising this issue. After taking a look at it (thanks for all the info you have provided), I can say it is a tricky one. It stems from the limitation that torchmetrics states by default can only be tensors or lists of tensors. However, for this metric with `iou_type="segm"` we actually need lists of tuples of tensors (so one extra layer of nesting).
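For context, metric states are registered through `add_state`, which is built around exactly those two cases (a minimal sketch; the metric name is made up):

```python
import torch
from torchmetrics import Metric

class MyMetric(Metric):
    def __init__(self):
        super().__init__()
        # the two supported state types: a Tensor, or a list of Tensors
        self.add_state("total", default=torch.tensor(0), dist_reduce_fx="sum")
        self.add_state("detections", default=[], dist_reduce_fx=None)
```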
I can think of two solutions:
- Either we refactor metric states to allow more types and more deeply nested structures, which in principle is not that hard, but it would still mean substantial changes to the codebase.
- We can solve this issue by implementing a `TensorTuple` that implements all the common methods like `.cpu`, `.cuda` etc. I added an example of what that could look like below. Then we replace the call to `tuple(masks)` with `TensorTuple(masks)` and everything should somewhat work (I think something still needs to be changed for DDP to work).
TensorTuple.py
from typing import Callable, Optional, TypeVar, Union

import torch
from torch import device, dtype

T = TypeVar("T", bound="TensorTuple")


class TensorTuple(tuple):
    """A tuple of tensors that mimics the device/dtype-moving API of nn.Module."""

    def _apply(self: T, fn: Callable) -> T:
        # apply fn to every tensor and return a new TensorTuple (tuples are immutable)
        vals = []
        for val in self:
            vals.append(fn(val))
        return TensorTuple(vals)

    def cuda(self: T, device: Optional[Union[int, device]] = None) -> T:
        return self._apply(lambda t: t.cuda(device))

    def ipu(self: T, device: Optional[Union[int, device]] = None) -> T:
        return self._apply(lambda t: t.ipu(device))

    def xpu(self: T, device: Optional[Union[int, device]] = None) -> T:
        return self._apply(lambda t: t.xpu(device))

    def cpu(self: T) -> T:
        return self._apply(lambda t: t.cpu())

    def type(self: T, dst_type: Union[dtype, str]) -> T:
        return self._apply(lambda t: t.type(dst_type))

    def float(self: T) -> T:
        return self._apply(lambda t: t.float() if t.is_floating_point() else t)

    def double(self: T) -> T:
        return self._apply(lambda t: t.double() if t.is_floating_point() else t)

    def half(self: T) -> T:
        return self._apply(lambda t: t.half() if t.is_floating_point() else t)

    def bfloat16(self: T) -> T:
        return self._apply(lambda t: t.bfloat16() if t.is_floating_point() else t)

    def to_empty(self: T, *, device: Union[str, device]) -> T:
        return self._apply(lambda t: torch.empty_like(t, device=device))
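A quick usage sketch of the class above:

```python
import torch

tt = TensorTuple((torch.ones(2), torch.zeros(3)))
tt = tt.cpu()   # returns a new TensorTuple instead of raising
tt = tt.half()  # floating-point tensors are cast, others pass through
print(type(tt), [t.dtype for t in tt])
```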
@justusschock what do you think we should do?
@SkafteNicki I would not use option 2, as it is easy to break something we are not aware of that way.
I suggest introducing https://github.com/Lightning-AI/utilities as a dependency and relying on https://github.com/Lightning-AI/utilities/blob/main/src/lightning_utilities/core/apply_func.py for this case (similar to what PL does in several places). This way you can nest as deeply as you wish and then use apply_to_collection with dtype=torch.Tensor to map the function over all levels of the nested collection.
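For illustration, a sketch of what that could look like inside `Metric._apply` (assuming `lightning_utilities` is installed; this is not the actual implementation):

```python
import torch
from lightning_utilities.core.apply_func import apply_to_collection

fn = lambda t: t.cpu()

# arbitrarily nested state mixing Tensors and RLE byte strings
current_val = [((torch.ones(2), b"counts"),)]
new_val = apply_to_collection(current_val, dtype=torch.Tensor, function=fn)
# only Tensor leaves are mapped; the bytes entries pass through untouched
```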
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
It would be great to have a solution here!