Unify metrics output type
🚀 Feature
The idea is to verify the output type for all metrics (output of compute function) and update the docs accordingly.
In general, a metric's output should be a float. In some particular cases, like Recall/Precision with `average=False`, the output is a torch tensor. So, let's look at the options and decide what the signature of the `compute()` method should be:
- `def compute() -> float`
- `def compute() -> Union[float, torch.Tensor]`, with the tensor on CPU
- `def compute() -> torch.Tensor`, with the tensor on CPU
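A minimal sketch of the three options (function names, bodies, and values are purely illustrative, not the actual ignite implementation):

```python
from typing import Union

import torch

# Option 1: always a Python float (fine for scalar metrics like accuracy)
def compute_as_float() -> float:
    return 0.78

# Option 2: float for averaged metrics, CPU tensor for per-class metrics
def compute_as_union(average: bool) -> Union[float, torch.Tensor]:
    if average:
        return 0.78
    return torch.tensor([0.88, 0.99, 0.77])  # per-class values, on CPU

# Option 3: always a tensor on CPU (a 0-dim tensor for scalar metrics)
def compute_as_tensor() -> torch.Tensor:
    return torch.tensor(0.78)
```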
To address this FR, we have to determine, for each metric, what type it is supposed to return, and update the docs accordingly.
I think `def compute() -> Union[float, torch.Tensor]` makes more sense, but should this return type be universal among all metrics, or apply only to those like Recall/Precision?
I also think that `tests.ignite.metrics` should be updated too, then?
@ahmedo42 well, the point here is to make it clear what is returned, a float or a tensor, and if a tensor, on which device, etc.
For example, if we compute a basic accuracy, it should be a float; there is no point in returning it as something like torch.tensor(0.78).
Recall/Precision have an option to return precision and recall per class, so the output is torch.tensor([0.88, 0.99, 0.77, ...]), which is OK.
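To illustrate the distinction with plain torch code (toy values, not ignite's actual implementation): overall accuracy reduces naturally to a single Python float, while per-class recall is naturally a tensor with one entry per class.

```python
import torch

# Toy predictions and targets for a 3-class problem
y_pred = torch.tensor([0, 1, 2, 2, 1])
y_true = torch.tensor([0, 1, 1, 2, 1])

# Scalar metric: overall accuracy collapses to a single Python float
accuracy = (y_pred == y_true).float().mean().item()  # 4 of 5 correct

# Per-class metric: recall is one value per class, hence a tensor
num_classes = 3
recall = torch.zeros(num_classes)
for c in range(num_classes):
    mask = y_true == c
    recall[c] = (y_pred[mask] == c).float().mean()
```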
> I also think that `tests.ignite.metrics` should be updated too, then?
Yes, definitely, we can add tests for the returned type!
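Such a test could be a small helper asserting the contract discussed above (the helper name is hypothetical; real tests would live under `tests.ignite.metrics`):

```python
import torch

def check_compute_output(value):
    """Assert the proposed contract: a Python float, or a
    torch.Tensor that lives on the CPU."""
    if isinstance(value, torch.Tensor):
        assert value.device.type == "cpu", "tensor results should be on CPU"
    else:
        assert isinstance(value, float), f"unexpected result type: {type(value)}"

# Scalar and per-class results both pass:
check_compute_output(0.78)
check_compute_output(torch.tensor([0.88, 0.99, 0.77]))
```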
More thoughts about the output type, which could overlap with https://github.com/pytorch/ignite/issues/1574. Probably, there will be an option for sample-wise metric usage, which practically returns a metric value per sample in the batch. Thus, the output type will be a tensor. Anyway, this point can be addressed in that PR. cc @sdesrozis
@vfdev-5
> Probably, there will be an option for sample-wise metric usage, which practically returns a metric value per sample in the batch. Thus, the output type will be a tensor.
so shouldn't all metrics support both torch.Tensor and float, and then we cast to the appropriate type according to some property like self.samplewise? Just like what happens in Precision and Recall?
Yes, it seems so, if we support various MetricUsages.
IMHO, handling a single type could simplify the underlying code. If no performance issue occurs, I would prefer to have torch.Tensor (which is compatible with float) rather than a union of types.
But maybe it's because I'm too lazy 😊
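For what it's worth, a 0-dim tensor does interoperate with floats in most contexts, which is what makes the single-type option workable (a quick check, not an endorsement of either option):

```python
import torch

scalar = torch.tensor(0.78)   # 0-dim CPU tensor
as_float = float(scalar)      # explicit conversion to a Python float works
doubled = scalar * 2          # arithmetic behaves like a float, result stays a tensor
```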
@sdesrozis logging scalars as tensors can be too much :) We have to have floats, which is the right type in most cases.
Yep, I understand your point 😊 I'm talking about having a single internal type, not only the typing of the output. But that is a matter of taste 😉
Any update on this?
@vfdev-5 May I ask what's the best way to move the metric tensor back to CPU before logging? I thought the logger would do it automatically, until I got `TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.` when using Precision with the wandb logger.
@liebkne thanks for asking and reporting the issue. Currently no action has been decided on this issue, but I agree that we could make sure that output tensors are on the CPU device.
Concerning the issue you are facing, are you explicitly using cuda as the device?
`precision = Precision(average=False, device="cuda")`
@vfdev-5 Hi, thanks for the quick reply.
I'm using `Precision(average=False, device=idist.device())` in my code, and in that specific run I launched it with `CUDA_VISIBLE_DEVICES=2 python main.py`, with `None` for both `nproc_per_node` and `backend`. So I guess it's indeed cuda.
Yes, it is CUDA if cuda is available. I think as a workaround you can omit this argument, and by default it will be cpu.
I agree that we could add logic to send CUDA tensors to CPU before writing to `engine.state.metrics`:
https://github.com/pytorch/ignite/blob/038d4787e26d6236c0b2076f9cc1153a146accbc/ignite/metrics/metric.py#L335-L347
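One possible shape for that logic, sketched as a standalone helper rather than the actual ignite code linked above (the function name is hypothetical):

```python
import torch

def to_loggable(result):
    """Move tensor results to CPU, and unwrap 0-dim tensors into floats,
    so downstream loggers (wandb, tensorboard, ...) never see CUDA tensors."""
    if isinstance(result, torch.Tensor):
        result = result.detach().cpu()
        if result.ndim == 0:
            return result.item()
    return result
```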
> I think as a workaround you can omit this argument, and by default it will be cpu.
Didn't know this, thanks. Since the day I switched to idist, I've been using idist.xxx for everything. 😅