
Unify metrics output type

Open vfdev-5 opened this issue 4 years ago • 12 comments

🚀 Feature

The idea is to verify the output type for all metrics (the output of the compute() method) and update the docs accordingly.

In general, a metric's output should be a float. In some particular cases, like Recall/Precision with average=False, the output is a torch tensor. So let's review each metric and decide whether the output of the compute() method should be:

  • def compute() -> float
  • def compute() -> Union[float, torch.Tensor] and tensor is on CPU
  • def compute() -> torch.Tensor with tensor on CPU

To address this FR, we have to determine, for each metric, what type it is supposed to return and update the docs accordingly.
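
For illustration, a minimal sketch of what the second option could look like on the base class (a hypothetical annotation only; nothing has been decided in this issue):

from typing import Union

import torch

class Metric:
    def compute(self) -> Union[float, torch.Tensor]:
        # Return a plain float for scalar metrics, or a CPU tensor for
        # per-class/per-sample results (e.g. Precision(average=False)).
        raise NotImplementedError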

vfdev-5 avatar Feb 12 '21 12:02 vfdev-5

I think def compute() -> Union[float, torch.Tensor] makes more sense, but should this return type be universal among all metrics, or just for those like Recall/Precision? I also think that tests.ignite.metrics should be updated too then?

ahmedo42 avatar Feb 12 '21 13:02 ahmedo42

@ahmedo42 well, the point here is to make it clear what is returned (float or tensor) and, if a tensor, on which device, etc. For example, if we compute a basic accuracy, it should be a float; there is no point in returning it as torch.tensor(0.78). Recall/Precision have an option to return precision and recall per class, so the output is torch.tensor([0.88, 0.99, 0.77, ...]), which is OK.
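
A minimal sketch of that current behaviour, assuming the 0.4.x metrics API:

import torch
from ignite.metrics import Accuracy, Precision

y_pred = torch.tensor([[0.2, 0.8], [0.9, 0.1], [0.4, 0.6]])  # scores for 2 classes
y = torch.tensor([1, 0, 1])

acc = Accuracy()
acc.update((y_pred, y))
print(type(acc.compute()))  # <class 'float'> -- a scalar metric

prec = Precision(average=False)
prec.update((y_pred, y))
print(prec.compute())  # tensor([...]) -- one value per class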

I also think that tests.ignite.metrics should be updated too then?

Yes, definitely, we can add tests of the returned type!
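
For example, something along these lines (a sketch only; this is not an existing test in tests.ignite.metrics):

import torch
from ignite.metrics import Accuracy, Precision

def test_compute_return_types():
    acc = Accuracy()
    acc.update((torch.eye(4), torch.arange(4)))
    assert isinstance(acc.compute(), float)

    prec = Precision(average=False)
    prec.update((torch.eye(4), torch.arange(4)))
    result = prec.compute()
    assert isinstance(result, torch.Tensor)
    assert result.device.type == "cpu"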

More thoughts about the output type, which could overlap with https://github.com/pytorch/ignite/issues/1574: probably there will be an option for a SampleWise metric usage, which practically returns the metric value per sample in the batch. Thus, the output type will be a tensor. Anyway, this point can be addressed with that PR. cc @sdesrozis

vfdev-5 avatar Feb 12 '21 14:02 vfdev-5

@vfdev-5

Probably, there will be an option for SampleWise metric usage which practically returns metric value per sample in the batch. Thus, output type will be a tensor

So shouldn't all metrics support both torch.Tensor and float, and then cast to the appropriate type according to some property like self.samplewise, just like what happens in Precision and Recall?
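
A hypothetical sketch of that casting idea (self._samplewise and the internal state names here are assumptions, not existing ignite attributes):

from typing import Union

import torch

def compute(self) -> Union[float, torch.Tensor]:
    result = self._sum / self._num_examples  # assumed accumulated state
    if self._samplewise or result.numel() > 1:
        return result.cpu()  # per-sample / per-class results stay tensors
    return result.item()  # scalar results are cast to plain floats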

ahmedo42 avatar Feb 12 '21 16:02 ahmedo42

Yes, it seems so if we support various MetricUsages.
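
For context, a minimal sketch of attaching a metric with a non-default MetricUsage in the current API (BatchWise recomputes and publishes the metric after every iteration):

from ignite.engine import Engine
from ignite.metrics import Accuracy
from ignite.metrics.metric import BatchWise

def eval_step(engine, batch):
    return batch  # assumption: each batch is already a (y_pred, y) pair

evaluator = Engine(eval_step)
Accuracy().attach(evaluator, "acc_batchwise", usage=BatchWise())
# engine.state.metrics["acc_batchwise"] is then refreshed after every batch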

vfdev-5 avatar Feb 12 '21 16:02 vfdev-5

IMHO, handling a single type could simplify the underlying code. If no performance issue arises, I would prefer to have torch.Tensor (which is compatible with float) rather than a union of types.

But maybe it's because I'm too lazy 😊

sdesrozis avatar Feb 12 '21 18:02 sdesrozis

@sdesrozis logging scalars as tensors can be too much :) We have to have floats, which is the appropriate type in most cases.

vfdev-5 avatar Feb 12 '21 19:02 vfdev-5

Yep, I understand your point 😊 I was talking about having a single internal type, not only the typing of the output. But that is a matter of taste 😉

sdesrozis avatar Feb 12 '21 20:02 sdesrozis

Any update on this?

@vfdev-5 May I ask what's the best way to move the metric tensor back to CPU before logging? I thought the logger would do it automatically, until I got TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first. when using Precision with the wandb logger.

failable avatar Jun 17 '21 00:06 failable

@liebkne thanks for asking and reporting the issue. Currently no action has been decided on this issue, but I agree that we could make sure that output tensors are on the CPU device.

Concerning the issue you are facing, are you explicitly setting the device to cuda?

precision = Precision(average=False, device="cuda")

vfdev-5 avatar Jun 17 '21 08:06 vfdev-5

@vfdev-5 Hi, thanks for the quick reply.

I'm using Precision(average=False, device=idist.device()) in my code, and in that specific run I launched it with CUDA_VISIBLE_DEVICES=2 python main.py, with None for both nproc_per_node and backend. So I guess it's indeed cuda.

failable avatar Jun 17 '21 08:06 failable

Yes, it is CUDA if cuda is available. I think as a workaround you can omit this argument, and by default it will be cpu. I agree that we can add logic to send CUDA tensors to CPU before writing to engine.state.metrics: https://github.com/pytorch/ignite/blob/038d4787e26d6236c0b2076f9cc1153a146accbc/ignite/metrics/metric.py#L335-L347
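
A rough sketch of that logic (a hypothetical patch to Metric.completed, not an actual fix):

import torch

def completed(self, engine, name):
    result = self.compute()
    if isinstance(result, torch.Tensor) and result.device.type != "cpu":
        result = result.cpu()  # move CUDA tensors to host before publishing
    engine.state.metrics[name] = result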

vfdev-5 avatar Jun 17 '21 08:06 vfdev-5

I think as a workaround you can omit this argument and by default it will be cpu.

Didn't know this, thanks. Since the day I switched to idist, I've been turning everything into idist.xxx. 😅

failable avatar Jun 17 '21 10:06 failable