
Allow FID to Use Float Images

Open Queuecumber opened this issue 2 years ago • 7 comments

🚀 Feature / Motivation

The underlying FID library requires uint8 images in [0, 255] as input. I think most people don't work with that format, so the code gets kind of ugly if you want to use the torchmetrics FID. I think there should be a parameter for converting the inputs to uint8 automatically.

Pitch

  1. Add a `normalize` parameter similar to LPIPS (fid.py ln 215)
  2. If `normalize` is true, the `update` function (ln 262) converts to uint8: `imgs = (imgs * 255).byte()`
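The two steps above might look something like this (a minimal sketch with a hypothetical standalone `update` function; the real `fid.py` method has a different signature):

```python
import torch

def update(imgs: torch.Tensor, normalize: bool = False) -> torch.Tensor:
    """Hypothetical sketch of the proposed conversion, not the actual
    torchmetrics code: with normalize=True, float images in [0, 1] are
    rescaled to uint8 in [0, 255] before feature extraction."""
    if normalize:
        imgs = (imgs * 255).byte()
    return imgs

# Float input in [0, 1] comes out as uint8 in [0, 255]:
out = update(torch.rand(4, 3, 64, 64), normalize=True)
```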

Queuecumber avatar Sep 28 '22 13:09 Queuecumber

Hi @Queuecumber, what range of values would you consider more normal for images to take? [0, 1] or [-1, 1]?

SkafteNicki avatar Sep 29 '22 08:09 SkafteNicki

I would say [0, 1] is the most common thing although I have seen [-1, 1] in the context of GANs

Queuecumber avatar Sep 29 '22 12:09 Queuecumber

Hi, I came across this issue while chasing FID-related issues. FID is a metric used only for image generation, and it is often used together with LPIPS. So I think there should be code consistency in using the `normalize` argument, and [-1, 1] is the more common range.

The `update` method should accept images in [0, 1] if `normalize` else [-1, 1].

kynk94 avatar Oct 11 '22 03:10 kynk94

I buy that argument

The only issue I can think of is that it would break backwards compatibility if it stopped accepting images in [0, 255]

There's nothing about generation which prescribes [-1, 1], by the way. Replacing your tanh with a sigmoid would get you [0, 1] and should work fine.

Queuecumber avatar Oct 11 '22 03:10 Queuecumber

Like @Queuecumber, I would be against breaking backwards compatibility. The question then becomes: if both [-1, 1] and [0, 1] are valid options, should we still leave the conversion to the [0, 255] domain up to the user?

SkafteNicki avatar Oct 11 '22 08:10 SkafteNicki

I agree that this change should support BC. As above, [0, 1] could be the input range if `normalize` is True, like LPIPS.

kynk94 avatar Oct 11 '22 08:10 kynk94

Is this procedure too fancy:

  • If the image has dtype `torch.uint8`, assume [0, 255]
  • If the image has any float dtype and `normalize` is false, assume [-1, 1]
  • If the image has any float dtype and `normalize` is true, assume [0, 1]

where any float dtype is {`bfloat16`, `float16`, `float32`, `float64`, anything else I forgot}
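That dispatch could be sketched as follows (hypothetical helper, not the torchmetrics implementation; `to_fid_input` is an invented name):

```python
import torch

def to_fid_input(imgs: torch.Tensor, normalize: bool = False) -> torch.Tensor:
    """Sketch of the dtype-based dispatch proposed above."""
    if imgs.dtype == torch.uint8:
        return imgs  # already in [0, 255]
    if imgs.is_floating_point():  # covers bfloat16/float16/float32/float64
        if normalize:
            # assume [0, 1]
            return (imgs.clamp(0, 1) * 255).byte()
        # assume [-1, 1]
        return ((imgs.clamp(-1, 1) + 1) * 127.5).byte()
    raise TypeError(f"Unsupported image dtype: {imgs.dtype}")
```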

Queuecumber avatar Oct 11 '22 16:10 Queuecumber

> I agree that this change should support BC.

yes, otherwise we would just create much more confusion than benefit...

Borda avatar Oct 19 '22 12:10 Borda

The more I think about it the more I think that no one in their right mind is trying to evaluate their GANs using byte images.

Also you're allowed to make breaking changes on minor version numbers if your major version number is 0

so my opinion now is that you should go for it

Queuecumber avatar Oct 20 '22 01:10 Queuecumber

Actually, now that I think about it, why is the implementation even using something that was trained on byte inputs? Was there any justification for it by the original implementer? Bytes have less dynamic range than float32.

Queuecumber avatar Oct 20 '22 01:10 Queuecumber

Ah, the answer is that it immediately converts them to float images in [-1, 1]

https://github.com/toshas/torch-fidelity/blob/master/torch_fidelity/feature_extractor_inceptionv3.py#L96

That's not great; if it's going to do that, why bother forcing byte inputs on people in the first place?

edit: Line 99 right after it ... looks like it's trying to match the TensorFlow implementation as closely as it can
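As a small demonstration of the dynamic-range point above: forcing float images through uint8 keeps only 256 levels per channel, even though the extractor converts them back to float internally, so the round trip introduces up to ~1/255 of error per pixel:

```python
import torch

x = torch.rand(1, 3, 32, 32)   # float image in [0, 1]
q = (x * 255).byte()           # forced uint8 input
x_back = q.float() / 255       # what the extractor effectively sees
err = (x - x_back).abs().max() # quantization error from the round trip
```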

Queuecumber avatar Oct 20 '22 01:10 Queuecumber