Allow FID to Use Float Images
🚀 Feature / Motivation
The underlying FID library requires uint8 images in [0, 255] as input. Most people don't work with images in that format, so the calling code gets kind of ugly if you want to use the torchmetrics FID. I think there should be a parameter for converting the inputs to uint8 automatically.
Pitch
- Add a normalize parameter similar to lpips (fid.py ln 215)
- If normalize is true, the update function (ln 262) converts to uint8: imgs = (imgs * 255).byte()
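A minimal sketch of what such a conversion could look like, assuming a normalize flag and float inputs in [0, 1] (the helper name and flag are part of this proposal, not the existing torchmetrics API):

    import torch

    def maybe_to_uint8(imgs: torch.Tensor, normalize: bool) -> torch.Tensor:
        # Hypothetical helper for this pitch: if normalize is set, treat float
        # images as being in [0, 1] and map them to the uint8 [0, 255] range
        # that the FID feature extractor currently expects.
        if normalize and imgs.is_floating_point():
            return (imgs * 255).byte()
        return imgs

    # Usage: float images in [0, 1] become uint8 before FID.update is called.
    imgs = torch.rand(4, 3, 299, 299)
    assert maybe_to_uint8(imgs, normalize=True).dtype == torch.uint8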
Hi @Queuecumber, what range of values would you consider more normal for the images to take? [0,1] or [-1,1]?
I would say [0, 1] is the most common thing although I have seen [-1, 1] in the context of GANs
Hi, I came across this issue while chasing issues about FID. FID is a metric used only for image generation, and it is often used together with LPIPS. So I think there should be consistency in how the normalize argument is used, and [-1, 1] is the more natural default range.
The update method should accept images in range [0, 1] if normalize else [-1,1].
I buy that argument
The only issue I can think of is that it would break backwards compatibility if it stopped accepting images in [0, 255]
There's nothing about generation which prescribes [-1,1] btw. Replacing your tanh with a sigmoid would get you [0,1] and should work fine.
Like @Queuecumber, I would be against breaking backwards compatibility. The question then becomes: if both [-1,1] and [0,1] are valid options, should we still leave the conversion to the [0,255] domain up to the user?
I agree that this change should support BC. As above, [0,1] could be the input range if normalize is True, like LPIPS.
Is this procedure too fancy:
- If the image has dtype torch.byte then assume [0, 255]
- If the image has dtype any float type and normalize is false, assume [-1, 1]
- If the image has dtype any float type and normalize is true, assume [0, 1]
where any float dtype is {bfloat16, float16, float32, float64, anything else I forgot}
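A sketch of that inference logic, assuming the byte / float-plus-normalize rules above (the function name is illustrative only, not proposed API):

    import torch

    def to_uint8(imgs: torch.Tensor, normalize: bool) -> torch.Tensor:
        # Illustrative sketch of the proposed dtype-based rules; not torchmetrics code.
        if imgs.dtype == torch.uint8:
            return imgs                          # byte input: assume [0, 255]
        if imgs.is_floating_point():             # covers bfloat16/float16/float32/float64
            if normalize:
                return (imgs * 255).byte()       # float + normalize=True: assume [0, 1]
            return ((imgs + 1) * 127.5).byte()   # float + normalize=False: assume [-1, 1]
        raise ValueError(f"Unsupported image dtype: {imgs.dtype}")

    # Example: float16 images in [-1, 1], e.g. straight out of a tanh generator.
    fake = torch.tanh(torch.randn(2, 3, 299, 299)).half()
    print(to_uint8(fake, normalize=False).dtype)  # torch.uint8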
I agree that this change should support BC.
yes, otherwise we would just create more confusion than benefit...
The more I think about it the more I think that no one in their right mind is trying to evaluate their GANs using byte images.
Also you're allowed to make breaking changes on minor version numbers if your major version number is 0
so my opinion now is that you should go for it
Actually, now that I think about it, why is the implementation even using something that was trained on byte inputs? Was there any justification for it by the original implementer? Bytes have less dynamic range than float32.
Ah, the answer is that it immediately converts them to float images in [-1, 1]
https://github.com/toshas/torch-fidelity/blob/master/torch_fidelity/feature_extractor_inceptionv3.py#L96
That's not great; if it's going to do that, why bother forcing byte inputs on people in the first place?
edit: Line 99, right after it ... looks like it's trying to match the TensorFlow implementation as closely as it can
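For reference, the mapping being described is roughly the following (an illustrative paraphrase of the byte-to-[-1, 1] conversion, not the exact torch-fidelity code):

    import torch

    # Roughly what "convert byte images to floats in [-1, 1]" looks like;
    # an illustration of the behaviour discussed above, not the library's exact code.
    x = torch.randint(0, 256, (1, 3, 299, 299), dtype=torch.uint8)
    x = x.float() / 255.0   # [0, 255] -> [0, 1]
    x = x * 2.0 - 1.0       # [0, 1]   -> [-1, 1]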