Allow FID to Use Float Images
🚀 Feature / Motivation
The underlying FID library requires uint8 images in [0, 255] as input. Most people don't work with images in that format, so the calling code gets kind of ugly if you want to use the torchmetrics FID. I think there should be a parameter for converting the inputs to uint8 automatically.
Pitch
- Add a normalize parameter similar to lpips (fid.py ln 215)
- If normalize is true, the update function (ln 262) converts to uint8: imgs = (imgs * 255).byte()
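A minimal sketch of what such a conversion could look like, assuming a normalize flag and float inputs in [0, 1] (the helper name and flag are part of this proposal, not the existing torchmetrics API):

    import torch

    def maybe_to_uint8(imgs: torch.Tensor, normalize: bool) -> torch.Tensor:
        # Hypothetical helper for this pitch: if normalize is set, treat float
        # images as being in [0, 1] and map them to the uint8 [0, 255] range
        # that the FID feature extractor currently expects.
        if normalize and imgs.is_floating_point():
            return (imgs * 255).byte()
        return imgs

    # Usage: float images in [0, 1] become uint8 before FID.update is called.
    imgs = torch.rand(4, 3, 299, 299)
    assert maybe_to_uint8(imgs, normalize=True).dtype == torch.uint8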
Hi @Queuecumber, what range of values would you consider more normal for the images to take? [0,1] or [-1,1]?
I would say [0, 1] is the most common thing although I have seen [-1, 1] in the context of GANs
Hi, I came across this issue while chasing issues about FID. FID is a metric used only for image generation, and it is often used together with LPIPS. So I think there should be consistency in how the normalize argument is used, and [-1, 1] is the more natural default range.
The update method should accept images in range [0, 1] if normalize else [-1,1].
I buy that argument
The only issue I can think of is that it would break backwards compatibility if it stopped accepting images in [0, 255]
There's nothing about generation which prescribes [-1,1] btw. Replacing your tanh with a sigmoid would get you [0,1] and should work fine.
Like @Queuecumber, I would be against breaking backwards compatibility. The question then becomes: if both [-1,1] and [0,1] are valid options, should we still leave the conversion to the [0,255] domain up to the user?
I agree that this change should support BC. As above, [0,1] could be the input range if normalize is True, like LPIPS.
Is this procedure too fancy:
- If the image has dtype torch.byte then assume [0, 255]
- If the image has dtype any float type and normalize is false, assume [-1, 1]
- If the image has dtype any float type and normalize is true, assume [0, 1]
where any float dtype is {bfloat16, float16, float32, float64, anything else I forgot}
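A sketch of that inference logic, assuming the byte / float-plus-normalize rules above (the function name is illustrative only, not proposed API):

    import torch

    def to_uint8(imgs: torch.Tensor, normalize: bool) -> torch.Tensor:
        # Illustrative sketch of the proposed dtype-based rules; not torchmetrics code.
        if imgs.dtype == torch.uint8:
            return imgs                          # byte input: assume [0, 255]
        if imgs.is_floating_point():             # covers bfloat16/float16/float32/float64
            if normalize:
                return (imgs * 255).byte()       # float + normalize=True: assume [0, 1]
            return ((imgs + 1) * 127.5).byte()   # float + normalize=False: assume [-1, 1]
        raise ValueError(f"Unsupported image dtype: {imgs.dtype}")

    # Example: float16 images in [-1, 1], e.g. straight out of a tanh generator.
    fake = torch.tanh(torch.randn(2, 3, 299, 299)).half()
    print(to_uint8(fake, normalize=False).dtype)  # torch.uint8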
I agree that this change should support BC.
yes, otherwise we would just create more confusion than benefit...
The more I think about it the more I think that no one in their right mind is trying to evaluate their GANs using byte images.
Also you're allowed to make breaking changes on minor version numbers if your major version number is 0
so my opinion now is that you should go for it
Actually, now that I think about it, why is the implementation even using something that was trained on byte inputs? Was there any justification for it by the original implementer? Bytes have less dynamic range than float32.
Ah, the answer is that it immediately converts them to float images in [-1, 1]
https://github.com/toshas/torch-fidelity/blob/master/torch_fidelity/feature_extractor_inceptionv3.py#L96
That's not great; if it's going to do that, why bother forcing byte inputs on people in the first place?
edit: Line 99, right after it ... looks like it's trying to match the TensorFlow implementation as closely as it can
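For reference, the mapping being described is roughly the following (an illustrative paraphrase of the byte-to-[-1, 1] conversion, not the exact torch-fidelity code):

    import torch

    # Roughly what "convert byte images to floats in [-1, 1]" looks like;
    # an illustration of the behaviour discussed above, not the library's exact code.
    x = torch.randint(0, 256, (1, 3, 299, 299), dtype=torch.uint8)
    x = x.float() / 255.0   # [0, 255] -> [0, 1]
    x = x * 2.0 - 1.0       # [0, 1]   -> [-1, 1]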