Use torchvision transforms instead of PIL operations for solarization and gaussian blurring

Open siemdejong opened this issue 1 year ago • 1 comment

Problem When using a transform that relies on lightly.transforms.gaussian_blur.GaussianBlur or lightly.transforms.solarization.RandomSolarization, e.g. DINOTransform, the input must first be converted to a PIL.Image, because the Gaussian blur and the solarization use PIL.ImageFilter.GaussianBlur and PIL.ImageOps.solarize respectively. For training, the PIL.Image then has to be converted back to a tensor.

Background It might be beneficial to get rid of this conversion for performance reasons. Sometimes a different reader such as pyvips/openslide/cucim is needed to extract only a patch of an image, because the images are too big to fit in memory (e.g. in computational pathology or remote sensing). In that setting, patches are often extracted and immediately converted to torch tensors so that torchvision can transform them. Doing all the transforms on tensors avoids the round trip through other formats such as PIL, which should improve performance.

Alternative Torchvision provides seemingly fast (jitted) implementations for solarization [1] and gaussian blurring [2].

I'm not aware of any other transforms in the lightly package that rely on PIL.

References [1] implementation docs [2] implementation docs

Nov 07 '24 15:11 siemdejong

Thanks for raising this issue!

There is a previous discussion on supporting tensors as transform inputs in #791.

I think we can switch to torchvision solarization and blur implementations. When making the change we should take #1052 into account and make sure we don't re-introduce a change in the blurring.

Nov 07 '24 16:11 guarin