
Custom dataset builder for multichannel 'float32' hyperspectral images?

Open petteriTeikari opened this issue 1 year ago • 1 comments

Feature request

Is there an easy way to train e.g. ViTMAE on hyperspectral images (more than 3 "color" channels)? And is there (or could there be) a best practice for loading the images with tifffile, which returns one 3D np.ndarray cube per TIFF file, instead of the typical PIL?

Motivation

I wanted to quickly test ViTMAE [1,2] (as it allows n > 3 channels) on an existing hyperspectral dataset that has 58 channels instead of the typical 3 for RGB (e.g. np_array.shape = (58, 48, 48), with low spatial resolution), and bumped into several pain points.

Like:

  1. dataset = load_dataset("imagefolder", data_dir=base_dir) is super simple, but it creates a standard PIL-based dataset, whereas I wanted to use tifffile to load my hyperspectral files as 3D arrays/tensors instead of having to rely on "multipage hacks" with PIL.

  2. I tried to create a custom "loading script" based on the Food101 script, and wrote my own Cube() class to use instead of the standard Image() class. It pretty much just replaces image = PIL.Image.open(path) with image = tifffile.imread(path)

and then my _generate_examples() yields this: yield abs_file_path, {"image": tifffile.imread(abs_file_path).astype('uint8'), "label": label}

This results in the error TypeError('Unsupported array dtype float64 for image encoding. Only uint8 is supported for multi-channel arrays.'), as PIL prefers uint8 types, whereas my data is float32 because it comes from a custom preprocessing script.

Summary

So I could not really find a good example of how to define loaders for new types of data (or does everything always go back to PIL?)

References

i.e. it would be nice to eventually have a more industry-standard implementation of these, or similar:

[1] Ibañez et al. (2022): "Masked Auto-Encoding Spectral–Spatial Transformer for Hyperspectral Image Classification"

[2] Xu et al. (2023): "Swin MAE: Masked Autoencoders for Small Datasets"

Your contribution

I don't have working code for training these and was wondering whether there is even an easy way. One probably needs to be careful with some of the transforms if they only support 3 color channels.
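As an illustration of the transform concern, per-channel normalization can be done directly in NumPy without any assumption about 3 color channels. This is only a sketch: `channel_stats` and `normalize_cube` are hypothetical helper names, and the cubes are assumed to be channel-first float32 arrays as in the issue.

```python
import numpy as np


def channel_stats(cubes):
    """Per-channel mean and std over a batch shaped (N, C, H, W)."""
    mean = cubes.mean(axis=(0, 2, 3))
    std = cubes.std(axis=(0, 2, 3))
    return mean.astype(np.float32), std.astype(np.float32)


def normalize_cube(cube, mean, std, eps=1e-6):
    """Standardize one (C, H, W) cube channel by channel."""
    # Reshape the (C,) statistics to (C, 1, 1) so they broadcast over H and W.
    return (cube - mean[:, None, None]) / (std[:, None, None] + eps)
```

The statistics would typically be computed once on the training split and reused for validation data. On the model side, `ViTMAEConfig` has a `num_channels` argument that would need to match the channel count (58 here).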

petteriTeikari, Mar 10 '23 03:03

This question might be better suited for the forums as we keep issues for bugs and feature requests only.

sgugger, Mar 10 '23 13:03

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot], Apr 09 '23 15:04