
1 Channel images not working

Open etetteh opened this issue 3 years ago • 7 comments

I am using timm version 0.5.0 and plan to train on greyscale images, but the data loader is outputting RGB images. The model I intend to use works fine:

vit_tiny = timm.create_model('vit_tiny_patch16_224', pretrained=True, num_classes=2, drop_rate=0.2, in_chans=1)
vit_tiny.patch_embed.proj
Out: Conv2d(1, 192, kernel_size=(16, 16), stride=(16, 16))

And my loader is as follows:

train_dataset = create_dataset(name="train", root=DATA_DIR, split="train")
train_loader  = create_loader(train_dataset,
                              input_size=(1, 224, 224),
                              batch_size=8,
                              use_prefetcher=False,
                              is_training=True,
                              no_aug=True,
                              auto_augment='aa',
                              num_workers=1,
                              )

However, the output size is as follows:

train_loader.dataset[0][0].shape
Out: torch.Size([3, 224, 224])

etetteh avatar Nov 17 '21 20:11 etetteh

@etetteh yes, models work fine with arbitrary input channels, but the data pipeline components default to RGB as they've really only been used with color images.

The folder dataset has 'RGB' hardcoded as the PIL conversion; that needs changing to 'L', and any normalization mean/std args need to be passed in as single-element lists or tuples, e.g. [0.5]. It might work if you make those changes...
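The change being described is the PIL mode used when the dataset opens each image file. A minimal sketch of the difference (pure PIL, no timm, with a dummy in-memory image standing in for a file on disk):

```python
from PIL import Image

# Dummy color image in place of a file read from the dataset folder.
img = Image.new('RGB', (224, 224), color=(10, 20, 30))

# What the folder dataset does by default: force 3-channel RGB.
as_rgb = img.convert('RGB')
# What a greyscale pipeline needs instead: single-channel 'L'.
as_grey = img.convert('L')

print(as_rgb.mode, len(as_rgb.getbands()))    # RGB 3
print(as_grey.mode, len(as_grey.getbands()))  # L 1
```

With the 'L' conversion, the tensor produced by the transforms has one channel, which is why the mean/std must also shrink to single-element sequences.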

rwightman avatar Nov 17 '21 21:11 rwightman

Thanks so much for the response. I guess the input size is also hardcoded, which makes it impossible to use, say, vit_tiny_patch16_224 with a different input size, say (512, 512).

Is this going to change in the future?

etetteh avatar Nov 17 '21 21:11 etetteh

@etetteh unlike the majority of convnets, changing the resolution of a ViT or MLP-Mixer model essentially gives you a different model: the sequence length changes, and with it the position embeddings.

The vit models support interpolating the position embedding to a different size, which loses quite a bit of accuracy, though that's not a big issue if you are fine-tuning anyway. So you can pass img_size= to the model as an extra arg (it's not in the train script since it only works on some models; something to fix someday), and it will interpolate the pretrained weights for you if it can.

rwightman avatar Nov 17 '21 23:11 rwightman

@rwightman Thanks a lot. I am fine-tuning, so that isn't an issue, as you said. I passed the img_size argument and everything is fine now. I downloaded the dataset.py and dataset_factory.py scripts and changed the RGB to L. Everything is working fine.

I would suggest passing the channel mode, RGB or L, as an argument to the create_dataset function instead of hardcoding it.

etetteh avatar Nov 18 '21 02:11 etetteh

@etetteh yes, I will leave this open until I have a chance to add support to the dataset, and will also push some better handling into the transforms fn so that it can warn about / resolve a mismatch between input size and normalization inputs.

rwightman avatar Nov 18 '21 23:11 rwightman

Does this mean timm cannot deal with greyscale? I am using ResNet50 on greyscale images, the model shows input_size: (3, 224, 224), and it gives me training results. Are the results wrong?

WENHUAN22 avatar Jun 02 '22 20:06 WENHUAN22

@rwightman Thanks a lot. I am fine-tuning, so that isn't an issue, as you said. I passed the img_size argument and everything is fine now. I downloaded the dataset.py and dataset_factory.py scripts and changed RGB to L. Everything is working fine.

I would suggest passing the channel mode, RGB or L, as an argument to the create_dataset function instead of hardcoding it.

@etetteh Could you tell me how you changed RGB to L?

xiaoyuan0203 avatar Jun 17 '22 12:06 xiaoyuan0203

@rwightman Thanks a lot. I am fine-tuning, so that isn't an issue, as you said. I passed the img_size argument and everything is fine now. I downloaded the dataset.py and dataset_factory.py scripts and changed the RGB to L. Everything is working fine.

I would suggest passing the channel mode, RGB or L, as an argument to the create_dataset function instead of hardcoding it.

Could you share your modification code for greyscale? Thanks!

Allencheng97 avatar May 11 '23 12:05 Allencheng97

@etetteh yes, models work fine with arbitrary input channels, but the data pipeline components default to RGB as they've really only been used with color images.

The folder dataset has 'RGB' hardcoded as the PIL conversion; that needs changing to 'L', and any normalization mean/std args need to be passed in as single-element lists or tuples, e.g. [0.5]. It might work if you make those changes...

Could you please point to where this is hardcoded? Thank you very much.

Allencheng97 avatar May 11 '23 12:05 Allencheng97

Long time, but no hardcoding is needed anymore; the image mode is passed through to the folder, huggingface, webdataset, and tfds dataset readers... only torchvision datasets aren't easily supported.

rwightman avatar Jan 09 '24 23:01 rwightman