pytorch-image-models
1 Channel images not working
I am using timm version 0.5.0, and I plan to train on greyscale images, but the data loader is outputting RGB images.
The model I intend to use works fine:
vit_tiny = timm.create_model('vit_tiny_patch16_224', pretrained=True, num_classes=2, drop_rate=0.2, in_chans=1)
vit_tiny.patch_embed.proj
Out: Conv2d(1, 192, kernel_size=(16, 16), stride=(16, 16))
And my loader is as follows:
train_dataset = create_dataset(name="train", root=DATA_DIR, split="train")
train_loader = create_loader(
    train_dataset,
    input_size=(1, 224, 224),
    batch_size=8,
    use_prefetcher=False,
    is_training=True,
    no_aug=True,
    auto_augment='aa',
    num_workers=1,
)
However, the output shape is as follows:
train_loader.dataset[0][0].shape
Out: torch.Size([3, 224, 224])
@etetteh yes, models work fine with arbitrary input channels, but the data pipeline components default to RGB as they've really only been used with color images.
The folder dataset has 'RGB' hardcoded as the PIL conversion; that needs changing to 'L', and any normalization mean/std args need to be passed in as single-element lists or tuples, e.g. [0.5]. It might work if you make those changes...
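With those two changes, the loader call above might look roughly like this (a sketch: it assumes the dataset's PIL conversion has already been edited to 'L', and the 0.5 mean/std values are placeholders, not tuned values):

# sketch: assumes the PIL conversion in timm's folder dataset was changed from 'RGB' to 'L'
train_loader = create_loader(
    train_dataset,
    input_size=(1, 224, 224),  # single input channel
    batch_size=8,
    use_prefetcher=False,
    is_training=True,
    no_aug=True,
    mean=(0.5,),  # single-element mean/std to match the 1-channel input
    std=(0.5,),
    num_workers=1,
)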
Thanks so much for the response. I guess the input size is also hardcoded, which makes it impossible to use, say, vit_tiny_patch16_224 with a different input size, say, (512, 512).
Is this going to change in the future?
@etetteh unlike the majority of convnets, changing the resolution of a vit or mlp-mixer model essentially gives you a different model. The sequence length changes, and with it the position embeddings change.
The vit models support interpolating the embedding to a different size, but that loses quite a bit of accuracy; however, it's not a big issue if you are fine-tuning anyway. So you can pass img_size= to the model as an extra arg (it's not in the train script since it only works on some models; to fix that someday), and it will interpolate the pretrained weights for you if it can.
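For instance, building on the model created above, the extra arg would look roughly like this (a sketch; supported by the vit models as described):

vit_tiny = timm.create_model(
    'vit_tiny_patch16_224',
    pretrained=True,
    num_classes=2,
    in_chans=1,
    img_size=512,  # position embeddings are interpolated from 224 to 512 when loading pretrained weights
)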
@rwightman Thanks a lot. I am fine-tuning, so that isn't an issue, as you said. I passed the img_size argument, and everything is fine now. I downloaded the dataset.py and dataset_factory.py scripts and changed the RGB to L. Everything is working fine.
I would suggest the channel type, RGB or L, be passed as an argument to the create_dataset function, instead of being hardcoded.
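For reference, the edit in question is roughly the following (approximate, based on the timm 0.5.x layout of timm/data/dataset.py; the exact line may differ by version):

# in ImageDataset.__getitem__, before:
img = Image.open(img).convert('RGB')
# after, for greyscale:
img = Image.open(img).convert('L')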
@etetteh yes, I will leave this open until I have a chance to add support to the dataset, and I will also push some better handling into the transforms fn so that it can warn about / resolve a mismatch between the input size and the normalization inputs.
Does this mean timm cannot deal with greyscale? I am using ResNet50 on greyscale images; the model shows input_size: (3, 224, 224), and it gives me training results. Are the results wrong?
@etetteh Could you tell me how you changed RGB to L?
Could you share your modification code for greyscale? Thanks!
Could you please provide the location of this hardcoded conversion? Thank you very much.
long time, but no hardcoding needed anymore; the image mode is passed through to the folder, huggingface, webdataset, and tfds dataset readers .. only torchvision datasets aren't easily supported
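So with a recent timm release, something like the following should work without editing any source (a sketch; input_img_mode is the pass-through argument in current versions, and name='' selects the folder reader):

train_dataset = create_dataset(
    name='',
    root=DATA_DIR,
    split='train',
    input_img_mode='L',  # PIL conversion mode, forwarded to the dataset reader
)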