modAL
Data augmentation with `skorch`
I am using modAL with skorch to integrate with PyTorch. Most tutorials, such as modAL's "Pytorch models in modAL workflows" tutorial, use the MNIST dataset, where the same data transformation is typically applied to the train, pool, validation, and test sets.
From the tutorial:
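```python
from torch.utils.data import DataLoader
from torchvision.datasets import MNIST
from torchvision.transforms import ToTensor

# Load all of MNIST as one batch of tensors
mnist_data = MNIST('.', download=True, transform=ToTensor())
dataloader = DataLoader(mnist_data, shuffle=True, batch_size=60000)
X, y = next(iter(dataloader))
```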
I would like to use data augmentation for more complex computer vision problems. The difficulty is that the augmentation should be applied only to the train set, not to the pool set. The modAL tutorial creates a DataLoader with a single large batch, applies the data transformations, splits the result into labeled (train) and unlabeled (pool) sets, and feeds these to the modAL functions.
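As a rough sketch of that split, with random NumPy arrays standing in for the MNIST batch (the sizes and variable names here are illustrative assumptions, not taken from the tutorial):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the single large batch from the DataLoader, shape (N, C, H, W).
X = rng.random((100, 1, 28, 28), dtype=np.float32)
y = rng.integers(0, 10, size=100)

# Pick a small initial labeled set; everything else becomes the pool.
n_initial = 10
initial_idx = rng.choice(len(X), size=n_initial, replace=False)
X_initial, y_initial = X[initial_idx], y[initial_idx]
X_pool = np.delete(X, initial_idx, axis=0)
y_pool = np.delete(y, initial_idx, axis=0)
```

Both sets come from the same already-transformed batch, which is why any transformation applied up front ends up on the pool as well.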
As such, observations moving from the pool set to the labeled set should receive the additional transformations at that point. However, I am having difficulty applying them, because the PyTorch (torchvision) transforms expect PIL images, whereas the data are already tensors by then.
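For illustration, here is a rough sketch of the behaviour I am after. The `augment` helper is hypothetical: it is a simple NumPy random flip standing in for a real torchvision pipeline, and the array sizes and `query_idx` values are made up.

```python
import numpy as np

def augment(X, rng, flip_prob=0.5):
    """Randomly flip each image in an (N, C, H, W) batch left-right."""
    X_aug = X.copy()
    flip = rng.random(len(X)) < flip_prob  # one decision per image
    X_aug[flip] = X_aug[flip][..., ::-1]   # reverse the width axis
    return X_aug

rng = np.random.default_rng(0)
X_pool = rng.random((20, 1, 8, 8), dtype=np.float32)

# Indices a query strategy might return (made up for this sketch).
query_idx = np.array([3, 7, 11])

# Only the queried samples are augmented as they leave the pool;
# X_new is what would then be passed on to the learner for training.
X_new = augment(X_pool[query_idx], rng)

# The remaining pool stays untouched.
X_pool_remaining = np.delete(X_pool, query_idx, axis=0)
```

Note that this augments each sample once, when it is labeled, rather than drawing a fresh augmentation every epoch, which is what I would ideally want.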
What would be the best way to handle this?
Thank you