skorch
Easy way to add TTA?
Hi Folks,
not an issue, but perhaps a question or an enhancement request. What would be the easiest way to add test-time augmentation (TTA) to the model? For now, the only way I see is to override `NeuralNet.forward_iter`, copying the code and adding TTA inside. The preferred way would be to add it to `predict`, but `predict` already has the numpy version of the array, so the data would move back and forth between GPU and CPU.
This is a good question. Regarding general image augmentation, we have examples here:
https://github.com/dnouri/skorch/blob/master/notebooks/Transfer_Learning.ipynb https://github.com/dnouri/skorch/blob/master/examples/nuclei_image_segmentation/Nuclei_Image_Segmentation.ipynb
Regarding your use case, these examples may not fit exactly, though. We probably need a section in the docs that gives best practices to achieve different kinds of augmentation. Note, however, there is not always the one best way. E.g. it may depend on whether you want to do it all in the pytorch world (torchvision) or want to do it in the numpy world (opencv, skimage).
Here is a rough guide of how to do augmentation/preprocessing:
|  | train & test same | train & test different |
|---|---|---|
| random | `Dataset.transform` | 2 different datasets |
| static | `Dataset.transform`, sklearn `Pipeline` | 2 different datasets |
Some comments regarding the different methods:
sklearn Pipeline
- falls flat if your data does not fit in memory
- applies to whole data, so no difference between train and validation
- allows you to work in numpy world
- allows you to work on whole data at once (sometimes better performance)
- is called once per fit: faster
- is called once per fit: no random augmentation possible
- re-use extensive sklearn toolset (`FunctionTransformer`, `StandardScaler`, ...)
- makes it super easy to swap skorch `NeuralNet` with another sklearn estimator
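As a minimal sketch of the Pipeline route (assuming sklearn is installed; `LogisticRegression` stands in for a skorch `NeuralNet`, which follows the same estimator API and could be dropped in unchanged):

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler
from sklearn.linear_model import LogisticRegression

# Static preprocessing: runs once per fit, on the whole array at once.
pipe = Pipeline([
    ("to_float", FunctionTransformer(lambda X: X.astype(np.float32))),
    ("scale", StandardScaler()),
    # A skorch NeuralNet could replace this estimator without changing
    # the rest of the pipeline; LogisticRegression is just a stand-in.
    ("clf", LogisticRegression()),
])

X = np.random.RandomState(0).rand(20, 4)
y = np.array([0, 1] * 10)
pipe.fit(X, y)
preds = pipe.predict(X)
```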
Custom dataset that overrides Dataset.transform
- cannot differentiate between train and validation
- can use pytorch or numpy
- because of how pytorch's `DataLoader` works, only processes one sample at a time (sometimes slow)
- called once per epoch, therefore random augmentation is possible
- called once per epoch, therefore may be slower
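The per-sample transform hook can be sketched without skorch itself; the class below only mimics the pattern (it is not the actual skorch `Dataset`, but `transform` is applied inside `__getitem__` the same way, so a random augmentation is re-drawn on every access, i.e. every epoch):

```python
import numpy as np

class AugmentingDataset:
    """Toy stand-in for a skorch Dataset with an overridden transform."""

    def __init__(self, X, y, seed=0):
        self.X, self.y = X, y
        self.rng = np.random.RandomState(seed)

    def transform(self, x, y):
        # Random horizontal flip, drawn fresh on every access.
        if self.rng.rand() < 0.5:
            x = x[::-1].copy()
        return x, y

    def __len__(self):
        return len(self.X)

    def __getitem__(self, i):
        # One sample at a time, mirroring how DataLoader consumes it.
        return self.transform(self.X[i], self.y[i])

ds = AugmentingDataset(np.arange(12).reshape(3, 4), np.array([0, 1, 0]))
x0, y0 = ds[0]
```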
Having separate Datasets
- different processing of train and validation possible
- re-use pytorch toolset (e.g. torchvision `ImageFolder`)
- because of how pytorch's `DataLoader` works, only processes one sample at a time (sometimes slow)
- called once per epoch, therefore random augmentation is possible
- called once per epoch, therefore may be slower
- cannot make use of skorch's internal train/valid split
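In miniature, the two-datasets route amounts to constructing the same dataset class with different transforms (numpy stand-ins here; in practice these would be e.g. torchvision datasets whose transforms differ between train and validation):

```python
import numpy as np

def train_transform(x, rng):
    # Random flip only for training samples.
    if rng.rand() < 0.5:
        x = x[::-1].copy()
    return x

def eval_transform(x, rng):
    # Validation/test data passes through untouched.
    return x

class SimpleDataset:
    def __init__(self, X, transform, seed=0):
        self.X = X
        self.transform = transform
        self.rng = np.random.RandomState(seed)

    def __len__(self):
        return len(self.X)

    def __getitem__(self, i):
        return self.transform(self.X[i], self.rng)

X = np.arange(8).reshape(2, 4)
ds_train = SimpleDataset(X, train_transform)  # augmented view
ds_valid = SimpleDataset(X, eval_transform)   # deterministic view
```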
I believe what we should add is the possibility for skorch Dataset to differentiate between train and test, which probably means we need a dataset_train and a dataset_valid (analogous to iterator_train and iterator_valid).
Notably absent is a possibility to work on batches of data once per epoch, with awareness of whether we deal with train or validation data. But this is hard to achieve because of the way `DataLoader` works, so I don't know whether this will ever come.
Hi @BenjaminBossan, thanks! However, I am talking about Test-time augmentation, which is not exactly preprocessing.
For example, null-flip TTA would do the following:
- Predict on the original image
- Flip the image, predict on the flipped version
- Flip that prediction back
- Average the two predictions
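Those four steps can be sketched at the numpy level, outside of the network internals (here `predict` is any callable mapping a batch of images to outputs on the same grid, e.g. segmentation masks; the identity function serves as a toy model):

```python
import numpy as np

def flip_tta_predict(predict, X):
    """Null-flip TTA for dense outputs that live on the input grid."""
    y_plain = predict(X)                       # 1) predict
    X_flipped = X[..., ::-1]                   # 2) flip along width
    y_flipped = predict(X_flipped)[..., ::-1]  # 2+3) predict, flip back
    return 0.5 * (y_plain + y_flipped)         # 4) average

# Toy "model": the identity, so the flipped branch maps back exactly
# onto the plain one and the average reproduces the input.
X = np.arange(8, dtype=float).reshape(1, 2, 4)
out = flip_tta_predict(lambda x: x, X)
```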
Here is my best attempt so far. All I can say is that it works clunkily, occasionally failing with dim mismatch messages, and I have not yet found a more elegant/working solution.
Overriding NeuralNet.evaluation_step:
```python
def evaluation_step(self, Xi, training=False):
    with torch.set_grad_enabled(training):
        self.module_.train(training)
        if self.tta_type == "nullflip" and not training:
            # flip along the width axis, predict, flip the prediction back
            Xi_reflected = torch.flip(Xi.data, (3,))
            y_reflected = torch.flip(self.infer(Xi_reflected), (3,))
            return (self.infer(Xi) + y_reflected) / 2.
        else:
            return self.infer(Xi)
```
In TTA, I would not do too much to mess with the internals of NeuralNet. I have been creating a pytorch Dataset, where I can control how my data is getting augmented. For each augmentation, I run the dataset through NeuralNet to get the prediction. While experimenting, I would save the prediction to disk to see what is going on. After I am satisfied with the number of predictions/augmentations, I load the predictions from disk and combine the predictions together.
You can also do the same thing without saving to disk, and just keeping predictions in memory.
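The in-memory variant of that loop can be sketched as follows (identity function as a toy model; `datasets` stands in for differently augmented views whose predictions have already been mapped back to a common frame before being combined):

```python
import numpy as np

def tta_average(predict, datasets):
    """Run the model once per augmented dataset, then average."""
    preds = [predict(X) for X in datasets]  # one pass per augmentation
    return np.mean(preds, axis=0)           # combine the predictions

X = np.ones((4, 3))
datasets = [X, X * 3]  # stand-ins for two augmented views of the data
avg = tta_average(lambda x: x, datasets)
```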
My advice was more on a general level, though the distinctions between different kinds of preprocessing/augmentation are sometimes blurry.
For your specific example, I could see the possibility of using a pipeline where you duplicate and flip each image in advance, so that every 2nd sample is the augmented one. Then during inference:
```python
def infer(self, x, **fit_params):
    y_infer = super().infer(x, **fit_params)
    # average each (original, flipped) pair of predictions
    y_infer = 0.5 * (y_infer[::2] + y_infer[1::2])
    return y_infer
```
This would apply the transformation to training and inference alike, though I don't see why that would be a problem.
But as stated, it is difficult to find the solution for all kinds of preprocessing/augmentation.
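The preprocessing half of that suggestion might look like the following sketch (the helper names are made up; `average_pairs` mirrors the `infer` override above, and the identity "model" makes the pairing visible):

```python
import numpy as np

def duplicate_and_flip(X):
    """Interleave each image with its flipped copy, so that every
    2nd sample is the augmented one."""
    out = np.empty((2 * len(X),) + X.shape[1:], dtype=X.dtype)
    out[::2] = X             # even positions: originals
    out[1::2] = X[..., ::-1] # odd positions: flipped copies
    return out

def average_pairs(y):
    # Counterpart to the infer override: average each plain/flipped pair.
    return 0.5 * (y[::2] + y[1::2])

X = np.arange(8, dtype=float).reshape(2, 4)
X2 = duplicate_and_flip(X)      # shape (4, 4), original/flipped interleaved
y = average_pairs(X2)           # identity "model" for the demo
```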
@BenjaminBossan one reason why one would want to do it only on predict is that this keeps training time down dramatically. I am thinking of another way: overriding predict and doing it there.