
Training becomes very slow with these transforms.

Open shivanference opened this issue 1 year ago • 10 comments

I am training a CNN model with Augraphy and some other transforms. When I include just 4-5 Augraphy transforms with 10-20% probability, my training becomes ~10 times slower.

When I checked htop, I noticed that the load average was shooting up very high when using these transforms.

I tried a few things, such as reducing num_workers, but nothing helped speed up the training.

Please guide me on how I can overcome this issue.

shivanference avatar Feb 29 '24 16:02 shivanference

Hi, could you include a snippet of the code showing how you are using Augraphy in your training?

kwcckw avatar Mar 01 '24 00:03 kwcckw

Sure, the flow is as follows.

Importing transforms

import torch
from augraphy.augmentations import ShadowCast, ReflectedLight, Folding  # ...

Creating a wrapper around it

class GenericTransforms:
    def __init__(self, transform, prob):
        self.transform = transform
        self.prob = prob

    def __call__(self, img):
        if torch.rand(1) < self.prob:
            return self.transform(img)
        return img
  
Folding_ = GenericTransforms(Folding(), prob=0.05)
ReflectedLight_ = GenericTransforms(ReflectedLight(), prob=0.1)
ShadowCast_ = GenericTransforms(ShadowCast(), prob=0.2)
# ....

Then I compose the transforms using torchvision.transforms

def _compose_transforms_from_config(transform_config):

    preprocessing_transforms = []
    transform_map = {'ReflectedLight': ReflectedLight_, 'Folding': Folding_, 'ShadowCast': ShadowCast_,...}
    for transform in transform_config:
        trans_type = transform_map[transform['Type']]
        transform_instance = trans_type(**transform['Kwargs'])
        preprocessing_transforms.append(transform_instance)
    preprocess_transforms = transforms.Compose(preprocessing_transforms)
    return preprocess_transforms
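
(For illustration, a minimal sketch of a call to this function; the config entries below are placeholders assumed for the example, in the list-of-dicts shape the function above expects.)

# Illustrative only: transform_config is assumed to be a list of dicts
# with 'Type' and 'Kwargs' keys, matching _compose_transforms_from_config.
transform_config = [
    {'Type': 'ReflectedLight', 'Kwargs': {}},
    {'Type': 'Folding', 'Kwargs': {}},
    {'Type': 'ShadowCast', 'Kwargs': {}},
]
preprocess_transforms = _compose_transforms_from_config(transform_config)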

Then these transforms are used in dataset class.

def __getitem__(self, idx):
    img = self.read_image(idx)
    x = self.transforms(img)
    return x, self.targets[idx]

Note: I am using 5-6 other custom transforms in the same manner, but as soon as I include the Augraphy transforms, training becomes very slow and the load average shoots up. RAM usage stays under control. I am training on a 20-core machine.

Please let me know if any other information is required.

shivanference avatar Mar 01 '24 04:03 shivanference

By looking at the benchmark results: https://github.com/sparkfish/augraphy/tree/dev/benchmark

ReflectedLight is one of the slowest augmentations. You could try removing it and see if the speed increases. If speed is a concern, you may consider using only the augmentations with a higher Img/sec value.

kwcckw avatar Mar 01 '24 05:03 kwcckw

I did that; I kept only the augmentations whose Img/sec was around 1 or higher, but it did not help much with the speed.

shivanference avatar Mar 01 '24 06:03 shivanference

I did that; I kept only the augmentations whose Img/sec was around 1 or higher, but it did not help much with the speed.

Could you let me know your rough image size? Then I can try to reproduce this with the code above on my end too.

kwcckw avatar Mar 01 '24 06:03 kwcckw

Thanks. The image size is around 900x1100.

shivanference avatar Mar 01 '24 06:03 shivanference

I have narrowed down the issue.

As I mentioned, the load average shoots up with these transforms. When I limit the threads used by NumPy and OpenCV to 1, the transforms run 5-6 times faster.

os.environ["OPENBLAS_NUM_THREADS"] = "1"
os.environ["MKL_NUM_THREADS"] = "1"
os.environ["OMP_NUM_THREADS"] = "1"
os.environ["NUMEXPR_NUM_THREADS"] = "1"
cv2.setNumThreads(1)

But the thread limiting is somehow not applied in the subprocesses, so fetching data from the DataLoader is still slow because it spawns multiple workers.
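
(A minimal sketch, not from the original thread, of one way to apply the same limits inside each DataLoader worker: the BLAS/OpenMP environment variables only take effect if they are set before NumPy/OpenCV initialize their thread pools, so they go at the very top of the main script, while cv2.setNumThreads and torch.set_num_threads can still be applied at runtime in a worker_init_fn. The DataLoader arguments shown are placeholders.)

import os

# Set thread caps before NumPy/OpenCV/torch are imported so that worker
# processes spawned by the DataLoader inherit the same limits.
os.environ["OPENBLAS_NUM_THREADS"] = "1"
os.environ["MKL_NUM_THREADS"] = "1"
os.environ["OMP_NUM_THREADS"] = "1"
os.environ["NUMEXPR_NUM_THREADS"] = "1"

import cv2
import torch
from torch.utils.data import DataLoader

def limit_threads_in_worker(worker_id):
    # Runs once in every DataLoader worker process; these two calls
    # take effect at runtime, even after the libraries are loaded.
    cv2.setNumThreads(1)
    torch.set_num_threads(1)

# Placeholder wiring; `train_dataset` stands in for the dataset class above.
# loader = DataLoader(train_dataset, batch_size=32, num_workers=4,
#                     worker_init_fn=limit_threads_in_worker)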

shivanference avatar Mar 01 '24 13:03 shivanference

When I limit the threads used by NumPy and OpenCV to 1, the transforms run 5-6 times faster.

Okay, and it looks like the code you provided above is not complete. What does transform_config look like?

And in x = self.transforms(img), is preprocess_transforms what is being used?

On your 20-core machine, are you using multiple GPUs too?

kwcckw avatar Mar 02 '24 01:03 kwcckw

A minor correction to the code:

This is how the transforms are defined:

class Moire_:

    def __init__(self, prob=0.15):
        self.moire = Moire()
        self.prob = prob

    def __call__(self, img):
        if torch.rand(1) < self.prob:
            return self.moire(img)
        return img

This is what the transform config looks like:

- Type: ReflectedLight
  Kwargs: {}
- Type: DirtyDrum
  Kwargs: {}
- Type: Folding
  Kwargs: {}

We use 2 GPUs to train.

shivanference avatar Mar 02 '24 04:03 shivanference

I tried on Colab with an image size of (1100, 1100, 3), but I only saw about a 30% increase in processing time with Augraphy, and that was with a probability of 1 for all 3 augmentations.

Here's the notebooks: https://drive.google.com/drive/folders/1kaUWqVY5xKhKzDJP2zyiDoyOWtgtpVQU?usp=sharing

There's probably some overhead from the custom augmentation functions in a multi-GPU or multi-core setup. Have you tried other custom augmentation functions instead of Augraphy?
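
(A rough timing sketch, not from the thread, that exercises the three augmentations directly on a single image, outside the DataLoader and multi-GPU path. It assumes the augmentations can be called directly on a NumPy array, as in the snippets above, and uses an image size close to the ~900x1100 mentioned earlier.)

import time
import numpy as np
from augraphy.augmentations import Folding, ReflectedLight, ShadowCast

# Dummy image roughly matching the ~900x1100 size from the thread.
img = np.random.randint(0, 255, (1100, 900, 3), dtype=np.uint8)

for aug in (Folding(), ReflectedLight(), ShadowCast()):
    start = time.perf_counter()
    for _ in range(5):
        aug(img)
    elapsed = (time.perf_counter() - start) / 5
    print(f"{aug.__class__.__name__}: {elapsed:.2f} s per image")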

kwcckw avatar Mar 02 '24 11:03 kwcckw