Training becomes very slow with these transforms.
I am training a CNN model with Augraphy and some other transforms. When I include just 4-5 Augraphy transforms with 10-20% probability, my training becomes ~10 times slower.
When I checked htop, I noticed that the load average shoots up very high when these transforms are used.
I tried a few things, such as reducing num_workers, but nothing helped speed up the training.
Please guide me on how I can overcome this issue.
Hi, could you include a snippet of code showing how you are using Augraphy in your training?
Sure, the flow is as follows.
Importing the transforms:

```python
from augraphy.augmentations import ShadowCast, ReflectedLight, Folding  # ...
```
Creating a wrapper around them:

```python
import torch

class GenericTransforms:
    def __init__(self, transform, prob):
        self.transform = transform
        self.prob = prob

    def __call__(self, img):
        # Apply the wrapped transform with probability `prob`.
        if torch.rand(1) < self.prob:
            return self.transform(img)
        return img

Folding_ = GenericTransforms(Folding(), prob=0.05)
ReflectedLight_ = GenericTransforms(ReflectedLight(), prob=0.1)
ShadowCast_ = GenericTransforms(ShadowCast(), prob=0.2)
# ....
```
Then I compose the transforms using torchvision.transforms:

```python
from torchvision import transforms

def _compose_transforms_from_config(transform_config):
    preprocessing_transforms = []
    transform_map = {'ReflectedLight': ReflectedLight_, 'Folding': Folding_, 'ShadowCast': ShadowCast_, ...}
    for transform in transform_config:
        trans_type = transform_map[transform['Type']]
        transform_instance = trans_type(**transform['Kwargs'])
        preprocessing_transforms.append(transform_instance)
    preprocess_transforms = transforms.Compose(preprocessing_transforms)
    return preprocess_transforms
```
These transforms are then used in the dataset class:

```python
def __getitem__(self, idx):
    img = self.read_image(idx)
    x = self.transforms(img)
    return x, self.targets[idx]
```
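(For reference, `self.transforms` here is the Compose returned by `_compose_transforms_from_config`. A minimal skeleton of how the dataset might be wired; apart from `read_image`, `transforms`, and `targets`, the names are placeholders, not from my actual code:)

```python
from torch.utils.data import Dataset

class DocDataset(Dataset):
    # Illustrative skeleton; only read_image/transforms/targets
    # appear in the snippet above, the rest are placeholder names.
    def __init__(self, image_paths, targets, transforms):
        self.image_paths = image_paths
        self.targets = targets
        self.transforms = transforms  # Compose from _compose_transforms_from_config

    def read_image(self, idx):
        raise NotImplementedError  # e.g. load self.image_paths[idx] from disk

    def __len__(self):
        return len(self.image_paths)
```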
Note: I am using 5-6 other custom transforms in the same manner, but as soon as I include the Augraphy transforms, training becomes very slow with the load average shooting up. RAM usage stays under control. I am training on a 20-core machine.
Please let me know if any other information is required.
Looking at the benchmark results (https://github.com/sparkfish/augraphy/tree/dev/benchmark), ReflectedLight is one of the slowest augmentations. You could try removing it and see if the speed increases. If speed is a concern, you may consider using only those augmentations with a higher Img/sec value.
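If it helps to measure which augmentation dominates on your own images, here is a minimal timing sketch; the image size and the choice of augmentations are just assumptions, and it calls the augmentations directly, the same way your wrapper does:

```python
import time
import numpy as np
from augraphy.augmentations import Folding, ShadowCast

# Synthetic document-sized image (height x width x channels).
img = np.random.randint(0, 255, (1100, 900, 3), dtype=np.uint8)

for aug in (Folding(), ShadowCast()):
    start = time.perf_counter()
    for _ in range(10):
        aug(img)
    elapsed = time.perf_counter() - start
    print(f"{type(aug).__name__}: {10 / elapsed:.2f} img/sec")
```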
I did that; I only kept the augmentations for which Img/sec was around 1 or more, but it did not help much with the speed.
Could you let me know your rough image size? Then I can try to reproduce this with the code above on my end too.
Thanks. The image size is around 900x1100.
I have narrowed down the issue.
As I mentioned, the load average shoots up with these transforms. When I limit the threads used by NumPy and OpenCV to 1, the transforms run 5-6 times faster:
os.environ["OPENBLAS_NUM_THREADS"] = "1"
os.environ["MKL_NUM_THREADS"] = "1"
os.environ["OMP_NUM_THREADS"] = "1"
os.environ["NUMEXPR_NUM_THREADS"] = "1"
cv2.setNumThreads(1)
But the thread limiting somehow does not apply in the subprocesses, so fetching data from the DataLoader is still slow since it spawns multiple workers.
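One way to apply the limits inside each worker might be a worker_init_fn, which PyTorch runs once in every DataLoader worker process. A minimal sketch; the dataset, batch size, and worker count are placeholders:

```python
import cv2
import torch
from torch.utils.data import DataLoader

def limit_worker_threads(worker_id):
    # Runs once inside each DataLoader worker process. These calls take
    # effect at runtime, unlike the *_NUM_THREADS env vars, which must be
    # set before the BLAS libraries are first loaded.
    cv2.setNumThreads(1)
    torch.set_num_threads(1)

# my_dataset: your Dataset instance; batch_size/num_workers are examples.
loader = DataLoader(my_dataset, batch_size=16, num_workers=8,
                    worker_init_fn=limit_worker_threads)
```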
Okay, and it looks like the code you provided above is not complete. What would this transform_config be?
Then in x = self.transforms(img), is preprocess_transforms what gets used?
On your 20-core machine, do you use multiple GPUs too?
A minor correction in the code:
This is how the transforms are defined:
```python
class Moire_:
    def __init__(self, prob=0.15):
        self.moire = Moire()
        self.prob = prob

    def __call__(self, img):
        if torch.rand(1) < self.prob:
            return self.moire(img)
        return img
```
This is how the transform config looks:
```yaml
- Type: ReflectedLight
  Kwargs: {}
- Type: DirtyDrum
  Kwargs: {}
- Type: Folding
  Kwargs: {}
```
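(For context, such a YAML list can be loaded into transform_config roughly like this; the file name is illustrative, not my actual path:)

```python
import yaml  # PyYAML

# Hypothetical file name; adjust to wherever the config lives.
with open("transform_config.yaml") as f:
    transform_config = yaml.safe_load(f)
# -> [{'Type': 'ReflectedLight', 'Kwargs': {}}, {'Type': 'DirtyDrum', 'Kwargs': {}}, ...]

preprocess_transforms = _compose_transforms_from_config(transform_config)
```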
We use 2 GPUs to train.
I tried it on Colab with an image size of (1100, 1100, 3), but I only see a ~30% increase in processing time with Augraphy, and that is with a probability of 1 for all 3 augmentations.
Here are the notebooks: https://drive.google.com/drive/folders/1kaUWqVY5xKhKzDJP2zyiDoyOWtgtpVQU?usp=sharing
There is probably some overhead in the custom augmentation functions on multi-GPU or multi-core setups. Have you tried other custom augmentation functions instead of Augraphy, as in the sketch below?
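For example, a trivial NumPy-only stand-in (hypothetical, not from this thread) could show whether the slowdown is specific to Augraphy's OpenCV-heavy operations or comes from the wrapper/DataLoader setup itself:

```python
import numpy as np

class DummyTransform_:
    # Hypothetical stand-in: a single cheap NumPy pass over the image,
    # with no OpenCV or BLAS multithreading involved.
    def __init__(self, prob=0.15):
        self.prob = prob

    def __call__(self, img):
        if np.random.rand() < self.prob:
            return np.ascontiguousarray(img[::-1, ::-1])  # 180-degree flip
        return img
```

If training stays fast with this stand-in at the same probabilities, the cost is in the Augraphy operations themselves rather than in the pipeline.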