PyTorch-MFNet

Augmentations on GPU

Open jhagege opened this issue 6 years ago • 4 comments

Hi, great code! I've noticed that GPU usage is a bit low (around 40%) and I'm trying to optimize it. I've also noticed that HLSTransform is very CPU intensive. Are you aware of any way to run it on the GPU instead of the CPU? Do you think it would help? Thanks

jhagege, Nov 05 '18 13:11

I haven't found any HLS implementation on GPU. It might be helpful if the color augmentation could be done on the GPU side.

Besides reducing the cost of data augmentation, you can also consider reducing the cost of decoding the video files. For the Kinetics dataset, I found that converting the default *.mp4 files with the command below significantly speeds up the decoding stage:

ffmpeg -y -i ${SRC_VID} -c:v mpeg4 -filter:v "scale=min(iw\,(256*iw)/min(iw\,ih)):-1" -b:v 512k -an ${DST_VID}
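To apply the command above to a whole directory of clips, a small wrapper along these lines might help. This is only a sketch: the function names `build_ffmpeg_cmd` and `convert_tree` and the directory layout are hypothetical, and it assumes `ffmpeg` is on the PATH. The arguments themselves are copied from the command above.

```python
import subprocess
from pathlib import Path

# Scale filter copied verbatim from the command above (shorter side -> at most 256 px).
SCALE_FILTER = r"scale=min(iw\,(256*iw)/min(iw\,ih)):-1"


def build_ffmpeg_cmd(src, dst):
    """Return the ffmpeg argument list for one video (hypothetical helper)."""
    return [
        "ffmpeg", "-y",
        "-i", str(src),
        "-c:v", "mpeg4",
        "-filter:v", SCALE_FILTER,
        "-b:v", "512k",
        "-an",
        str(dst),
    ]


def convert_tree(src_dir, dst_dir, dry_run=True):
    """Mirror every *.mp4 under src_dir into dst_dir, re-encoded with the command above.

    With dry_run=True the commands are only collected and returned, not executed.
    """
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    cmds = []
    for src in sorted(src_dir.rglob("*.mp4")):
        dst = dst_dir / src.relative_to(src_dir)
        cmd = build_ffmpeg_cmd(src, dst)
        cmds.append(cmd)
        if not dry_run:
            dst.parent.mkdir(parents=True, exist_ok=True)
            subprocess.run(cmd, check=True)
    return cmds
```

Passing the arguments as a list avoids shell quoting issues around the filter expression. Running with `dry_run=True` first lets you inspect the generated commands before re-encoding anything.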

cypw, Nov 09 '18 11:11

Thanks much for your feedback, this is helpful. Will give it a look.

jhagege, Nov 09 '18 12:11

@cypw By the way, did you try converting the videos to h264 / h265? Did you notice a significant improvement with mpeg4 compared to those? Thanks!

jhagege, Nov 15 '18 08:11

Hi, this comes a bit late, but replacing NumPy functions with their cv2 equivalents wherever possible in the __call__ method of the RandomHLS augmentation saves significant CPU time. Essentially, this means substituting the np.minimum and np.maximum calls. Snippet below, hope it helps.

import cv2
import numpy as np

def __call__(self, data):
    # data: H x W x (3*N) array of N stacked RGB frames (assumed uint8)
    assert data.ndim == 3, 'cannot operate on a single channel'
    h, w, c = data.shape
    assert c % 3 == 0, "input channel = %d, illegal" % c
    num_ims = c//3

    # one random shift per H, L, S channel; padded with 0 to form a 4-element cv2 scalar
    random_vars = tuple(int(round(self.rng.uniform(-x, x))) for x in (self.vars + [0]))
    augmented_data = np.zeros(data.shape, dtype=np.uint8)

    for i_im in range(0, num_ims): # for every image do the magic
        start, end = 3*i_im, 3*(i_im+1)
        augmented_data[:, :, start:end] = cv2.cvtColor(data[:, :, start:end], cv2.COLOR_RGB2HLS)
        # saturating add replaces np.minimum/np.maximum: results are clipped to [0, 255]
        augmented_data[:, :, start:end] = cv2.add(augmented_data[:, :, start:end], random_vars, dtype=cv2.CV_8UC3)
        # hue has a smaller range than L and S, so cap values above 180
        mask = cv2.inRange(augmented_data[:, :, start], 0, 180)
        augmented_data[mask == 0, start] = 180
        augmented_data[:, :, start:end] = cv2.cvtColor(augmented_data[:, :, start:end], cv2.COLOR_HLS2RGB)

    return augmented_data
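As a sanity check on the substitution, here is a NumPy-only sketch comparing the two clamping strategies on an HLS array. The function names are hypothetical, the limits (180, 255, 255) are taken from the snippet's hue cap and the uint8 range, and the saturating add is emulated with np.clip rather than calling cv2.add, so this illustrates the arithmetic and does not measure speed.

```python
import numpy as np

HLS_LIMITS = (180, 255, 255)  # assumed upper bounds for H, L, S on 8-bit data


def shift_clamp_minmax(hls, shifts):
    """Shift each channel, then clamp with np.maximum / np.minimum."""
    out = hls.astype(np.int16)  # widen so the shift cannot wrap around
    for ch in range(3):
        shifted = out[:, :, ch] + shifts[ch]
        out[:, :, ch] = np.minimum(np.maximum(shifted, 0), HLS_LIMITS[ch])
    return out.astype(np.uint8)


def shift_clamp_saturate(hls, shifts):
    """Emulate a saturating uint8 add, then cap hue at 180."""
    shifted = hls.astype(np.int16) + np.asarray(shifts, dtype=np.int16)
    out = np.clip(shifted, 0, 255).astype(np.uint8)
    out[out[:, :, 0] > 180, 0] = 180  # only the hue channel needs the tighter bound
    return out


rng = np.random.default_rng(0)
hls = rng.integers(0, 256, size=(8, 8, 3), dtype=np.uint8)
shifts = (15, -35, 25)
same = np.array_equal(shift_clamp_minmax(hls, shifts),
                      shift_clamp_saturate(hls, shifts))
```

If `same` is true for your inputs, the saturating add plus a hue cap performs the same clamping as the per-channel np.minimum/np.maximum, so the cv2 version should change only the runtime and not the augmented values.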

georkap, Dec 03 '19 15:12