
Avoid in-place copy in the `ToDevice` `Operation`

Open · carmocca opened this issue 3 years ago · 1 comment

Avoids the error:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.LongTensor [128]] is at version 3; expected version 2 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

On an ImageNet example when the label pipeline has:

    label_pipeline = [
        ...
        ToDevice(ch.device("cuda"), non_blocking=True),
    ]
    return Loader(
        ...
        pipelines={..., "label": label_pipeline},
    )

and prefetching is done by the user (in this case Lightning).
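The in-place behavior matters here because `copy_()` into a pre-allocated buffer bumps that buffer's autograd version counter, while an out-of-place transfer allocates a fresh tensor and leaves version counters alone. A minimal sketch of the difference in plain PyTorch (using `clone()` as a CPU stand-in for a device transfer; this is not FFCV's actual `ToDevice` code):

```python
import torch

labels = torch.randint(0, 1000, (128,))
buffer = torch.empty_like(labels)

# In-place: reuses `buffer`'s storage but increments its version counter,
# which is exactly what autograd checks during the backward pass.
before = buffer._version
buffer.copy_(labels)
assert buffer._version == before + 1

# Out-of-place: allocates a fresh tensor, so no saved tensor's version
# counter is ever bumped (at the cost of an extra allocation per batch).
fresh = labels.clone()
assert fresh._version == 0
```

On a real GPU pipeline the out-of-place path would be `labels.to(device, non_blocking=True)` rather than `clone()`.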

The underlying problem might be a bug in the autograd engine, because the error has a racy nature and does not appear when training is slowed down artificially. In fact, it does not appear when anomaly detection is enabled or when a sleep(0.25) call is added to the training step.
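For reference, the same class of error can be reproduced deterministically outside FFCV by mutating a tensor that autograd saved for the backward pass (a standalone sketch, unrelated to the FFCV/Lightning code path):

```python
import torch

x = torch.randn(3, requires_grad=True)
y = x.exp()       # autograd saves exp's output to compute its gradient
y.add_(1)         # in-place op bumps y's version counter

try:
    y.sum().backward()
    failed = False
except RuntimeError:
    # "one of the variables needed for gradient computation has been
    # modified by an inplace operation: ... is at version 1; expected
    # version 0 instead."
    failed = True
```

In the issue above the mutation is the prefetcher's `copy_()` landing between forward and backward, which is why the error only appears under racy timing.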

carmocca · Jan 26 '22

Hello,

I just looked at the PR, and it seems to reallocate memory. FFCV tries to never allocate memory: first, time is spent in garbage collection, but more importantly it increases overall memory use (because you always need to allocate the next batch before the previous one is collected). This ultimately reduces the maximum batch size that fits on a given GPU, which reduces data parallelism and eventually degrades performance.

For this reason I don't think disabling the in-place operation is the right call here. FFCV already takes care of the pre-fetching. Wouldn't it be easier to just disable it in PTL? Also, if PTL does pre-fetching in a similarly optimized way, then simply not copying the tensor to the device and letting PTL do it could also work.
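The memory argument can be made concrete: with a reusable pre-allocated buffer, the batch occupies one fixed region regardless of how many batches stream through, whereas allocate-per-batch keeps batch N+1 live (prefetched) while batch N is still in use. A rough sketch with hypothetical helper names, not FFCV internals:

```python
import torch

def reuse_buffer(batches, buffer):
    # FFCV-style: one pre-allocated buffer, filled in place each iteration.
    # Peak extra memory: one batch, but every yield aliases the same storage.
    for b in batches:
        buffer.copy_(b)
        yield buffer

def allocate_each(batches):
    # Allocate-per-batch: every iteration hands out a fresh tensor, so with
    # prefetching at least two batches are live until the old one is freed.
    for b in batches:
        yield b.clone()

batches = [torch.full((4,), float(i)) for i in range(3)]

buf = torch.empty(4)
reused = [t.data_ptr() for t in reuse_buffer(batches, buf)]
assert len(set(reused)) == 1                 # same storage every time

fresh = list(allocate_each(batches))
assert len({t.data_ptr() for t in fresh}) == len(fresh)  # distinct storage
```

The aliasing in the first variant is also why holding a reference across iterations (as a prefetcher does) can observe the buffer being overwritten.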

Thoughts ?

GuillaumeLeclerc · Jan 27 '22