
None of the inputs have requires_grad=True. Gradients will be None

maxjaritz opened this issue 2 years ago · 2 comments

I am fine-tuning a model on a custom dataset. At the start of training, I get the warning "None of the inputs have requires_grad=True. Gradients will be None". I made this warning disappear by passing use_reentrant=False to the checkpoint function in the following three lines of transformer.py:
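(The exact lines from transformer.py aren't reproduced here, but the change has this shape; `resblock` below is a hypothetical stand-in for one checkpointed residual block, not the actual open_clip source.)

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

# Hypothetical stand-in for a residual block wrapped by checkpoint()
# in transformer.py (illustrative only, not the open_clip code).
resblock = nn.Linear(8, 8)
x = torch.randn(2, 8)

# before: x = checkpoint(resblock, x)
x = checkpoint(resblock, x, use_reentrant=False)  # after: warning gone
```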

Interestingly, simply setting use_reentrant=False also improved performance in train/val loss and cross-modal retrieval!

My training command is:

torchrun --nproc_per_node 8 -m training.main \
--train-data 'mydata/{00000..04089}.tar' \
--val-data 'mydata/{04090..04095}.tar' \
--train-num-samples 16115354 \
--val-num-samples 70965 \
--dataset-type webdataset \
--epochs 10 \
--batch-size 1650 \
--precision amp \
--local-loss \
--gather-with-grad \
--grad-checkpointing \
--ddp-static-graph \
--workers 8 \
--seed 0 \
--lr 0.3e-3 \
--warmup 1220 \
--report-to tensorboard \
--resume "latest" \
--zeroshot-frequency 1 \
--model ViT-B-32 \
--name ... \
--pretrained laion2B-s34B-b79K \
--lock-image \
--lock-image-unlocked-groups 9

The problem does not occur when the following arguments are removed from the training command:

--lock-image \
--lock-image-unlocked-groups 9

It might be related to the following warning from the PyTorch docs (https://pytorch.org/docs/stable/checkpoint.html):

If use_reentrant=True is specified, at least one of the inputs needs to have requires_grad=True if grads are needed for model inputs, otherwise the checkpointed part of the model won’t have gradients. At least one of the outputs needs to have requires_grad=True as well. Note that this does not apply if use_reentrant=False is specified.
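A minimal repro of the behavior that warning describes (my own sketch, not from the issue): `block` stands in for an unlocked, checkpointed layer whose input comes from locked layers and therefore has requires_grad=False, and `head` is a trainable layer outside the checkpoint.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

block = nn.Linear(4, 4)  # trainable params inside the checkpointed segment
head = nn.Linear(4, 1)   # trainable layer outside the checkpoint
x = torch.randn(2, 4)    # output of locked layers: requires_grad=False

# use_reentrant=True: none of the checkpoint *inputs* require grad, so the
# backward recomputation builds no graph and block's params get grad=None
# (PyTorch emits the warning quoted above).
out = checkpoint(block, x, use_reentrant=True)
head(out).sum().backward()
grad_with_reentrant = block.weight.grad  # None

# use_reentrant=False tracks gradient requirements per saved tensor, so the
# same setup produces real gradients for block's parameters.
out = checkpoint(block, x, use_reentrant=False)
head(out).sum().backward()
grad_without_reentrant = block.weight.grad  # a real tensor
```

This matches the symptom in the issue: the locked (frozen) lower groups make the checkpointed blocks' inputs grad-free, which starves the reentrant checkpoint of gradients even though the blocks themselves are trainable.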

Do you know what the underlying issue is?

maxjaritz avatar Jul 20 '23 11:07 maxjaritz

hmm, I would have thought this works as long as you don't lock the full image or text tower... but perhaps not; it may not be a good idea to checkpoint the parts of the model that have gradients disabled.

Should probably set use_reentrant=False, but it's never been clear to me what the downside of that is. The PT docs mention many pluses of =False, so why was =True the default? hohumm

rwightman avatar Sep 15 '23 23:09 rwightman

In the PyTorch docs, I also saw:

Note that future versions of PyTorch will default to use_reentrant=False. Default: True

maxjaritz avatar Sep 16 '23 06:09 maxjaritz