VL_adapter
A question about the zero_grad handling in VL-adapter's multitask.py.
Thanks for your brilliant work.
```python
batch['log_train_accuracy'] = self.args.log_train_accuracy

# self.optim.zero_grad()

if self.args.fp16 and _use_native_amp:
    with autocast():
        if self.args.distributed:
            results = self.model.module.train_step(batch)
        else:
            results = self.model.train_step(batch)
else:
    if self.args.distributed:
        results = self.model.module.train_step(batch)
    else:
        results = self.model.train_step(batch)

loss = results['loss']
```
Looking at this code, it appears that training proceeds without zeroing the gradients before backpropagation, since the `self.optim.zero_grad()` call is commented out. I would therefore expect gradients from previous iterations to accumulate across steps.
Is there a reason why this works?
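For context, here is why I expected the missing `zero_grad` to be a problem: `.grad` accumulates across `backward()` calls unless it is cleared. A minimal, self-contained sketch of this behavior (the toy model and data below are illustrative placeholders, not objects from multitask.py):

```python
import torch
import torch.nn as nn

# Toy setup, just to demonstrate gradient accumulation.
model = nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(8, 4), torch.randn(8, 1)

# First backward pass: .grad holds this step's gradient.
loss = nn.functional.mse_loss(model(x), y)
loss.backward()
g1 = model.weight.grad.clone()

# Second backward pass without zero_grad: gradients add up.
loss = nn.functional.mse_loss(model(x), y)
loss.backward()
print(torch.allclose(model.weight.grad, 2 * g1))  # True

# With optimizer.zero_grad() in between, each backward starts fresh.
optimizer.zero_grad()
loss = nn.functional.mse_loss(model(x), y)
loss.backward()
print(torch.allclose(model.weight.grad, g1))  # True
```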