pytorch-optimizer
Adafactor fails to run on a custom (rfs) resnet12 (with MAML)
I was trying Adafactor, but I get the following error:
```
args.scheduler=None
--------------------- META-TRAIN ------------------------
Starting training!
Traceback (most recent call last):
File "/home/miranda9/automl-meta-learning/automl-proj-src/experiments/meta_learning/main_metalearning.py", line 441, in <module>
main_resume_from_checkpoint(args)
File "/home/miranda9/automl-meta-learning/automl-proj-src/experiments/meta_learning/main_metalearning.py", line 403, in main_resume_from_checkpoint
run_training(args)
File "/home/miranda9/automl-meta-learning/automl-proj-src/experiments/meta_learning/main_metalearning.py", line 413, in run_training
meta_train_fixed_iterations(args)
File "/home/miranda9/automl-meta-learning/automl-proj-src/meta_learning/training/meta_training.py", line 233, in meta_train_fixed_iterations
args.outer_opt.step()
File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/site-packages/torch/optim/optimizer.py", line 88, in wrapper
return func(*args, **kwargs)
File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/site-packages/torch_optimizer/adafactor.py", line 191, in step
self._approx_sq_grad(
File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/site-packages/torch_optimizer/adafactor.py", line 116, in _approx_sq_grad
(exp_avg_sq_row / exp_avg_sq_row.mean(dim=-1))
RuntimeError: The size of tensor a (3) must match the size of tensor b (64) at non-singleton dimension 1
```
Training runs fine with PyTorch's default Adam, so why does this optimizer fail?
related:
- https://github.com/jettify/pytorch-optimizer/issues/404
- https://stackoverflow.com/questions/70218565/how-to-have-adafactor-run-a-custom-rfs-resnet12-with-maml-for-torch-optimize?noredirect=1&lq=1
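If it helps to reproduce this, a minimal sketch along these lines should trigger the same broadcasting failure with a single conv layer (the layer shape below is illustrative, not my actual ResNet-12 setup, and it assumes the torch_optimizer version from the traceback above):

```python
import torch
import torch_optimizer as optim

# A 4-D conv weight of shape [64, 3, 3, 3] is enough to hit the error
# in Adafactor's _approx_sq_grad.
model = torch.nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3)
opt = optim.Adafactor(model.parameters())

x = torch.randn(8, 3, 32, 32)
loss = model(x).sum()
loss.backward()
opt.step()  # RuntimeError: The size of tensor a (3) must match the size of tensor b (64) ...
```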
Are there any updates on this? The issue is still present.
I had a look at this error, which I also faced when training a ResNet-50 model. I got a similar error to @brando90's, except that the dimensions of my tensors were different. Here is how I managed to fix it.

First of all, the exception is raised in `_approx_sq_grad` (adafactor.py, line 116), where the tensor `exp_avg_sq_row` is divided by its mean over the last dimension. In my case, `exp_avg_sq_row` has size `[64, 3, 7]`. The mean over the last dimension, `exp_avg_sq_row.mean(dim=-1)`, has size `[64, 3]`, and the dimension mismatch in this division raises the RuntimeError.

The solution is to unsqueeze the mean tensor: instead of `(exp_avg_sq_row / exp_avg_sq_row.mean(dim=-1))`, we should do `(exp_avg_sq_row / exp_avg_sq_row.mean(dim=-1).unsqueeze(-1))`.
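To illustrate the broadcasting problem and the fix (a minimal sketch using the shapes from my case, not the library's actual code):

```python
import torch

exp_avg_sq_row = torch.rand(64, 3, 7)  # shape from my ResNet-50 case

# Original expression: [64, 3, 7] divided by [64, 3] cannot broadcast,
# which is what raises the RuntimeError.
# bad = exp_avg_sq_row / exp_avg_sq_row.mean(dim=-1)

# Fixed expression: unsqueeze keeps a trailing singleton dimension, [64, 3, 1],
# which broadcasts cleanly against [64, 3, 7].
good = exp_avg_sq_row / exp_avg_sq_row.mean(dim=-1).unsqueeze(-1)
print(good.shape)  # torch.Size([64, 3, 7])
```

Equivalently, `exp_avg_sq_row.mean(dim=-1, keepdim=True)` keeps the reduced dimension directly.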
This still happens. Could someone make a pull request?