
TypeError: 'tuple' object is not callable

Open goongzi-leean opened this issue 3 years ago • 5 comments

Thank you for this great work, but I seem to be having trouble with my first run. I think this is a bug. The error is as follows:

Setting up PyTorch plugin "bias_act_plugin"... Done.
Setting up PyTorch plugin "upfirdn2d_plugin"... Done.
Traceback (most recent call last):
  File "src/main.py", line 193, in <module>
    hdf5_path=hdf5_path)
  File "drive/StudioGAN/src/loader.py", line 394, in load_worker
    gen_acml_loss = worker.train_generator(current_step=step)
  File "drive/StudioGAN/src/worker.py", line 627, in train_generator
    gen_acml_loss.backward()
  File "/usr/local/lib/python3.7/dist-packages/torch/_tensor.py", line 396, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/usr/local/lib/python3.7/dist-packages/torch/autograd/__init__.py", line 175, in backward
    allow_unreachable=True, accumulate_grad=True)  # Calls into the C++ engine to run the backward pass
  File "/usr/local/lib/python3.7/dist-packages/torch/autograd/function.py", line 253, in apply
    return user_fn(self, *args)
  File "drive/StudioGAN/src/utils/style_ops/grid_sample_gradfix.py", line 52, in backward
    grad_input, grad_grid = _GridSample2dBackward.apply(grad_output, input, grid)
  File "drive/StudioGAN/src/utils/style_ops/grid_sample_gradfix.py", line 63, in forward
    grad_input, grad_grid = op(grad_output, input, grid, 0, 0, False, output_mask)
TypeError: 'tuple' object is not callable

Looking forward to your reply.

goongzi-leean avatar Jul 31 '22 17:07 goongzi-leean

I'm sorry, I forgot to post my code:

!python3 src/main.py -cfg './src/configs/CIFAR10/StyleGAN2-ADA.yaml' -data '../../Dataset/' -save './outputs/cifar10_outputs/StyleGAN2-ADA/' --seed 82624 -t -hdf5 -l -metrics is fid prdc --pre_resizer lanczos --post_resizer friendly -sr -sf -sf_num 50000 -ifid --GAN_train --GAN_test

goongzi-leean avatar Jul 31 '22 23:07 goongzi-leean

I have checked that this issue arises when PyTorch version 1.12 (the one on the latest docker image) is used, and I fixed it two days ago! Make sure that you are using the latest version of StudioGAN. If that still doesn't help, you might consider lowering the torch version to 1.10. Thanks!
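To confirm which case applies before reinstalling anything, it can help to compare the installed version against the one named above. The following is only a minimal sketch: the helper names are hypothetical, and the only fact it encodes is that 1.12 is the version reported as affected in this thread. In practice the string would come from torch.__version__.

```python
import re

# Assumption: 1.12 is the only version this thread reports as affected.
REPORTED_AFFECTED = (1, 12)


def parse_major_minor(version: str) -> tuple:
    """Extract (major, minor) from strings such as '1.12.0+cu113'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def matches_reported_version(version: str) -> bool:
    """True if the version string is the release reported as affected."""
    return parse_major_minor(version) == REPORTED_AFFECTED


# Example strings only; pass torch.__version__ in a real environment.
print(matches_reported_version("1.12.0+cu113"))
print(matches_reported_version("1.10.2"))
```

The check is deliberately narrow: the thread says nothing about versions other than 1.12 and 1.10, so the sketch does not guess at them.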

alex4727 avatar Aug 01 '22 02:08 alex4727

Dear author,

I am sorry for replying to you so late. The reason is that I have encountered a new problem and have been working on solving it.

tcmalloc: large alloc 20000006144 bytes == 0x7f548fe82000 @ 0x7f5cda34b1e7 0x7f5c69bfb0ce 0x7f5c69c51cf5 0x7f5c69c51f4f 0x7f5c69cf4673 0x5936cc 0x548c51 0x5127f1 0x549e0e 0x4bcb19 0x5134a6 0x549e0e 0x593fce 0x548ae9 0x5127f1 0x549576 0x593fce 0x5118f8 0x593dd7 0x5118f8 0x549576 0x593fce 0x548ae9 0x5127f1 0x549576 0x593fce 0x548ae9 0x5127f1 0x549576 0x593fce 0x548ae9
……
/usr/lib/python3.7/multiprocessing/semaphore_tracker.py:144: UserWarning: semaphore_tracker: There appear to be 6 leaked semaphores to clean up at shutdown
  len(cache))

This caused my code to stop at the first --save_freq, which annoyed me because I'm really a beginner.

In addition, please don't be offended, but since I found img_channels=3 in your code, I would like to make a small suggestion. Maybe you could add a simple dataset with channel=1, which would make it easier for others to use StudioGAN.

I tried this out and found that I only needed to modify the config file and the places in the code where cifar10 and channel=3 appear.

Thank you!

Best,

Leean

goongzi-leean avatar Aug 02 '22 16:08 goongzi-leean

A new problem was discovered. Although I have solved it, I still want to tell you about this bug.

My code is:

!python3 src/main.py -cfg './src/configs/CIFAR10/ACGAN-Mod.yaml' -data '../../Dataset/' -save './outputs/cifar10_outputs/ACGAN-Mod/' --seed 82624 --num_workers 2 -t -hdf5 -l -metrics none --pre_resizer lanczos --post_resizer friendly -sr -sf -sf_num 10000 --GAN_train --GAN_test --print_freq 5 --save_freq 10

  File "src/main.py", line 193, in <module>
    hdf5_path=hdf5_path)
  File "/drive/StudioGAN/src/loader.py", line 391, in load_worker
    real_cond_loss, dis_acml_loss = worker.train_discriminator(current_step=step)
  File "/drive/StudioGAN/src/worker.py", line 308, in train_discriminator
    real_cond_loss = self.cond_loss(**real_dict)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
TypeError: forward() got an unexpected keyword argument 'h'

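The reason is that ResNet returns a dict with many entries, and self.cond_loss(**real_dict) passes all of them as keyword arguments, including 'h', which forward() does not accept.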
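One general way to avoid this kind of error is to pass only the keys that the callee actually declares. The sketch below is not the StudioGAN fix; filter_kwargs, the stand-in cond_loss function, and the keys adv_output and label are hypothetical, and only the key 'h' comes from the traceback above.

```python
import inspect


def filter_kwargs(fn, kwargs):
    """Keep only the entries of kwargs that fn accepts by name."""
    params = inspect.signature(fn).parameters
    # If fn itself takes **kwargs, every key is acceptable.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)
    return {k: v for k, v in kwargs.items() if k in params}


def cond_loss(adv_output, label):
    """Hypothetical stand-in for a loss whose forward() takes two arguments."""
    return adv_output - label


# Hypothetical discriminator output: 'h' is an extra entry the loss does not use.
real_dict = {"h": [0.1, 0.2], "adv_output": 3.0, "label": 1.0}

# cond_loss(**real_dict) would raise the TypeError from the traceback;
# filtering first drops the unexpected 'h' entry.
loss = cond_loss(**filter_kwargs(cond_loss, real_dict))
print(loss)
```

For a torch.nn.Module, the function to inspect would be the module's forward method rather than the module itself, since the module's call signature accepts arbitrary arguments.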

goongzi-leean avatar Aug 02 '22 18:08 goongzi-leean

Hi, a new problem arises when I load a new StyleGAN-ADA model instead of your trained model to continue training.

  File "src/main.py", line 193, in <module>
    hdf5_path=hdf5_path)
  File "/drive/StudioGAN/src/loader.py", line 394, in load_worker
    gen_acml_loss = worker.train_generator(current_step=step)
  File "/drive/StudioGAN/src/worker.py", line 627, in train_generator
    gen_acml_loss.backward()
  File "/usr/local/lib/python3.7/dist-packages/torch/_tensor.py", line 363, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/usr/local/lib/python3.7/dist-packages/torch/autograd/__init__.py", line 175, in backward
    allow_unreachable=True, accumulate_grad=True)  # Calls into the C++ engine to run the backward pass
  File "/usr/local/lib/python3.7/dist-packages/torch/autograd/function.py", line 253, in apply
    return user_fn(self, *args)
  File "/drive/StudioGAN/src/utils/style_ops/grid_sample_gradfix.py", line 52, in backward
    grad_input, grad_grid = _GridSample2dBackward.apply(grad_output, input, grid)
  File "/drive/StudioGAN/src/utils/style_ops/grid_sample_gradfix.py", line 63, in forward
    grad_input, grad_grid = op[0](grad_output, input, grid, 0, 0, False, output_mask)

So I changed grid_sample_gradfix.py back and now it can continue training.
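The two tracebacks in this thread show line 63 of grid_sample_gradfix.py in two forms, op(...) and op[0](...), which suggests that the object being called is a bare callable in some environments and a tuple in others. A version-tolerant wrapper could accept both. This is only a sketch under that assumption: as_callable and fake_backward are hypothetical names, and the stand-in function takes the place of the real operator so the example runs without PyTorch.

```python
def as_callable(op):
    """Return a callable from either a bare callable or a tuple whose
    first element is the callable (assumed shapes, see the lead-in)."""
    if callable(op):
        return op
    if isinstance(op, tuple) and op and callable(op[0]):
        return op[0]
    raise TypeError(f"cannot obtain a callable from {type(op).__name__}")


def fake_backward(grad_output, scale):
    """Hypothetical stand-in for the real backward operator."""
    return grad_output * scale


# Both shapes resolve to the same function.
bare = as_callable(fake_backward)
wrapped = as_callable((fake_backward, ["extra", "metadata"]))

print(bare(2.0, 3.0))
print(wrapped(2.0, 3.0))
```

With a wrapper like this, the call site would not need to be edited by hand when switching between environments.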

goongzi-leean avatar Aug 12 '22 02:08 goongzi-leean