keymorph icon indicating copy to clipboard operation
keymorph copied to clipboard

Evaluation with run.py fails

Open HastingsGreer opened this issue 1 year ago • 1 comments

Thank you for publishing this code! run.py fails on my system with

venv) tgreer@biag-w05:/playpen-raid1/tgreer/equivariant_reg_2/keymorph_comparison/keymorph$ python run.py --kp_align_method affine --num_keypoints 128 --loss_fn mse --eval                 --load_path ./weights/numkey128_aff_dice.1560.h5
{'affine_slope': -1,
 'batch_size': 1,
 'data_dir': './data/centered_IXI/',
 'dataset': 'ixi',
 'debug_mode': False,
 'dim': 3,
 'epochs': 2000,
 'eval': True,
 'gpus': '0',
 'job_name': 'keymorph',
 'kp_align_method': 'affine',
 'kp_extractor': 'conv_com',
 'kpconsistency_coeff': 0,
 'load_path': './weights/numkey128_aff_dice.1560.h5',
 'log_interval': 25,
 'loss_fn': 'mse',
 'lr': 3e-06,
 'mix_modalities': False,
 'norm_type': 'instance',
 'num_keypoints': 128,
 'num_test_subjects': 100,
 'num_workers': 1,
 'resume': False,
 'save_dir': './output/',
 'save_preds': False,
 'seed': 23,
 'steps_per_epoch': 32,
 'tps_lmbda': None,
 'transform': 'none',
 'use_amp': False,
 'use_wandb': False,
 'visualize': False,
 'wandb_api_key_path': None,
 'wandb_kwargs': {},
 'weighted_kp_align': False}
Number of GPUs: 2
Fixed train dataset has 3 modalities.
-> Modality T1 has 427 subjects (427 images, 427 masks and 0 segmentations)
-> Modality T2 has 427 subjects (427 images, 427 masks and 0 segmentations)
-> Modality PD has 427 subjects (427 images, 427 masks and 0 segmentations)
Moving train dataset has 3 modalities.
-> Modality T1 has 427 subjects (427 images, 427 masks and 0 segmentations)
-> Modality T2 has 427 subjects (427 images, 427 masks and 0 segmentations)
-> Modality PD has 427 subjects (427 images, 427 masks and 0 segmentations)
Test dataset has 3 modalities.
-> Modality T1 has 100 subjects (100 images, 100 masks and 0 segmentations)
-> Modality T2 has 100 subjects (100 images, 100 masks and 0 segmentations)
-> Modality PD has 100 subjects (100 images, 100 masks and 0 segmentations)
/playpen-raid1/tgreer/equivariant_reg_2/keymorph_comparison/keymorph/venv/lib/python3.11/site-packages/torch/nn/parallel/data_parallel.py:32: UserWarning:
    There is an imbalance between your GPUs. You may want to exclude GPU 1 which
    has less than 75% of the memory or cores of GPU 0. You can do so by setting
    the device_ids argument to DataParallel, or by setting the CUDA_VISIBLE_DEVICES
    environment variable.
  warnings.warn(imbalance_warn.format(device_ids[min_pos], device_ids[max_pos]))

Model Summary
---------------------------------------------------------------
module.keypoint_extractor.module.block1.conv.weight
module.keypoint_extractor.module.block1.conv.bias
module.keypoint_extractor.module.block2.conv.weight
module.keypoint_extractor.module.block2.conv.bias
module.keypoint_extractor.module.block3.conv.weight
module.keypoint_extractor.module.block3.conv.bias
module.keypoint_extractor.module.block4.conv.weight
module.keypoint_extractor.module.block4.conv.bias
module.keypoint_extractor.module.block5.conv.weight
module.keypoint_extractor.module.block5.conv.bias
module.keypoint_extractor.module.block6.conv.weight
module.keypoint_extractor.module.block6.conv.bias
module.keypoint_extractor.module.block7.conv.weight
module.keypoint_extractor.module.block7.conv.bias
module.keypoint_extractor.module.block8.conv.weight
module.keypoint_extractor.module.block8.conv.bias
module.keypoint_extractor.module.block9.conv.weight
module.keypoint_extractor.module.block9.conv.bias
Total parameters: 8794496
---------------------------------------------------------------

Running test: subject id 0->0, mod T1->T1, aug rot0
Traceback (most recent call last):
  File "/playpen-raid1/tgreer/equivariant_reg_2/keymorph_comparison/keymorph/run.py", line 771, in <module>
    main()
  File "/playpen-raid1/tgreer/equivariant_reg_2/keymorph_comparison/keymorph/run.py", line 616, in main
    grid, points_f, points_m = registration_model(
                               ^^^^^^^^^^^^^^^^^^^
  File "/playpen-raid1/tgreer/equivariant_reg_2/keymorph_comparison/keymorph/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/playpen-raid1/tgreer/equivariant_reg_2/keymorph_comparison/keymorph/venv/lib/python3.11/site-packages/torch/nn/parallel/data_parallel.py", line 171, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/playpen-raid1/tgreer/equivariant_reg_2/keymorph_comparison/keymorph/venv/lib/python3.11/site-packages/torch/nn/parallel/data_parallel.py", line 181, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/playpen-raid1/tgreer/equivariant_reg_2/keymorph_comparison/keymorph/venv/lib/python3.11/site-packages/torch/nn/parallel/parallel_apply.py", line 89, in parallel_apply
    output.reraise()
  File "/playpen-raid1/tgreer/equivariant_reg_2/keymorph_comparison/keymorph/venv/lib/python3.11/site-packages/torch/_utils.py", line 644, in reraise
    raise exception
TypeError: Caught TypeError in replica 0 on device 0.
Original Traceback (most recent call last):
  File "/playpen-raid1/tgreer/equivariant_reg_2/keymorph_comparison/keymorph/venv/lib/python3.11/site-packages/torch/nn/parallel/parallel_apply.py", line 64, in _worker
    output = module(*input, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/playpen-raid1/tgreer/equivariant_reg_2/keymorph_comparison/keymorph/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/playpen-raid1/tgreer/equivariant_reg_2/keymorph_comparison/keymorph/keymorph/model.py", line 45, in forward
    points_f, points_m = self.extract_keypoints_step(img_f, img_m)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/playpen-raid1/tgreer/equivariant_reg_2/keymorph_comparison/keymorph/keymorph/model.py", line 69, in extract_keypoints_step
    return self.keypoint_extractor(img1), self.keypoint_extractor(img2)
                                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/playpen-raid1/tgreer/equivariant_reg_2/keymorph_comparison/keymorph/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/playpen-raid1/tgreer/equivariant_reg_2/keymorph_comparison/keymorph/venv/lib/python3.11/site-packages/torch/nn/parallel/data_parallel.py", line 171, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/playpen-raid1/tgreer/equivariant_reg_2/keymorph_comparison/keymorph/venv/lib/python3.11/site-packages/torch/nn/parallel/data_parallel.py", line 181, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/playpen-raid1/tgreer/equivariant_reg_2/keymorph_comparison/keymorph/venv/lib/python3.11/site-packages/torch/nn/parallel/parallel_apply.py", line 89, in parallel_apply
    output.reraise()
  File "/playpen-raid1/tgreer/equivariant_reg_2/keymorph_comparison/keymorph/venv/lib/python3.11/site-packages/torch/_utils.py", line 644, in reraise
    raise exception
TypeError: Caught TypeError in replica 0 on device 0.
Original Traceback (most recent call last):
  File "/playpen-raid1/tgreer/equivariant_reg_2/keymorph_comparison/keymorph/venv/lib/python3.11/site-packages/torch/nn/parallel/parallel_apply.py", line 64, in _worker
    output = module(*input, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/playpen-raid1/tgreer/equivariant_reg_2/keymorph_comparison/keymorph/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/playpen-raid1/tgreer/equivariant_reg_2/keymorph_comparison/keymorph/keymorph/net.py", line 85, in forward
    out = self.block1(x)
          ^^^^^^^^^^^^^^
  File "/playpen-raid1/tgreer/equivariant_reg_2/keymorph_comparison/keymorph/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/playpen-raid1/tgreer/equivariant_reg_2/keymorph_comparison/keymorph/keymorph/layers.py", line 128, in forward
    out = self.conv(x)
          ^^^^^^^^^^^^
  File "/playpen-raid1/tgreer/equivariant_reg_2/keymorph_comparison/keymorph/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/playpen-raid1/tgreer/equivariant_reg_2/keymorph_comparison/keymorph/venv/lib/python3.11/site-packages/torch/nn/modules/conv.py", line 613, in forward
    return self._conv_forward(input, self.weight, self.bias)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/playpen-raid1/tgreer/equivariant_reg_2/keymorph_comparison/keymorph/venv/lib/python3.11/site-packages/torch/nn/modules/conv.py", line 608, in _conv_forward
    return F.conv3d(
           ^^^^^^^^^
TypeError: conv3d() received an invalid combination of arguments - got (tuple, Parameter, Parameter, tuple, tuple, tuple, int), but expected one of:
 * (Tensor input, Tensor weight, Tensor bias, tuple of ints stride, tuple of ints padding, tuple of ints dilation, int groups)
      didn't match because some of the arguments have invalid types: (tuple of (Tensor,), Parameter, Parameter, tuple of (int, int, int), tuple of (int, int, int), tuple of (int, int, int), int)
 * (Tensor input, Tensor weight, Tensor bias, tuple of ints stride, str padding, tuple of ints dilation, int groups)
      didn't match because some of the arguments have invalid types: (tuple of (Tensor,), Parameter, Parameter, tuple of (int, int, int), tuple of (int, int, int), tuple of (int, int, int), int)

Do you have any advice for proceeding?

HastingsGreer avatar Feb 13 '24 16:02 HastingsGreer

270-280 with torch.set_grad_enabled(True): grid, points_f, points_m, points_a = registration_model( img_f, img_m, lmbda, True ) ## img_a = align_img(grid, img_m) img_a = align_img(grid, img_m[0]) if seg_available: seg_a = align_img(grid, seg_m)

Janmaking avatar Apr 09 '24 02:04 Janmaking