
Bug when running the code

Open kk6398 opened this issue 1 year ago • 9 comments

Hi, thanks for sharing your work. I hit a bug when running: python3 -m flowmap.overfit dataset=images dataset.images.root="/xxx/flowmap/mipnerf360_garden". Do you have any suggestions for how to solve it? (two error screenshots attached)

kk6398 avatar Apr 28 '24 01:04 kk6398

(quoting the original report above)

(additional screenshot attached)

kk6398 avatar Apr 28 '24 05:04 kk6398

I got the same error message as above. After I commented out the "SLURMEnvironment" in overfit.py, training seemed to start, but then I got a GPU out-of-memory error:

...
  File "/home/bizon/anaconda3/envs/flowmap/lib/python3.11/site-packages/torch/nn/functional.py", line 2553, in instance_norm
    return torch.instance_norm(
           ^^^^^^^^^^^^^^^^^^^^
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 560.00 MiB. GPU 

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
Computing RAFT flow:   0%| | 0/6 [00:00<?, ?it/s]
[rank: 1] Child process with PID 17642 terminated with code 1. Forcefully terminating all other processes to avoid zombies 🧟
Killed
...

minounou avatar Apr 28 '24 23:04 minounou

(quoting minounou's error report above)

same goes here

Allen-Zhou729 avatar Apr 29 '24 01:04 Allen-Zhou729

(quoting the previous replies above)

Same OOM issue here, but forcing use of a single GPU worked for me when running overfit.py.

GaoHchen avatar Apr 29 '24 09:04 GaoHchen

(quoting the previous replies above)

How to "force using a single GPU"? What the specifical code?

kk6398 avatar Apr 29 '24 13:04 kk6398

(quoting the previous replies above)

comment the "SLURMEnvironment", and run with CUDA_VISIBLE_DEVICES=0 python -m xxxx

GaoHchen avatar Apr 29 '24 14:04 GaoHchen

OK, got it. I ran the following command and still got a CUDA out-of-memory error. Could you take a look? Thanks!

command: CUDA_VISIBLE_DEVICES=0 python -m flowmap.overfit dataset=images dataset.images.root=./datasets/flowmap/co3d_bench/

error-message:

CUDA_VISIBLE_DEVICES=0 python -m flowmap.overfit dataset=images dataset.images.root=./datasets/flowmap/co3d_bench/
rm: cannot remove 'outputs/local': No such file or directory
Precomputing optical flow.
Computing RAFT flow: 100%|██████████| 19/19 [00:11<00:00,  1.72it/s]
Computing RAFT flow: 100%|██████████| 19/19 [00:10<00:00,  1.78it/s]
Using cache found in /home/bizon/.cache/torch/hub/facebookresearch_co-tracker_v1.0
Computing tracks: 100%|██████████| 30/30 [01:23<00:00,  2.77s/it]
Using cache found in /home/bizon/.cache/torch/hub/intel-isl_MiDaS_master
Loading weights:  None
Using cache found in /home/bizon/.cache/torch/hub/rwightman_gen-efficientnet-pytorch_master
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
You are using a CUDA device ('NVIDIA GeForce RTX 4090') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name  | Type  | Params
--------------------------------
0 | model | Model | 21.3 M
--------------------------------
21.3 M    Trainable params
0         Non-trainable params
21.3 M    Total params
85.382    Total estimated model params size (MB)
Sanity Checking: | | 0/? [00:00<?, ?it/s]
/home/bizon/anaconda3/envs/flowmap/lib/python3.11/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:441: The 'val_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=31` in the `DataLoader` to improve performance.
/home/bizon/anaconda3/envs/flowmap/lib/python3.11/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:441: The 'train_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=31` in the `DataLoader` to improve performance.
Epoch 0: | | 0/? [00:00<?, ?it/s]
Error executing job with overrides: ['dataset=images', 'dataset.images.root=./datasets/flowmap/co3d_bench/']
Traceback (most recent call last):
  File "/home/bizon/boxu/models/2-MVS/static-scene_tmpnotUSEFUL/nerf-3dgs/msINPUTdepthmapOUTPUT202404TIMEflowmap/flowmap/flowmap/overfit.py", line 112, in overfit
    trainer.fit(
  File "/home/bizon/anaconda3/envs/flowmap/lib/python3.11/site-packages/lightning/pytorch/trainer/trainer.py", line 544, in fit
    call._call_and_handle_interrupt(
  File "/home/bizon/anaconda3/envs/flowmap/lib/python3.11/site-packages/lightning/pytorch/trainer/call.py", line 44, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bizon/anaconda3/envs/flowmap/lib/python3.11/site-packages/lightning/pytorch/trainer/trainer.py", line 580, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "/home/bizon/anaconda3/envs/flowmap/lib/python3.11/site-packages/lightning/pytorch/trainer/trainer.py", line 987, in _run
    results = self._run_stage()
              ^^^^^^^^^^^^^^^^^
  File "/home/bizon/anaconda3/envs/flowmap/lib/python3.11/site-packages/lightning/pytorch/trainer/trainer.py", line 1033, in _run_stage
    self.fit_loop.run()
  File "/home/bizon/anaconda3/envs/flowmap/lib/python3.11/site-packages/lightning/pytorch/loops/fit_loop.py", line 205, in run
    self.advance()
  File "/home/bizon/anaconda3/envs/flowmap/lib/python3.11/site-packages/lightning/pytorch/loops/fit_loop.py", line 363, in advance
    self.epoch_loop.run(self._data_fetcher)
  File "/home/bizon/anaconda3/envs/flowmap/lib/python3.11/site-packages/lightning/pytorch/loops/training_epoch_loop.py", line 140, in run
    self.advance(data_fetcher)
  File "/home/bizon/anaconda3/envs/flowmap/lib/python3.11/site-packages/lightning/pytorch/loops/training_epoch_loop.py", line 250, in advance
    batch_output = self.automatic_optimization.run(trainer.optimizers[0], batch_idx, kwargs)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bizon/anaconda3/envs/flowmap/lib/python3.11/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 190, in run
    self._optimizer_step(batch_idx, closure)
  File "/home/bizon/anaconda3/envs/flowmap/lib/python3.11/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 268, in _optimizer_step
    call._call_lightning_module_hook(
  File "/home/bizon/anaconda3/envs/flowmap/lib/python3.11/site-packages/lightning/pytorch/trainer/call.py", line 157, in _call_lightning_module_hook
    output = fn(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^
  File "/home/bizon/anaconda3/envs/flowmap/lib/python3.11/site-packages/lightning/pytorch/core/module.py", line 1303, in optimizer_step
    optimizer.step(closure=optimizer_closure)
  File "/home/bizon/anaconda3/envs/flowmap/lib/python3.11/site-packages/lightning/pytorch/core/optimizer.py", line 152, in step
    step_output = self._strategy.optimizer_step(self._optimizer, closure, **kwargs)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bizon/anaconda3/envs/flowmap/lib/python3.11/site-packages/lightning/pytorch/strategies/strategy.py", line 239, in optimizer_step
    return self.precision_plugin.optimizer_step(optimizer, model=model, closure=closure, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bizon/anaconda3/envs/flowmap/lib/python3.11/site-packages/lightning/pytorch/plugins/precision/precision.py", line 122, in optimizer_step
    return optimizer.step(closure=closure, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bizon/anaconda3/envs/flowmap/lib/python3.11/site-packages/torch/optim/optimizer.py", line 391, in wrapper
    out = func(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^
  File "/home/bizon/anaconda3/envs/flowmap/lib/python3.11/site-packages/torch/optim/optimizer.py", line 76, in _use_grad
    ret = func(self, *args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bizon/anaconda3/envs/flowmap/lib/python3.11/site-packages/torch/optim/adam.py", line 148, in step
    loss = closure()
           ^^^^^^^^^
  File "/home/bizon/anaconda3/envs/flowmap/lib/python3.11/site-packages/lightning/pytorch/plugins/precision/precision.py", line 108, in _wrap_closure
    closure_result = closure()
                     ^^^^^^^^^
  File "/home/bizon/anaconda3/envs/flowmap/lib/python3.11/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 144, in __call__
    self._result = self.closure(*args, **kwargs)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bizon/anaconda3/envs/flowmap/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/bizon/anaconda3/envs/flowmap/lib/python3.11/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 129, in closure
    step_output = self._step_fn()
                  ^^^^^^^^^^^^^^^
  File "/home/bizon/anaconda3/envs/flowmap/lib/python3.11/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 318, in _training_step
    training_step_output = call._call_strategy_hook(trainer, "training_step", *kwargs.values())
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bizon/anaconda3/envs/flowmap/lib/python3.11/site-packages/lightning/pytorch/trainer/call.py", line 309, in _call_strategy_hook
    output = fn(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^
  File "/home/bizon/anaconda3/envs/flowmap/lib/python3.11/site-packages/lightning/pytorch/strategies/strategy.py", line 391, in training_step
    return self.lightning_module.training_step(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bizon/boxu/models/2-MVS/static-scene_tmpnotUSEFUL/nerf-3dgs/msINPUTdepthmapOUTPUT202404TIMEflowmap/flowmap/flowmap/model/model_wrapper_overfit.py", line 53, in training_step
    model_output = self.model(self.batch, self.flows, self.global_step)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bizon/anaconda3/envs/flowmap/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bizon/anaconda3/envs/flowmap/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bizon/anaconda3/envs/flowmap/lib/python3.11/site-packages/jaxtyping/_decorator.py", line 450, in wrapped_fn
    out = fn(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^
  File "/home/bizon/boxu/models/2-MVS/static-scene_tmpnotUSEFUL/nerf-3dgs/msINPUTdepthmapOUTPUT202404TIMEflowmap/flowmap/flowmap/model/model.py", line 64, in forward
    backbone_out = self.backbone.forward(batch, flows)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bizon/anaconda3/envs/flowmap/lib/python3.11/site-packages/jaxtyping/_decorator.py", line 450, in wrapped_fn
    out = fn(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^
  File "/home/bizon/boxu/models/2-MVS/static-scene_tmpnotUSEFUL/nerf-3dgs/msINPUTdepthmapOUTPUT202404TIMEflowmap/flowmap/flowmap/model/backbone/backbone_midas.py", line 94, in forward
    backward_weights = self.compute_correspondence_weights(
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bizon/anaconda3/envs/flowmap/lib/python3.11/site-packages/jaxtyping/_decorator.py", line 450, in wrapped_fn
    out = fn(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^
  File "/home/bizon/boxu/models/2-MVS/static-scene_tmpnotUSEFUL/nerf-3dgs/msINPUTdepthmapOUTPUT202404TIMEflowmap/flowmap/flowmap/model/backbone/backbone_midas.py", line 111, in compute_correspondence_weights
    weights = self.corr_weighter_perpoint(features).sigmoid().clip(min=1e-4)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bizon/anaconda3/envs/flowmap/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bizon/anaconda3/envs/flowmap/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bizon/anaconda3/envs/flowmap/lib/python3.11/site-packages/torch/nn/modules/container.py", line 217, in forward
    input = module(input)
            ^^^^^^^^^^^^^
  File "/home/bizon/anaconda3/envs/flowmap/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bizon/anaconda3/envs/flowmap/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bizon/anaconda3/envs/flowmap/lib/python3.11/site-packages/torch/nn/modules/linear.py", line 116, in forward
    return F.linear(input, self.weight, self.bias)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 2.33 GiB. GPU 

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
Epoch 0: |                                                                                                                                       

minounou avatar Apr 29 '24 18:04 minounou

(quoting minounou's command and error message above)

Hi, I hit the same issue. Which graphics card are you using, and how much memory does it have? I am using a 4090 with 24 GB of memory and got the same error. I just solved it with https://github.com/dcharatan/flowmap/issues/4
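
(As an aside, a quick generic PyTorch check, not specific to flowmap, to print which GPU is visible and how much memory it has free:)

# Generic PyTorch snippet (not from flowmap) to report the visible GPU and
# its free vs. total memory.
import torch

print(torch.cuda.get_device_name(0))
free, total = torch.cuda.mem_get_info()  # bytes on the current CUDA device
print(f"free: {free / 2**30:.1f} GiB / total: {total / 2**30:.1f} GiB")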

kk6398 avatar Apr 30 '24 01:04 kk6398

@kk6398 Thank you! I followed the link you mentioned (adding +experiment=low_memory) and now it is working great. Thanks!

(my current command): CUDA_VISIBLE_DEVICES=0 python -m flowmap.overfit dataset=images dataset.images.root=./datasets/flowmap/co3d_bench/ +experiment=low_memory

minounou avatar Apr 30 '24 05:04 minounou

See #4 and #13 for a few more comments on memory usage!

dcharatan avatar Apr 30 '24 16:04 dcharatan