Bug when running the code
Hi, thanks for sharing your work.
I hit a bug when I run the code: `python3 -m flowmap.overfit dataset=images dataset.images.root="/xxx/flowmap/mipnerf360_garden"`. Do you have any suggestions for how to solve it?
I got the same error message as above. After I commented out "SLURMEnvironment" in overfit.py, it seemed to start working, but then I got a GPU out-of-memory error:
...
File "/home/bizon/anaconda3/envs/flowmap/lib/python3.11/site-packages/torch/nn/functional.py", line 2553, in instance_norm
return torch.instance_norm(
^^^^^^^^^^^^^^^^^^^^
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 560.00 MiB. GPU
Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
Computing RAFT flow: 0%| | 0/6 [00:00<?, ?it/s]
[rank: 1] Child process with PID 17642 terminated with code 1. Forcefully terminating all other processes to avoid zombies 🧟
Killed
...
Same here.
Same OOM issue here, but forcing a single GPU worked for me when running overfit.py.
How to "force using a single GPU"? What the specifical code?
Comment out "SLURMEnvironment" in overfit.py, then run with `CUDA_VISIBLE_DEVICES=0 python -m xxxx`.
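To expand on that: besides hiding extra devices with `CUDA_VISIBLE_DEVICES`, PyTorch Lightning can also be told directly to use a single GPU through its `Trainer` arguments. The snippet below is only a sketch of those arguments; how overfit.py actually constructs its `Trainer` may differ.

```python
# Minimal sketch (not FlowMap's actual code): restricting Lightning to one GPU.
from lightning.pytorch import Trainer

trainer = Trainer(
    accelerator="gpu",
    devices=1,        # train on exactly one GPU instead of all visible devices
    strategy="auto",  # avoids the DDP child processes seen in the log above
)
```

Either approach prevents the multi-GPU launch that produced the "[rank: 1] Child process ... terminated" message.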
OK, got it. I ran the following command and still got a CUDA out-of-memory error. Could you take a look? Thanks!
Command:
CUDA_VISIBLE_DEVICES=0 python -m flowmap.overfit dataset=images dataset.images.root=./datasets/flowmap/co3d_bench/
Error message:
CUDA_VISIBLE_DEVICES=0 python -m flowmap.overfit dataset=images dataset.images.root=./datasets/flowmap/co3d_bench/
rm: cannot remove 'outputs/local': No such file or directory
Precomputing optical flow.
Computing RAFT flow: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 19/19 [00:11<00:00, 1.72it/s]
Computing RAFT flow: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 19/19 [00:10<00:00, 1.78it/s]
Using cache found in /home/bizon/.cache/torch/hub/facebookresearch_co-tracker_v1.0
Computing tracks: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [01:23<00:00, 2.77s/it]
Using cache found in /home/bizon/.cache/torch/hub/intel-isl_MiDaS_master
Loading weights: None
Using cache found in /home/bizon/.cache/torch/hub/rwightman_gen-efficientnet-pytorch_master
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
You are using a CUDA device ('NVIDIA GeForce RTX 4090') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
| Name | Type | Params
--------------------------------
0 | model | Model | 21.3 M
--------------------------------
21.3 M Trainable params
0 Non-trainable params
21.3 M Total params
85.382 Total estimated model params size (MB)
Sanity Checking: | | 0/? [00:00<?, ?it/s]/home/bizon/anaconda3/envs/flowmap/lib/python3.11/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:441: The 'val_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=31` in the `DataLoader` to improve performance.
/home/bizon/anaconda3/envs/flowmap/lib/python3.11/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:441: The 'train_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=31` in the `DataLoader` to improve performance.
Epoch 0: | | 0/? [00:00<?, ?it/s]Error executing job with overrides: ['dataset=images', 'dataset.images.root=./datasets/flowmap/co3d_bench/']
Traceback (most recent call last):
File "/home/bizon/boxu/models/2-MVS/static-scene_tmpnotUSEFUL/nerf-3dgs/msINPUTdepthmapOUTPUT202404TIMEflowmap/flowmap/flowmap/overfit.py", line 112, in overfit
trainer.fit(
File "/home/bizon/anaconda3/envs/flowmap/lib/python3.11/site-packages/lightning/pytorch/trainer/trainer.py", line 544, in fit
call._call_and_handle_interrupt(
File "/home/bizon/anaconda3/envs/flowmap/lib/python3.11/site-packages/lightning/pytorch/trainer/call.py", line 44, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/bizon/anaconda3/envs/flowmap/lib/python3.11/site-packages/lightning/pytorch/trainer/trainer.py", line 580, in _fit_impl
self._run(model, ckpt_path=ckpt_path)
File "/home/bizon/anaconda3/envs/flowmap/lib/python3.11/site-packages/lightning/pytorch/trainer/trainer.py", line 987, in _run
results = self._run_stage()
^^^^^^^^^^^^^^^^^
File "/home/bizon/anaconda3/envs/flowmap/lib/python3.11/site-packages/lightning/pytorch/trainer/trainer.py", line 1033, in _run_stage
self.fit_loop.run()
File "/home/bizon/anaconda3/envs/flowmap/lib/python3.11/site-packages/lightning/pytorch/loops/fit_loop.py", line 205, in run
self.advance()
File "/home/bizon/anaconda3/envs/flowmap/lib/python3.11/site-packages/lightning/pytorch/loops/fit_loop.py", line 363, in advance
self.epoch_loop.run(self._data_fetcher)
File "/home/bizon/anaconda3/envs/flowmap/lib/python3.11/site-packages/lightning/pytorch/loops/training_epoch_loop.py", line 140, in run
self.advance(data_fetcher)
File "/home/bizon/anaconda3/envs/flowmap/lib/python3.11/site-packages/lightning/pytorch/loops/training_epoch_loop.py", line 250, in advance
batch_output = self.automatic_optimization.run(trainer.optimizers[0], batch_idx, kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/bizon/anaconda3/envs/flowmap/lib/python3.11/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 190, in run
self._optimizer_step(batch_idx, closure)
File "/home/bizon/anaconda3/envs/flowmap/lib/python3.11/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 268, in _optimizer_step
call._call_lightning_module_hook(
File "/home/bizon/anaconda3/envs/flowmap/lib/python3.11/site-packages/lightning/pytorch/trainer/call.py", line 157, in _call_lightning_module_hook
output = fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/home/bizon/anaconda3/envs/flowmap/lib/python3.11/site-packages/lightning/pytorch/core/module.py", line 1303, in optimizer_step
optimizer.step(closure=optimizer_closure)
File "/home/bizon/anaconda3/envs/flowmap/lib/python3.11/site-packages/lightning/pytorch/core/optimizer.py", line 152, in step
step_output = self._strategy.optimizer_step(self._optimizer, closure, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/bizon/anaconda3/envs/flowmap/lib/python3.11/site-packages/lightning/pytorch/strategies/strategy.py", line 239, in optimizer_step
return self.precision_plugin.optimizer_step(optimizer, model=model, closure=closure, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/bizon/anaconda3/envs/flowmap/lib/python3.11/site-packages/lightning/pytorch/plugins/precision/precision.py", line 122, in optimizer_step
return optimizer.step(closure=closure, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/bizon/anaconda3/envs/flowmap/lib/python3.11/site-packages/torch/optim/optimizer.py", line 391, in wrapper
out = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/bizon/anaconda3/envs/flowmap/lib/python3.11/site-packages/torch/optim/optimizer.py", line 76, in _use_grad
ret = func(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/bizon/anaconda3/envs/flowmap/lib/python3.11/site-packages/torch/optim/adam.py", line 148, in step
loss = closure()
^^^^^^^^^
File "/home/bizon/anaconda3/envs/flowmap/lib/python3.11/site-packages/lightning/pytorch/plugins/precision/precision.py", line 108, in _wrap_closure
closure_result = closure()
^^^^^^^^^
File "/home/bizon/anaconda3/envs/flowmap/lib/python3.11/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 144, in __call__
self._result = self.closure(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/bizon/anaconda3/envs/flowmap/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/bizon/anaconda3/envs/flowmap/lib/python3.11/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 129, in closure
step_output = self._step_fn()
^^^^^^^^^^^^^^^
File "/home/bizon/anaconda3/envs/flowmap/lib/python3.11/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 318, in _training_step
training_step_output = call._call_strategy_hook(trainer, "training_step", *kwargs.values())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/bizon/anaconda3/envs/flowmap/lib/python3.11/site-packages/lightning/pytorch/trainer/call.py", line 309, in _call_strategy_hook
output = fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/home/bizon/anaconda3/envs/flowmap/lib/python3.11/site-packages/lightning/pytorch/strategies/strategy.py", line 391, in training_step
return self.lightning_module.training_step(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/bizon/boxu/models/2-MVS/static-scene_tmpnotUSEFUL/nerf-3dgs/msINPUTdepthmapOUTPUT202404TIMEflowmap/flowmap/flowmap/model/model_wrapper_overfit.py", line 53, in training_step
model_output = self.model(self.batch, self.flows, self.global_step)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/bizon/anaconda3/envs/flowmap/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/bizon/anaconda3/envs/flowmap/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/bizon/anaconda3/envs/flowmap/lib/python3.11/site-packages/jaxtyping/_decorator.py", line 450, in wrapped_fn
out = fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/home/bizon/boxu/models/2-MVS/static-scene_tmpnotUSEFUL/nerf-3dgs/msINPUTdepthmapOUTPUT202404TIMEflowmap/flowmap/flowmap/model/model.py", line 64, in forward
backbone_out = self.backbone.forward(batch, flows)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/bizon/anaconda3/envs/flowmap/lib/python3.11/site-packages/jaxtyping/_decorator.py", line 450, in wrapped_fn
out = fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/home/bizon/boxu/models/2-MVS/static-scene_tmpnotUSEFUL/nerf-3dgs/msINPUTdepthmapOUTPUT202404TIMEflowmap/flowmap/flowmap/model/backbone/backbone_midas.py", line 94, in forward
backward_weights = self.compute_correspondence_weights(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/bizon/anaconda3/envs/flowmap/lib/python3.11/site-packages/jaxtyping/_decorator.py", line 450, in wrapped_fn
out = fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/home/bizon/boxu/models/2-MVS/static-scene_tmpnotUSEFUL/nerf-3dgs/msINPUTdepthmapOUTPUT202404TIMEflowmap/flowmap/flowmap/model/backbone/backbone_midas.py", line 111, in compute_correspondence_weights
weights = self.corr_weighter_perpoint(features).sigmoid().clip(min=1e-4)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/bizon/anaconda3/envs/flowmap/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/bizon/anaconda3/envs/flowmap/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/bizon/anaconda3/envs/flowmap/lib/python3.11/site-packages/torch/nn/modules/container.py", line 217, in forward
input = module(input)
^^^^^^^^^^^^^
File "/home/bizon/anaconda3/envs/flowmap/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/bizon/anaconda3/envs/flowmap/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/bizon/anaconda3/envs/flowmap/lib/python3.11/site-packages/torch/nn/modules/linear.py", line 116, in forward
return F.linear(input, self.weight, self.bias)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 2.33 GiB. GPU
Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
Epoch 0: |
Hi, I met the same issue as you. What graphics card are you using, and how much memory does it have? I am using a 4090 with 24 GB of memory and got the same error. I just solved it with https://github.com/dcharatan/flowmap/issues/4
@kk6398 Thank you! I followed the link you mentioned (adding `+experiment=low_memory`) and it is working great now. Thanks!
My current command:
CUDA_VISIBLE_DEVICES=0 python -m flowmap.overfit dataset=images dataset.images.root=./datasets/flowmap/co3d_bench/ +experiment=low_memory
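For anyone wondering how that flag is applied: the leading `+` in `+experiment=low_memory` is a Hydra override that adds a config group entry not present in the base defaults, so the low-memory experiment config is layered on top of the default settings. The sketch below only illustrates that composition with Hydra's compose API; the `config_path` and `config_name` are assumptions and may not match the repo's actual layout.

```python
# Illustrative only: how Hydra composes the `+experiment=low_memory` override.
# The config_path and config_name are guesses, not FlowMap's actual layout.
from hydra import compose, initialize

with initialize(version_base=None, config_path="config"):
    cfg = compose(
        config_name="main",
        overrides=[
            "dataset=images",
            "dataset.images.root=./datasets/flowmap/co3d_bench/",
            "+experiment=low_memory",  # '+' adds a group entry absent from the defaults list
        ],
    )
```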
See #4 and #13 for a few more comments on memory usage!
