fastMRI icon indicating copy to clipboard operation
fastMRI copied to clipboard

Error related to raw_sample_filter in _create_data_loader

Open mmuckley opened this issue 2 years ago • 0 comments

Discussed in https://github.com/facebookresearch/fastMRI/discussions/263

Creating an issue with this - seems like some aspects of sample filtering are bugged with recent changes.

Originally posted by mouryarahul August 24, 2022 Hi, I'm trying to run python train_unet_demo.py \ --mode test \ --test_split test \ --challenge singlecoil \ --data_path ../../../FastMRI_DATASET/knee_singlecoil_train/ \ --resume_from_checkpoint unet/unet_demo/checkpoints/epoch=1-step=69484.ckpt

where ../../../FastMRI_DATASET/knee_singlecoil_train/ contains all three folders: singlecoil_test, singlecoil_train and singlecoil_val

However, I'm getting an error related to raw_sample_filter in the case of the test dataset. Maybe I am missing something or doing something silly. Can someone please point out the mistake? Thanks!

Info about my environment: PyTorch version: 1.12.0+cu116 Is debug build: False CUDA used to build PyTorch: 11.6 ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04 LTS (x86_64) GCC version: (Ubuntu 11.2.0-19ubuntu1) 11.2.0 Clang version: Could not collect CMake version: version 3.22.1 Libc version: glibc-2.35

Python version: 3.10.4 (main, Mar 31 2022, 08:41:55) [GCC 7.5.0] (64-bit runtime) Python platform: Linux-5.15.0-46-generic-x86_64-with-glibc2.35 Is CUDA available: True CUDA runtime version: Could not collect GPU models and configuration: GPU 0: NVIDIA GeForce GTX 1070 Nvidia driver version: 515.65.01 cuDNN version: Could not collect HIP runtime version: N/A MIOpen runtime version: N/A Is XNNPACK available: True

Versions of relevant libraries: [pip3] numpy==1.22.3 [pip3] pytorch-lightning==1.7.2 [pip3] torch==1.12.0+cu116 [pip3] torchaudio==0.12.0+cu116 [pip3] torchmetrics==0.9.2 [pip3] torchvision==0.13.0+cu116 [conda] blas 1.0 mkl
[conda] mkl 2021.4.0 h06a4308_640
[conda] mkl-service 2.4.0 py310h7f8727e_0
[conda] mkl_fft 1.3.1 py310hd6ae3a3_0
[conda] mkl_random 1.2.2 py310h00e6091_0
[conda] numpy 1.22.3 py310hfa59a62_0
[conda] numpy-base 1.22.3 py310h9585f30_0
[conda] pytorch-lightning 1.7.2 pypi_0 pypi [conda] torch 1.12.0+cu116 pypi_0 pypi [conda] torchaudio 0.12.0+cu116 pypi_0 pypi [conda] torchmetrics 0.9.2 pypi_0 pypi [conda] torchvision 0.13.0+cu116 pypi_0 pypi

The full error msg:

Global seed set to 42 /home/rahul/anaconda3/envs/pytorch/lib/python3.10/site-packages/torchmetrics/utilities/prints.py:36: UserWarning: Torchmetrics v0.9 introduced a new argument class property called full_state_update that has not been set for this class (DistributedMetricSum). The property determines if update by default needs access to the full metric state. If this is not the case, significant speedups can be achieved and we recommend setting this to False. We provide an checking function from torchmetrics.utilities import check_forward_no_full_state that can be used to check if the full_state_update=True (old and potential slower behaviour, default for now) or if full_state_update=False can be used safely. warnings.warn(*args, **kwargs) /home/rahul/anaconda3/envs/pytorch/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/accelerator_connector.py:446: LightningDeprecationWarning: Setting Trainer(gpus=1) is deprecated in v1.7 and will be removed in v2.0. Please use Trainer(accelerator='gpu', devices=1) instead. rank_zero_deprecation( /home/rahul/anaconda3/envs/pytorch/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/checkpoint_connector.py:52: LightningDeprecationWarning: Setting Trainer(resume_from_checkpoint=) is deprecated in v1.5 and will be removed in v1.7. Please pass Trainer.fit(ckpt_path=) directly instead. rank_zero_deprecation( GPU available: True (cuda), used: True TPU available: False, using: 0 TPU cores IPU available: False, using: 0 IPUs HPU available: False, using: 0 HPUs Global seed set to 42 Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/1

distributed_backend=nccl All distributed processes registered. Starting with 1 processes

LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0] Traceback (most recent call last): File "/media/rahul/DATA/WorkSpace/Multimodal-Data-Processing/Projects/fastMRI/fastmri_examples/unet/train_unet_demo.py", line 191, in run_cli() File "/media/rahul/DATA/WorkSpace/Multimodal-Data-Processing/Projects/fastMRI/fastmri_examples/unet/train_unet_demo.py", line 187, in run_cli cli_main(args) File "/media/rahul/DATA/WorkSpace/Multimodal-Data-Processing/Projects/fastMRI/fastmri_examples/unet/train_unet_demo.py", line 75, in cli_main trainer.test(model, datamodule=data_module) File "/home/rahul/anaconda3/envs/pytorch/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 864, in test return self._call_and_handle_interrupt(self._test_impl, model, dataloaders, ckpt_path, verbose, datamodule) File "/home/rahul/anaconda3/envs/pytorch/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 648, in _call_and_handle_interrupt return self.strategy.launcher.launch(trainer_fn, *args, trainer=self, **kwargs) File "/home/rahul/anaconda3/envs/pytorch/lib/python3.10/site-packages/pytorch_lightning/strategies/launchers/subprocess_script.py", line 93, in launch return function(*args, **kwargs) File "/home/rahul/anaconda3/envs/pytorch/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 911, in _test_impl results = self._run(model, ckpt_path=self.ckpt_path) File "/home/rahul/anaconda3/envs/pytorch/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1168, in _run results = self._run_stage() File "/home/rahul/anaconda3/envs/pytorch/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1251, in _run_stage return self._run_evaluate() File "/home/rahul/anaconda3/envs/pytorch/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1291, in _run_evaluate self._evaluation_loop._reload_evaluation_dataloaders() File "/home/rahul/anaconda3/envs/pytorch/lib/python3.10/site-packages/pytorch_lightning/loops/dataloader/evaluation_loop.py", line 234, in _reload_evaluation_dataloaders self.trainer.reset_test_dataloader() File "/home/rahul/anaconda3/envs/pytorch/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1944, in reset_test_dataloader self.num_test_batches, self.test_dataloaders = self._data_connector._reset_eval_dataloader( File "/home/rahul/anaconda3/envs/pytorch/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/data_connector.py", line 348, in _reset_eval_dataloader dataloaders = self._request_dataloader(mode) File "/home/rahul/anaconda3/envs/pytorch/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/data_connector.py", line 436, in _request_dataloader dataloader = source.dataloader() File "/home/rahul/anaconda3/envs/pytorch/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/data_connector.py", line 513, in dataloader return method() File "/media/rahul/DATA/WorkSpace/Multimodal-Data-Processing/Projects/fastMRI/fastmri/pl_modules/data_module.py", line 325, in test_dataloader return self._create_data_loader( File "/media/rahul/DATA/WorkSpace/Multimodal-Data-Processing/Projects/fastMRI/fastmri/pl_modules/data_module.py", line 262, in _create_data_loader raw_sample_filter=raw_sample_filter, UnboundLocalError: local variable 'raw_sample_filter' referenced before assignment

mmuckley avatar Oct 03 '22 18:10 mmuckley