
[BUG] ValueError while tuning parameters

Open marzaidi opened this issue 1 year ago • 2 comments

I am on the last step of the introduction to segger tutorial and I am getting an error while tuning parameters. Kindly let me know if you need more information. I am not sure why this is happening; I tried to troubleshoot, but nothing worked.

My code:

import itertools

import pandas as pd

# tuning_dir, trainable, evaluate, and predict_kwargs are defined in
# earlier steps of the tutorial.
param_space = {
    "k_bd": [3, 5, 10],
    "dist_bd": [5, 10, 15, 20],
    "k_tx": [3, 5, 10],
    "dist_tx": [3, 5, 10],
}

metrics = []

for params in itertools.product(*param_space.values()):

    config = dict(zip(param_space.keys(), params))

    # Setup directories
    trial_dir = tuning_dir / '_'.join([f'{k}={v}' for k, v in config.items()])

    data_dir = trial_dir / 'segger_data'
    data_dir.mkdir(exist_ok=True, parents=True)
    config['data_dir'] = data_dir

    model_dir = trial_dir / 'models'
    model_dir.mkdir(exist_ok=True, parents=True)
    config['model_dir'] = model_dir

    segmentation = trainable(config)
    trial = evaluate(segmentation, predict_kwargs['score_cut'])
    trial = pd.concat([pd.Series(config), trial])
    metrics.append(trial)

metrics = pd.DataFrame(metrics)
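
A check along these lines, dropped into trainable right after its dataset-creation step, would confirm that each trial actually produced data before training starts. This is only a sketch: has_processed_data is a hypothetical helper, and the assumption that dataset creation writes files somewhere under data_dir may not match segger's actual layout.

from pathlib import Path

def has_processed_data(data_dir) -> bool:
    # Return True if the trial's data directory contains any files at all.
    # If dataset creation produced nothing for this parameter combination,
    # the train dataloader is empty and training fails (see the error below).
    return any(p.is_file() for p in Path(data_dir).rglob('*'))

Calling this on config['data_dir'] before trainer.fit would distinguish "no tiles generated for these parameters" from a genuine training bug.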

The error:

Using 16bit Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name      | Type              | Params | Mode 
--------------------------------------------------------
0 | model     | GraphModule       | 9.7 K  | train
1 | criterion | BCEWithLogitsLoss | 0      | train
--------------------------------------------------------
9.7 K     Trainable params
0         Non-trainable params
9.7 K     Total params
0.039     Total estimated model params size (MB)
45        Modules in train mode
0         Modules in eval mode
SLURM auto-requeueing enabled. Setting signal handlers.
                                                  
/dss/dsshome1/0C/go76saz2/miniconda3/envs/segger/lib/python3.10/site-packages/pytorch_lightning/utilities/data.py:105: Total length of `DataLoader` across ranks is zero. Please make sure this was your intention.

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[44], line 25
     22 model_dir.mkdir(exist_ok=True, parents=True)
     23 config['model_dir'] = model_dir
---> 25 segmentation = trainable(config)
     26 trial = evaluate(segmentation, predict_kwargs['score_cut'])
     27 trial = pd.concat([pd.Series(config), trial])

Cell In[35], line 30, in trainable(config)
     20 dm = SeggerDataModule(
     21     data_dir=config['data_dir'],
     22     batch_size=2,
     23     num_workers=dataset_kwargs['num_workers'],
     24 )
     25 trainer = Trainer(
     26     default_root_dir=config['model_dir'],
     27     logger=CSVLogger(config['model_dir']),
     28     **trainer_kwargs,
     29 )
---> 30 trainer.fit(model=ls, datamodule=dm)
     32 segmentation = predict(
     33     load_model(config['model_dir'] / 'lightning_logs/version_0/checkpoints'),
     34     dm.train_dataloader(),
     35     receptive_field=receptive_field,
     36     **predict_kwargs,
     37 )
     39 metrics = evaluate(segmentation)

File ~/miniconda3/envs/segger/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py:538, in Trainer.fit(self, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path)
    536 self.state.status = TrainerStatus.RUNNING
    537 self.training = True
--> 538 call._call_and_handle_interrupt(
    539     self, self._fit_impl, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path
    540 )

File ~/miniconda3/envs/segger/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py:47, in _call_and_handle_interrupt(trainer, trainer_fn, *args, **kwargs)
     45     if trainer.strategy.launcher is not None:
     46         return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
---> 47     return trainer_fn(*args, **kwargs)
     49 except _TunerExitException:
     50     _call_teardown_hook(trainer)

File ~/miniconda3/envs/segger/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py:574, in Trainer._fit_impl(self, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path)
    567 assert self.state.fn is not None
    568 ckpt_path = self._checkpoint_connector._select_ckpt_path(
    569     self.state.fn,
    570     ckpt_path,
    571     model_provided=True,
    572     model_connected=self.lightning_module is not None,
    573 )
--> 574 self._run(model, ckpt_path=ckpt_path)
    576 assert self.state.stopped
    577 self.training = False

File ~/miniconda3/envs/segger/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py:981, in Trainer._run(self, model, ckpt_path)
    976 self._signal_connector.register_signal_handlers()
    978 # ----------------------------
    979 # RUN THE TRAINER
    980 # ----------------------------
--> 981 results = self._run_stage()
    983 # ----------------------------
    984 # POST-Training CLEAN UP
    985 # ----------------------------
    986 log.debug(f"{self.__class__.__name__}: trainer tearing down")

File ~/miniconda3/envs/segger/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py:1025, in Trainer._run_stage(self)
   1023         self._run_sanity_check()
   1024     with torch.autograd.set_detect_anomaly(self._detect_anomaly):
-> 1025         self.fit_loop.run()
   1026     return None
   1027 raise RuntimeError(f"Unexpected state {self.state}")

File ~/miniconda3/envs/segger/lib/python3.10/site-packages/pytorch_lightning/loops/fit_loop.py:197, in _FitLoop.run(self)
    196 def run(self) -> None:
--> 197     self.setup_data()
    198     if self.skip:
    199         return

File ~/miniconda3/envs/segger/lib/python3.10/site-packages/pytorch_lightning/loops/fit_loop.py:225, in _FitLoop.setup_data(self)
    222 log.debug(f"{self.__class__.__name__}: resetting train dataloader")
    224 source = self._data_source
--> 225 train_dataloader = _request_dataloader(source)
    226 trainer.strategy.barrier("train_dataloader()")
    228 if not isinstance(train_dataloader, CombinedLoader):

File ~/miniconda3/envs/segger/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:325, in _request_dataloader(data_source)
    314 """Requests a dataloader by calling dataloader hooks corresponding to the given stage.
    315 
    316 Returns:
    317     The requested dataloader
    318 
    319 """
    320 with _replace_dunder_methods(DataLoader, "dataset"), _replace_dunder_methods(BatchSampler):
    321     # under this context manager, the arguments passed to `DataLoader.__init__` will be captured and saved as
    322     # attributes on the instance in case the dataloader needs to be re-instantiated later by Lightning.
    323     # Also, it records all attribute setting and deletion using patched `__setattr__` and `__delattr__`
    324     # methods so that the re-instantiated object is as close to the original as possible.
--> 325     return data_source.dataloader()

File ~/miniconda3/envs/segger/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:292, in _DataLoaderSource.dataloader(self)
    290 if isinstance(self.instance, pl.LightningDataModule):
    291     assert self.instance.trainer is not None
--> 292     return call._call_lightning_datamodule_hook(self.instance.trainer, self.name)
    293 assert self.instance is not None
    294 return self.instance

File ~/miniconda3/envs/segger/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py:189, in _call_lightning_datamodule_hook(trainer, hook_name, *args, **kwargs)
    187 if callable(fn):
    188     with trainer.profiler.profile(f"[LightningDataModule]{trainer.datamodule.__class__.__name__}.{hook_name}"):
--> 189         return fn(*args, **kwargs)
    190 return None

File ~/segger_dev/src/segger/training/segger_data_module.py:35, in SeggerDataModule.train_dataloader(self)
     34 def train_dataloader(self):
---> 35     return DataLoader(self.train, shuffle=True, **self.loader_kwargs)

File ~/miniconda3/envs/segger/lib/python3.10/site-packages/lightning_fabric/utilities/data.py:324, in _wrap_init_method.<locals>.wrapper(obj, *args, **kwargs)
    321     elif store_explicit_arg in kwargs:
    322         object.__setattr__(obj, f"__{store_explicit_arg}", kwargs[store_explicit_arg])
--> 324 init(obj, *args, **kwargs)
    325 object.__setattr__(obj, "__pl_inside_init", old_inside_init)

File ~/miniconda3/envs/segger/lib/python3.10/site-packages/lightning_fabric/utilities/data.py:324, in _wrap_init_method.<locals>.wrapper(obj, *args, **kwargs)
    321     elif store_explicit_arg in kwargs:
    322         object.__setattr__(obj, f"__{store_explicit_arg}", kwargs[store_explicit_arg])
--> 324 init(obj, *args, **kwargs)
    325 object.__setattr__(obj, "__pl_inside_init", old_inside_init)

File ~/miniconda3/envs/segger/lib/python3.10/site-packages/lightning_fabric/utilities/data.py:324, in _wrap_init_method.<locals>.wrapper(obj, *args, **kwargs)
    321     elif store_explicit_arg in kwargs:
    322         object.__setattr__(obj, f"__{store_explicit_arg}", kwargs[store_explicit_arg])
--> 324 init(obj, *args, **kwargs)
    325 object.__setattr__(obj, "__pl_inside_init", old_inside_init)

File ~/miniconda3/envs/segger/lib/python3.10/site-packages/torch_geometric/loader/dataloader.py:87, in DataLoader.__init__(self, dataset, batch_size, shuffle, follow_batch, exclude_keys, **kwargs)
     84 self.follow_batch = follow_batch
     85 self.exclude_keys = exclude_keys
---> 87 super().__init__(
     88     dataset,
     89     batch_size,
     90     shuffle,
     91     collate_fn=Collater(dataset, follow_batch, exclude_keys),
     92     **kwargs,
     93 )

File ~/miniconda3/envs/segger/lib/python3.10/site-packages/lightning_fabric/utilities/data.py:324, in _wrap_init_method.<locals>.wrapper(obj, *args, **kwargs)
    321     elif store_explicit_arg in kwargs:
    322         object.__setattr__(obj, f"__{store_explicit_arg}", kwargs[store_explicit_arg])
--> 324 init(obj, *args, **kwargs)
    325 object.__setattr__(obj, "__pl_inside_init", old_inside_init)

File ~/miniconda3/envs/segger/lib/python3.10/site-packages/lightning_fabric/utilities/data.py:324, in _wrap_init_method.<locals>.wrapper(obj, *args, **kwargs)
    321     elif store_explicit_arg in kwargs:
    322         object.__setattr__(obj, f"__{store_explicit_arg}", kwargs[store_explicit_arg])
--> 324 init(obj, *args, **kwargs)
    325 object.__setattr__(obj, "__pl_inside_init", old_inside_init)

File ~/miniconda3/envs/segger/lib/python3.10/site-packages/lightning_fabric/utilities/data.py:324, in _wrap_init_method.<locals>.wrapper(obj, *args, **kwargs)
    321     elif store_explicit_arg in kwargs:
    322         object.__setattr__(obj, f"__{store_explicit_arg}", kwargs[store_explicit_arg])
--> 324 init(obj, *args, **kwargs)
    325 object.__setattr__(obj, "__pl_inside_init", old_inside_init)

File ~/miniconda3/envs/segger/lib/python3.10/site-packages/torch/utils/data/dataloader.py:376, in DataLoader.__init__(self, dataset, batch_size, shuffle, sampler, batch_sampler, num_workers, collate_fn, pin_memory, drop_last, timeout, worker_init_fn, multiprocessing_context, generator, prefetch_factor, persistent_workers, pin_memory_device)
    374 else:  # map-style
    375     if shuffle:
--> 376         sampler = RandomSampler(dataset, generator=generator)  # type: ignore[arg-type]
    377     else:
    378         sampler = SequentialSampler(dataset)  # type: ignore[arg-type]

File ~/miniconda3/envs/segger/lib/python3.10/site-packages/torch/utils/data/sampler.py:164, in RandomSampler.__init__(self, data_source, replacement, num_samples, generator)
    159     raise TypeError(
    160         f"replacement should be a boolean value, but got replacement={self.replacement}"
    161     )
    163 if not isinstance(self.num_samples, int) or self.num_samples <= 0:
--> 164     raise ValueError(
    165         f"num_samples should be a positive integer value, but got num_samples={self.num_samples}"
    166     )

ValueError: num_samples should be a positive integer value, but got num_samples=0
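
The ValueError itself comes from torch: with shuffle=True and a map-style dataset, DataLoader.__init__ builds RandomSampler(dataset), whose num_samples defaults to len(dataset), and a length of zero is rejected. Together with the earlier warning that the total DataLoader length across ranks is zero, this says SeggerDataModule.train held no samples for this trial. A minimal sketch reproducing the condition (EmptyDataset is an illustrative stand-in, not segger code):

from torch.utils.data import DataLoader, Dataset

class EmptyDataset(Dataset):
    # Stand-in for a trial whose data directory yielded no samples.
    def __len__(self):
        return 0

    def __getitem__(self, idx):
        raise IndexError(idx)

# shuffle=True makes DataLoader construct RandomSampler(dataset), which raises:
# ValueError: num_samples should be a positive integer value, but got num_samples=0
loader = DataLoader(EmptyDataset(), batch_size=2, shuffle=True)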

I also have another question regarding the output files that will be generated: where can I find information on how to locate and assess them? Do you have a tutorial for that?

marzaidi · Nov 16 '24 17:11

Dear @marzaidi, parameter tuning is still under development; we foresee resolving it in January. Thanks for your patience!

EliHei2 · Dec 17 '24 10:12

Looking forward to this! Stuck on the same step :)

sarajimenez · Jan 21 '25 22:01