segger_dev
[BUG] ValueError while tuning parameters
I am on the last step of the Introduction to Segger tutorial and I am getting an error while tuning parameters. Kindly let me know if you need more information. I am not sure why this is happening; I tried to troubleshoot but nothing worked.
My code:
import itertools
import pandas as pd

# `tuning_dir`, `trainable`, `evaluate`, and `predict_kwargs` are defined in
# earlier cells of the tutorial.
param_space = {
    "k_bd": [3, 5, 10],
    "dist_bd": [5, 10, 15, 20],
    "k_tx": [3, 5, 10],
    "dist_tx": [3, 5, 10],
}

metrics = []
for params in itertools.product(*param_space.values()):
    config = dict(zip(param_space.keys(), params))
    # Set up directories for this trial
    trial_dir = tuning_dir / '_'.join([f'{k}={v}' for k, v in config.items()])
    data_dir = trial_dir / 'segger_data'
    data_dir.mkdir(exist_ok=True, parents=True)
    config['data_dir'] = data_dir
    model_dir = trial_dir / 'models'
    model_dir.mkdir(exist_ok=True, parents=True)
    config['model_dir'] = model_dir
    # Train and evaluate on this parameter combination
    segmentation = trainable(config)
    trial = evaluate(segmentation, predict_kwargs['score_cut'])
    trial = pd.concat([pd.Series(config), trial])
    metrics.append(trial)
metrics = pd.DataFrame(metrics)
The error:
Using 16bit Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
| Name | Type | Params | Mode
--------------------------------------------------------
0 | model | GraphModule | 9.7 K | train
1 | criterion | BCEWithLogitsLoss | 0 | train
--------------------------------------------------------
9.7 K Trainable params
0 Non-trainable params
9.7 K Total params
0.039 Total estimated model params size (MB)
45 Modules in train mode
0 Modules in eval mode
SLURM auto-requeueing enabled. Setting signal handlers.
/dss/dsshome1/0C/go76saz2/miniconda3/envs/segger/lib/python3.10/site-packages/pytorch_lightning/utilities/data.py:105: Total length of `DataLoader` across ranks is zero. Please make sure this was your intention.
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[44], line 25
22 model_dir.mkdir(exist_ok=True, parents=True)
23 config['model_dir'] = model_dir
---> 25 segmentation = trainable(config)
26 trial = evaluate(segmentation, predict_kwargs['score_cut'])
27 trial = pd.concat([pd.Series(config), trial])
Cell In[35], line 30, in trainable(config)
20 dm = SeggerDataModule(
21 data_dir=config['data_dir'],
22 batch_size=2,
23 num_workers=dataset_kwargs['num_workers'],
24 )
25 trainer = Trainer(
26 default_root_dir=config['model_dir'],
27 logger=CSVLogger(config['model_dir']),
28 **trainer_kwargs,
29 )
---> 30 trainer.fit(model=ls, datamodule=dm)
32 segmentation = predict(
33 load_model(config['model_dir'] / 'lightning_logs/version_0/checkpoints'),
34 dm.train_dataloader(),
35 receptive_field=receptive_field,
36 **predict_kwargs,
37 )
39 metrics = evaluate(segmentation)
File ~/miniconda3/envs/segger/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py:538, in Trainer.fit(self, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path)
536 self.state.status = TrainerStatus.RUNNING
537 self.training = True
--> 538 call._call_and_handle_interrupt(
539 self, self._fit_impl, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path
540 )
File ~/miniconda3/envs/segger/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py:47, in _call_and_handle_interrupt(trainer, trainer_fn, *args, **kwargs)
45 if trainer.strategy.launcher is not None:
46 return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
---> 47 return trainer_fn(*args, **kwargs)
49 except _TunerExitException:
50 _call_teardown_hook(trainer)
File ~/miniconda3/envs/segger/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py:574, in Trainer._fit_impl(self, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path)
567 assert self.state.fn is not None
568 ckpt_path = self._checkpoint_connector._select_ckpt_path(
569 self.state.fn,
570 ckpt_path,
571 model_provided=True,
572 model_connected=self.lightning_module is not None,
573 )
--> 574 self._run(model, ckpt_path=ckpt_path)
576 assert self.state.stopped
577 self.training = False
File ~/miniconda3/envs/segger/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py:981, in Trainer._run(self, model, ckpt_path)
976 self._signal_connector.register_signal_handlers()
978 # ----------------------------
979 # RUN THE TRAINER
980 # ----------------------------
--> 981 results = self._run_stage()
983 # ----------------------------
984 # POST-Training CLEAN UP
985 # ----------------------------
986 log.debug(f"{self.__class__.__name__}: trainer tearing down")
File ~/miniconda3/envs/segger/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py:1025, in Trainer._run_stage(self)
1023 self._run_sanity_check()
1024 with torch.autograd.set_detect_anomaly(self._detect_anomaly):
-> 1025 self.fit_loop.run()
1026 return None
1027 raise RuntimeError(f"Unexpected state {self.state}")
File ~/miniconda3/envs/segger/lib/python3.10/site-packages/pytorch_lightning/loops/fit_loop.py:197, in _FitLoop.run(self)
196 def run(self) -> None:
--> 197 self.setup_data()
198 if self.skip:
199 return
File ~/miniconda3/envs/segger/lib/python3.10/site-packages/pytorch_lightning/loops/fit_loop.py:225, in _FitLoop.setup_data(self)
222 log.debug(f"{self.__class__.__name__}: resetting train dataloader")
224 source = self._data_source
--> 225 train_dataloader = _request_dataloader(source)
226 trainer.strategy.barrier("train_dataloader()")
228 if not isinstance(train_dataloader, CombinedLoader):
File ~/miniconda3/envs/segger/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:325, in _request_dataloader(data_source)
314 """Requests a dataloader by calling dataloader hooks corresponding to the given stage.
315
316 Returns:
317 The requested dataloader
318
319 """
320 with _replace_dunder_methods(DataLoader, "dataset"), _replace_dunder_methods(BatchSampler):
321 # under this context manager, the arguments passed to `DataLoader.__init__` will be captured and saved as
322 # attributes on the instance in case the dataloader needs to be re-instantiated later by Lightning.
323 # Also, it records all attribute setting and deletion using patched `__setattr__` and `__delattr__`
324 # methods so that the re-instantiated object is as close to the original as possible.
--> 325 return data_source.dataloader()
File ~/miniconda3/envs/segger/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:292, in _DataLoaderSource.dataloader(self)
290 if isinstance(self.instance, pl.LightningDataModule):
291 assert self.instance.trainer is not None
--> 292 return call._call_lightning_datamodule_hook(self.instance.trainer, self.name)
293 assert self.instance is not None
294 return self.instance
File ~/miniconda3/envs/segger/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py:189, in _call_lightning_datamodule_hook(trainer, hook_name, *args, **kwargs)
187 if callable(fn):
188 with trainer.profiler.profile(f"[LightningDataModule]{trainer.datamodule.__class__.__name__}.{hook_name}"):
--> 189 return fn(*args, **kwargs)
190 return None
File ~/segger_dev/src/segger/training/segger_data_module.py:35, in SeggerDataModule.train_dataloader(self)
34 def train_dataloader(self):
---> 35 return DataLoader(self.train, shuffle=True, **self.loader_kwargs)
File ~/miniconda3/envs/segger/lib/python3.10/site-packages/lightning_fabric/utilities/data.py:324, in _wrap_init_method.<locals>.wrapper(obj, *args, **kwargs)
321 elif store_explicit_arg in kwargs:
322 object.__setattr__(obj, f"__{store_explicit_arg}", kwargs[store_explicit_arg])
--> 324 init(obj, *args, **kwargs)
325 object.__setattr__(obj, "__pl_inside_init", old_inside_init)
File ~/miniconda3/envs/segger/lib/python3.10/site-packages/lightning_fabric/utilities/data.py:324, in _wrap_init_method.<locals>.wrapper(obj, *args, **kwargs)
321 elif store_explicit_arg in kwargs:
322 object.__setattr__(obj, f"__{store_explicit_arg}", kwargs[store_explicit_arg])
--> 324 init(obj, *args, **kwargs)
325 object.__setattr__(obj, "__pl_inside_init", old_inside_init)
File ~/miniconda3/envs/segger/lib/python3.10/site-packages/lightning_fabric/utilities/data.py:324, in _wrap_init_method.<locals>.wrapper(obj, *args, **kwargs)
321 elif store_explicit_arg in kwargs:
322 object.__setattr__(obj, f"__{store_explicit_arg}", kwargs[store_explicit_arg])
--> 324 init(obj, *args, **kwargs)
325 object.__setattr__(obj, "__pl_inside_init", old_inside_init)
File ~/miniconda3/envs/segger/lib/python3.10/site-packages/torch_geometric/loader/dataloader.py:87, in DataLoader.__init__(self, dataset, batch_size, shuffle, follow_batch, exclude_keys, **kwargs)
84 self.follow_batch = follow_batch
85 self.exclude_keys = exclude_keys
---> 87 super().__init__(
88 dataset,
89 batch_size,
90 shuffle,
91 collate_fn=Collater(dataset, follow_batch, exclude_keys),
92 **kwargs,
93 )
File ~/miniconda3/envs/segger/lib/python3.10/site-packages/lightning_fabric/utilities/data.py:324, in _wrap_init_method.<locals>.wrapper(obj, *args, **kwargs)
321 elif store_explicit_arg in kwargs:
322 object.__setattr__(obj, f"__{store_explicit_arg}", kwargs[store_explicit_arg])
--> 324 init(obj, *args, **kwargs)
325 object.__setattr__(obj, "__pl_inside_init", old_inside_init)
File ~/miniconda3/envs/segger/lib/python3.10/site-packages/lightning_fabric/utilities/data.py:324, in _wrap_init_method.<locals>.wrapper(obj, *args, **kwargs)
321 elif store_explicit_arg in kwargs:
322 object.__setattr__(obj, f"__{store_explicit_arg}", kwargs[store_explicit_arg])
--> 324 init(obj, *args, **kwargs)
325 object.__setattr__(obj, "__pl_inside_init", old_inside_init)
File ~/miniconda3/envs/segger/lib/python3.10/site-packages/lightning_fabric/utilities/data.py:324, in _wrap_init_method.<locals>.wrapper(obj, *args, **kwargs)
321 elif store_explicit_arg in kwargs:
322 object.__setattr__(obj, f"__{store_explicit_arg}", kwargs[store_explicit_arg])
--> 324 init(obj, *args, **kwargs)
325 object.__setattr__(obj, "__pl_inside_init", old_inside_init)
File ~/miniconda3/envs/segger/lib/python3.10/site-packages/torch/utils/data/dataloader.py:376, in DataLoader.__init__(self, dataset, batch_size, shuffle, sampler, batch_sampler, num_workers, collate_fn, pin_memory, drop_last, timeout, worker_init_fn, multiprocessing_context, generator, prefetch_factor, persistent_workers, pin_memory_device)
374 else: # map-style
375 if shuffle:
--> 376 sampler = RandomSampler(dataset, generator=generator) # type: ignore[arg-type]
377 else:
378 sampler = SequentialSampler(dataset) # type: ignore[arg-type]
File ~/miniconda3/envs/segger/lib/python3.10/site-packages/torch/utils/data/sampler.py:164, in RandomSampler.__init__(self, data_source, replacement, num_samples, generator)
159 raise TypeError(
160 f"replacement should be a boolean value, but got replacement={self.replacement}"
161 )
163 if not isinstance(self.num_samples, int) or self.num_samples <= 0:
--> 164 raise ValueError(
165 f"num_samples should be a positive integer value, but got num_samples={self.num_samples}"
166 )
ValueError: num_samples should be a positive integer value, but got num_samples=0
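For what it's worth, the ValueError comes from RandomSampler being handed an empty training dataset (num_samples=0), which matches the earlier warning that the total DataLoader length across ranks is zero. Here is a quick check, reusing the names from my code above, to see whether the dataset-creation step inside trainable actually wrote anything into each trial's data directory (just a diagnostic sketch, not tutorial code):

import itertools

# Count the files under each trial's segger_data directory. A trial whose
# directory is empty will fail at trainer.fit exactly like above, because
# its training split has length zero.
for params in itertools.product(*param_space.values()):
    config = dict(zip(param_space.keys(), params))
    trial_dir = tuning_dir / '_'.join([f'{k}={v}' for k, v in config.items()])
    data_dir = trial_dir / 'segger_data'
    n_files = sum(1 for p in data_dir.rglob('*') if p.is_file()) if data_dir.exists() else 0
    print(f'{data_dir}: {n_files} files')

If every trial directory reports zero files, then the data-creation step inside trainable is where things go wrong, not the training itself.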
I also have another question regarding the output files that will be generated: where can I find information on how to locate and assess them? Do you have a tutorial for that?
Dear @marzaidi, parameter tuning is still under development; we foresee resolving it in January. Thanks for your patience!
Looking forward to this! Stuck on the same step :)