
While trying dowhy_causal_prediction_demo.ipynb: TypeError: can't convert cuda:0 device type tensor to numpy

JPZ4-5 opened this issue 9 months ago · 1 comment

Describe the bug: When trying to run dowhy_causal_prediction_demo.ipynb, I encountered the following error while running this cell:

trainer = pl.Trainer(devices=1, max_epochs=5)
trainer.fit(algorithm, loaders['train_loaders'], loaders['val_loaders'])

which produced the following output and traceback:

You are using the plain ModelCheckpoint callback. Consider using LitModelCheckpoint which with seamless uploading to Model registry.
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1,2]

  | Name  | Type       | Params | Mode 
---------------------------------------------
0 | model | Sequential | 306 K  | train
---------------------------------------------
306 K     Trainable params
0         Non-trainable params
306 K     Total params
1.226     Total estimated model params size (MB)
8         Modules in train mode
0         Modules in eval mode

Epoch 0:   0%  0/312 [00:00<?, ?it/s]

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[15], line 3
      1 trainer = pl.Trainer(devices=1, max_epochs=5) 
----> 3 trainer.fit(algorithm, loaders['train_loaders'], loaders['val_loaders'])

File ~/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/pytorch_lightning/trainer/trainer.py:561, in Trainer.fit(self, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path)
    559 self.training = True
    560 self.should_stop = False
--> 561 call._call_and_handle_interrupt(
    562     self, self._fit_impl, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path
    563 )

File ~/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/pytorch_lightning/trainer/call.py:48, in _call_and_handle_interrupt(trainer, trainer_fn, *args, **kwargs)
     46     if trainer.strategy.launcher is not None:
     47         return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
---> 48     return trainer_fn(*args, **kwargs)
     50 except _TunerExitException:
     51     _call_teardown_hook(trainer)

File ~/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/pytorch_lightning/trainer/trainer.py:599, in Trainer._fit_impl(self, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path)
    592     download_model_from_registry(ckpt_path, self)
    593 ckpt_path = self._checkpoint_connector._select_ckpt_path(
    594     self.state.fn,
    595     ckpt_path,
    596     model_provided=True,
    597     model_connected=self.lightning_module is not None,
    598 )
--> 599 self._run(model, ckpt_path=ckpt_path)
    601 assert self.state.stopped
    602 self.training = False

File ~/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/pytorch_lightning/trainer/trainer.py:1012, in Trainer._run(self, model, ckpt_path)
   1007 self._signal_connector.register_signal_handlers()
   1009 # ----------------------------
   1010 # RUN THE TRAINER
   1011 # ----------------------------
-> 1012 results = self._run_stage()
   1014 # ----------------------------
   1015 # POST-Training CLEAN UP
   1016 # ----------------------------
   1017 log.debug(f"{self.__class__.__name__}: trainer tearing down")

File ~/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/pytorch_lightning/trainer/trainer.py:1056, in Trainer._run_stage(self)
   1054         self._run_sanity_check()
   1055     with torch.autograd.set_detect_anomaly(self._detect_anomaly):
-> 1056         self.fit_loop.run()
   1057     return None
   1058 raise RuntimeError(f"Unexpected state {self.state}")

File ~/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/pytorch_lightning/loops/fit_loop.py:216, in _FitLoop.run(self)
    214 try:
    215     self.on_advance_start()
--> 216     self.advance()
    217     self.on_advance_end()
    218 except StopIteration:

File ~/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/pytorch_lightning/loops/fit_loop.py:455, in _FitLoop.advance(self)
    453 with self.trainer.profiler.profile("run_training_epoch"):
    454     assert self._data_fetcher is not None
--> 455     self.epoch_loop.run(self._data_fetcher)

File ~/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/pytorch_lightning/loops/training_epoch_loop.py:150, in _TrainingEpochLoop.run(self, data_fetcher)
    148 while not self.done:
    149     try:
--> 150         self.advance(data_fetcher)
    151         self.on_advance_end(data_fetcher)
    152     except StopIteration:

File ~/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/pytorch_lightning/loops/training_epoch_loop.py:320, in _TrainingEpochLoop.advance(self, data_fetcher)
    317 with trainer.profiler.profile("run_training_batch"):
    318     if trainer.lightning_module.automatic_optimization:
    319         # in automatic optimization, there can only be one optimizer
--> 320         batch_output = self.automatic_optimization.run(trainer.optimizers[0], batch_idx, kwargs)
    321     else:
    322         batch_output = self.manual_optimization.run(kwargs)

File ~/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/pytorch_lightning/loops/optimization/automatic.py:192, in _AutomaticOptimization.run(self, optimizer, batch_idx, kwargs)
    185         closure()
    187 # ------------------------------
    188 # BACKWARD PASS
    189 # ------------------------------
    190 # gradient update with accumulated gradients
    191 else:
--> 192     self._optimizer_step(batch_idx, closure)
    194 result = closure.consume_result()
    195 if result.loss is None:

File ~/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/pytorch_lightning/loops/optimization/automatic.py:270, in _AutomaticOptimization._optimizer_step(self, batch_idx, train_step_and_backward_closure)
    267     self.optim_progress.optimizer.step.increment_ready()
    269 # model hook
--> 270 call._call_lightning_module_hook(
    271     trainer,
    272     "optimizer_step",
    273     trainer.current_epoch,
    274     batch_idx,
    275     optimizer,
    276     train_step_and_backward_closure,
    277 )
    279 if not should_accumulate:
    280     self.optim_progress.optimizer.step.increment_completed()

File ~/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/pytorch_lightning/trainer/call.py:176, in _call_lightning_module_hook(trainer, hook_name, pl_module, *args, **kwargs)
    173 pl_module._current_fx_name = hook_name
    175 with trainer.profiler.profile(f"[LightningModule]{pl_module.__class__.__name__}.{hook_name}"):
--> 176     output = fn(*args, **kwargs)
    178 # restore current_fx when nested context
    179 pl_module._current_fx_name = prev_fx_name

File ~/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/pytorch_lightning/core/module.py:1302, in LightningModule.optimizer_step(self, epoch, batch_idx, optimizer, optimizer_closure)
   1271 def optimizer_step(
   1272     self,
   1273     epoch: int,
   (...)
   1276     optimizer_closure: Optional[Callable[[], Any]] = None,
   1277 ) -> None:
   1278     r"""Override this method to adjust the default way the :class:`~pytorch_lightning.trainer.trainer.Trainer` calls
   1279     the optimizer.
   1280 
   (...)
   1300 
   1301     """
-> 1302     optimizer.step(closure=optimizer_closure)

File ~/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/pytorch_lightning/core/optimizer.py:154, in LightningOptimizer.step(self, closure, **kwargs)
    151     raise MisconfigurationException("When `optimizer.step(closure)` is called, the closure should be callable")
    153 assert self._strategy is not None
--> 154 step_output = self._strategy.optimizer_step(self._optimizer, closure, **kwargs)
    156 self._on_after_step()
    158 return step_output

File ~/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/pytorch_lightning/strategies/strategy.py:239, in Strategy.optimizer_step(self, optimizer, closure, model, **kwargs)
    237 # TODO(fabric): remove assertion once strategy's optimizer_step typing is fixed
    238 assert isinstance(model, pl.LightningModule)
--> 239 return self.precision_plugin.optimizer_step(optimizer, model=model, closure=closure, **kwargs)

File ~/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/pytorch_lightning/plugins/precision/precision.py:123, in Precision.optimizer_step(self, optimizer, model, closure, **kwargs)
    121 """Hook to run the optimizer step."""
    122 closure = partial(self._wrap_closure, model, optimizer, closure)
--> 123 return optimizer.step(closure=closure, **kwargs)

File ~/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/torch/optim/optimizer.py:493, in Optimizer.profile_hook_step.<locals>.wrapper(*args, **kwargs)
    488         else:
    489             raise RuntimeError(
    490                 f"{func} must return None or a tuple of (new_args, new_kwargs), but got {result}."
    491             )
--> 493 out = func(*args, **kwargs)
    494 self._optimizer_step_code()
    496 # call optimizer step post hooks

File ~/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/torch/optim/optimizer.py:91, in _use_grad_for_differentiable.<locals>._use_grad(self, *args, **kwargs)
     89     torch.set_grad_enabled(self.defaults["differentiable"])
     90     torch._dynamo.graph_break()
---> 91     ret = func(self, *args, **kwargs)
     92 finally:
     93     torch._dynamo.graph_break()

File ~/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/torch/optim/adam.py:223, in Adam.step(self, closure)
    221 if closure is not None:
    222     with torch.enable_grad():
--> 223         loss = closure()
    225 for group in self.param_groups:
    226     params_with_grad: List[Tensor] = []

File ~/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/pytorch_lightning/plugins/precision/precision.py:109, in Precision._wrap_closure(self, model, optimizer, closure)
     96 def _wrap_closure(
     97     self,
     98     model: "pl.LightningModule",
     99     optimizer: Steppable,
    100     closure: Callable[[], Any],
    101 ) -> Any:
    102     """This double-closure allows makes sure the ``closure`` is executed before the ``on_before_optimizer_step``
    103     hook is called.
    104 
   (...)
    107 
    108     """
--> 109     closure_result = closure()
    110     self._after_closure(model, optimizer)
    111     return closure_result

File ~/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/pytorch_lightning/loops/optimization/automatic.py:146, in Closure.__call__(self, *args, **kwargs)
    144 @override
    145 def __call__(self, *args: Any, **kwargs: Any) -> Optional[Tensor]:
--> 146     self._result = self.closure(*args, **kwargs)
    147     return self._result.loss

File ~/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/torch/utils/_contextlib.py:116, in context_decorator.<locals>.decorate_context(*args, **kwargs)
    113 @functools.wraps(func)
    114 def decorate_context(*args, **kwargs):
    115     with ctx_factory():
--> 116         return func(*args, **kwargs)

File ~/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/pytorch_lightning/loops/optimization/automatic.py:131, in Closure.closure(self, *args, **kwargs)
    128 @override
    129 @torch.enable_grad()
    130 def closure(self, *args: Any, **kwargs: Any) -> ClosureResult:
--> 131     step_output = self._step_fn()
    133     if step_output.closure_loss is None:
    134         self.warning_cache.warn("`training_step` returned `None`. If this was on purpose, ignore this warning...")

File ~/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/pytorch_lightning/loops/optimization/automatic.py:319, in _AutomaticOptimization._training_step(self, kwargs)
    308 """Performs the actual train step with the tied hooks.
    309 
    310 Args:
   (...)
    315 
    316 """
    317 trainer = self.trainer
--> 319 training_step_output = call._call_strategy_hook(trainer, "training_step", *kwargs.values())
    320 self.trainer.strategy.post_training_step()  # unused hook - call anyway for backward compatibility
    322 if training_step_output is None and trainer.world_size > 1:

File ~/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/pytorch_lightning/trainer/call.py:328, in _call_strategy_hook(trainer, hook_name, *args, **kwargs)
    325     return None
    327 with trainer.profiler.profile(f"[Strategy]{trainer.strategy.__class__.__name__}.{hook_name}"):
--> 328     output = fn(*args, **kwargs)
    330 # restore current_fx when nested context
    331 pl_module._current_fx_name = prev_fx_name

File ~/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/pytorch_lightning/strategies/strategy.py:391, in Strategy.training_step(self, *args, **kwargs)
    389 if self.model != self.lightning_module:
    390     return self._forward_redirection(self.model, self.lightning_module, "training_step", *args, **kwargs)
--> 391 return self.lightning_module.training_step(*args, **kwargs)

File ~/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/dowhy/causal_prediction/algorithms/cacm.py:110, in CACM.training_step(self, train_batch, batch_idx)
    108 # Acause regularization
    109 if attr_type == "causal":
--> 110     penalty_causal += self.CACMRegularizer.conditional_reg(
    111         classifs, [a[:, attr_type_idx] for a in attribute_labels], [targets], nmb, E_eq_A_attr
    112     )
    114 # Aconf regularization
    115 elif attr_type == "conf":

File ~/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/dowhy/causal_prediction/algorithms/regularization.py:209, in Regularizer.conditional_reg(self, classifs, attribute_labels, conditioning_subset, num_envs, E_eq_A)
    207 cumprod = torch.cumprod(cardinality, dim=0)
    208 n_groups = cumprod[-1].item()
--> 209 factors_np = np.concatenate(([1], cumprod[:-1]))
    210 factors = torch.from_numpy(factors_np)
    211 group_indices = grouping_data @ factors

File ~/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/torch/_tensor.py:1194, in Tensor.__array__(self, dtype)
   1192     return handle_torch_function(Tensor.__array__, (self,), self, dtype=dtype)
   1193 if dtype is None:
-> 1194     return self.numpy()
   1195 else:
   1196     return self.numpy().astype(dtype, copy=False)

TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.
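
For context, the failure is not specific to DoWhy: np.concatenate calls Tensor.__array__ on any tensor it receives, and that conversion only works for tensors in host memory. A minimal sketch of the failure mode, assuming a CUDA device is available:

import numpy as np
import torch

t = torch.arange(4, device="cuda")       # tensor living on the GPU
np.concatenate(([1], t))                 # TypeError: can't convert cuda:0 device type tensor to numpy
np.concatenate(([1], t.cpu().numpy()))   # works: copy to host memory first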

Steps to reproduce the behavior: Run all cells in dowhy_causal_prediction_demo.ipynb.

Expected behavior: The program runs normally and produces results.

Version information:

  • DoWhy: 0.12
  • pytorch_lightning: 2.5.1
  • torch: 2.6.0
  • python: 3.12.9

Additional context: There was no issue when running ERM in the previous cell.

JPZ4-5 · Mar 20 '25

Here is a temporary workaround: change lines 199-211 of dowhy/causal_prediction/algorithms/regularization.py to:

for i in range(num_envs):
    conditioning_subset_i = [subset_var[i] for subset_var in conditioning_subset]
    conditioning_subset_i_uniform = [
        ele.unsqueeze(1) if ele.dim() == 1 else ele for ele in conditioning_subset_i
    ]
    # changed: move the grouping tensor to host memory up front
    grouping_data = torch.cat(conditioning_subset_i_uniform, 1).to(device='cpu')
    assert grouping_data.min() >= 0, "Group numbers cannot be negative."
    cardinality = 1 + torch.max(grouping_data, dim=0)[0]
    cumprod = torch.cumprod(cardinality, dim=0)
    n_groups = cumprod[-1].item()
    # changed: copy to CPU explicitly so np.concatenate never calls __array__ on a CUDA tensor
    factors_np = np.concatenate(([1], cumprod[:-1].cpu().numpy()))
    factors = torch.from_numpy(factors_np)
    group_indices = grouping_data @ factors

On my machine this works. The two changes are .to(device='cpu') for grouping_data and cumprod[:-1].cpu().numpy().

To compute grouping_data @ factors, both tensors must be on the same device. Moreover, on my CUDA device (a 2080S) it seems that tensors with dtype=torch.int64 cannot be used with the @ operation at all; the error message shows "addmv_impl_cuda" not implemented for 'Long'.

Keeping the computation on the GPU via .to(device="cuda", dtype=torch.float64) instead seems to be slower than moving it to the CPU with .to(device='cpu').
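
A minimal sketch of that second failure (the shapes and names below are hypothetical stand-ins for grouping_data and factors, assuming a CUDA device is available):

import torch

g = torch.randint(0, 3, (16, 2), dtype=torch.int64, device="cuda")  # stand-in for grouping_data
f = torch.tensor([1, 3], dtype=torch.int64, device="cuda")          # stand-in for factors
g @ f              # RuntimeError on such setups: "addmv_impl_cuda" not implemented for 'Long'
g.cpu() @ f.cpu()  # works: integer matrix-vector products are implemented on the CPU

Since the grouping computation only touches small per-batch label tensors, doing it on the CPU should be cheap.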

JPZ4-5 · Mar 21 '25

Fixed in #1341. That change also appears to fix the pl.Trainer problem where the number of devices could not be greater than 1.

JPZ4-5 · Aug 20 '25