TypeError when running dowhy_causal_prediction_demo.ipynb: can't convert cuda:0 device type tensor to numpy
Describe the bug
When trying to run dowhy_causal_prediction_demo.ipynb, I encountered this error in the following cell:

trainer = pl.Trainer(devices=1, max_epochs=5)
trainer.fit(algorithm, loaders['train_loaders'], loaders['val_loaders'])

The output:
You are using the plain ModelCheckpoint callback. Consider using LitModelCheckpoint which with seamless uploading to Model registry.
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1,2]
| Name | Type | Params | Mode
---------------------------------------------
0 | model | Sequential | 306 K | train
---------------------------------------------
306 K Trainable params
0 Non-trainable params
306 K Total params
1.226 Total estimated model params size (MB)
8 Modules in train mode
0 Modules in eval mode
Epoch 0: 0%
0/312 [00:00<?, ?it/s]
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[15], line 3
1 trainer = pl.Trainer(devices=1, max_epochs=5)
----> 3 trainer.fit(algorithm, loaders['train_loaders'], loaders['val_loaders'])
File [~/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/pytorch_lightning/trainer/trainer.py:561](http://127.0.0.1:41299/home/zby/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/pytorch_lightning/trainer/trainer.py#line=560), in Trainer.fit(self, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path)
559 self.training = True
560 self.should_stop = False
--> 561 call._call_and_handle_interrupt(
562 self, self._fit_impl, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path
563 )
File [~/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/pytorch_lightning/trainer/call.py:48](http://127.0.0.1:41299/home/zby/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/pytorch_lightning/trainer/call.py#line=47), in _call_and_handle_interrupt(trainer, trainer_fn, *args, **kwargs)
46 if trainer.strategy.launcher is not None:
47 return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
---> 48 return trainer_fn(*args, **kwargs)
50 except _TunerExitException:
51 _call_teardown_hook(trainer)
File [~/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/pytorch_lightning/trainer/trainer.py:599](http://127.0.0.1:41299/home/zby/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/pytorch_lightning/trainer/trainer.py#line=598), in Trainer._fit_impl(self, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path)
592 download_model_from_registry(ckpt_path, self)
593 ckpt_path = self._checkpoint_connector._select_ckpt_path(
594 self.state.fn,
595 ckpt_path,
596 model_provided=True,
597 model_connected=self.lightning_module is not None,
598 )
--> 599 self._run(model, ckpt_path=ckpt_path)
601 assert self.state.stopped
602 self.training = False
File [~/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/pytorch_lightning/trainer/trainer.py:1012](http://127.0.0.1:41299/home/zby/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/pytorch_lightning/trainer/trainer.py#line=1011), in Trainer._run(self, model, ckpt_path)
1007 self._signal_connector.register_signal_handlers()
1009 # ----------------------------
1010 # RUN THE TRAINER
1011 # ----------------------------
-> 1012 results = self._run_stage()
1014 # ----------------------------
1015 # POST-Training CLEAN UP
1016 # ----------------------------
1017 log.debug(f"{self.__class__.__name__}: trainer tearing down")
File [~/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/pytorch_lightning/trainer/trainer.py:1056](http://127.0.0.1:41299/home/zby/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/pytorch_lightning/trainer/trainer.py#line=1055), in Trainer._run_stage(self)
1054 self._run_sanity_check()
1055 with torch.autograd.set_detect_anomaly(self._detect_anomaly):
-> 1056 self.fit_loop.run()
1057 return None
1058 raise RuntimeError(f"Unexpected state {self.state}")
File [~/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/pytorch_lightning/loops/fit_loop.py:216](http://127.0.0.1:41299/home/zby/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/pytorch_lightning/loops/fit_loop.py#line=215), in _FitLoop.run(self)
214 try:
215 self.on_advance_start()
--> 216 self.advance()
217 self.on_advance_end()
218 except StopIteration:
File [~/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/pytorch_lightning/loops/fit_loop.py:455](http://127.0.0.1:41299/home/zby/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/pytorch_lightning/loops/fit_loop.py#line=454), in _FitLoop.advance(self)
453 with self.trainer.profiler.profile("run_training_epoch"):
454 assert self._data_fetcher is not None
--> 455 self.epoch_loop.run(self._data_fetcher)
File [~/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/pytorch_lightning/loops/training_epoch_loop.py:150](http://127.0.0.1:41299/home/zby/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/pytorch_lightning/loops/training_epoch_loop.py#line=149), in _TrainingEpochLoop.run(self, data_fetcher)
148 while not self.done:
149 try:
--> 150 self.advance(data_fetcher)
151 self.on_advance_end(data_fetcher)
152 except StopIteration:
File [~/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/pytorch_lightning/loops/training_epoch_loop.py:320](http://127.0.0.1:41299/home/zby/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/pytorch_lightning/loops/training_epoch_loop.py#line=319), in _TrainingEpochLoop.advance(self, data_fetcher)
317 with trainer.profiler.profile("run_training_batch"):
318 if trainer.lightning_module.automatic_optimization:
319 # in automatic optimization, there can only be one optimizer
--> 320 batch_output = self.automatic_optimization.run(trainer.optimizers[0], batch_idx, kwargs)
321 else:
322 batch_output = self.manual_optimization.run(kwargs)
File [~/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/pytorch_lightning/loops/optimization/automatic.py:192](http://127.0.0.1:41299/home/zby/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/pytorch_lightning/loops/optimization/automatic.py#line=191), in _AutomaticOptimization.run(self, optimizer, batch_idx, kwargs)
185 closure()
187 # ------------------------------
188 # BACKWARD PASS
189 # ------------------------------
190 # gradient update with accumulated gradients
191 else:
--> 192 self._optimizer_step(batch_idx, closure)
194 result = closure.consume_result()
195 if result.loss is None:
File [~/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/pytorch_lightning/loops/optimization/automatic.py:270](http://127.0.0.1:41299/home/zby/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/pytorch_lightning/loops/optimization/automatic.py#line=269), in _AutomaticOptimization._optimizer_step(self, batch_idx, train_step_and_backward_closure)
267 self.optim_progress.optimizer.step.increment_ready()
269 # model hook
--> 270 call._call_lightning_module_hook(
271 trainer,
272 "optimizer_step",
273 trainer.current_epoch,
274 batch_idx,
275 optimizer,
276 train_step_and_backward_closure,
277 )
279 if not should_accumulate:
280 self.optim_progress.optimizer.step.increment_completed()
File [~/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/pytorch_lightning/trainer/call.py:176](http://127.0.0.1:41299/home/zby/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/pytorch_lightning/trainer/call.py#line=175), in _call_lightning_module_hook(trainer, hook_name, pl_module, *args, **kwargs)
173 pl_module._current_fx_name = hook_name
175 with trainer.profiler.profile(f"[LightningModule]{pl_module.__class__.__name__}.{hook_name}"):
--> 176 output = fn(*args, **kwargs)
178 # restore current_fx when nested context
179 pl_module._current_fx_name = prev_fx_name
File [~/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/pytorch_lightning/core/module.py:1302](http://127.0.0.1:41299/home/zby/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/pytorch_lightning/core/module.py#line=1301), in LightningModule.optimizer_step(self, epoch, batch_idx, optimizer, optimizer_closure)
1271 def optimizer_step(
1272 self,
1273 epoch: int,
(...)
1276 optimizer_closure: Optional[Callable[[], Any]] = None,
1277 ) -> None:
1278 r"""Override this method to adjust the default way the :class:`~pytorch_lightning.trainer.trainer.Trainer` calls
1279 the optimizer.
1280
(...)
1300
1301 """
-> 1302 optimizer.step(closure=optimizer_closure)
File [~/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/pytorch_lightning/core/optimizer.py:154](http://127.0.0.1:41299/home/zby/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/pytorch_lightning/core/optimizer.py#line=153), in LightningOptimizer.step(self, closure, **kwargs)
151 raise MisconfigurationException("When `optimizer.step(closure)` is called, the closure should be callable")
153 assert self._strategy is not None
--> 154 step_output = self._strategy.optimizer_step(self._optimizer, closure, **kwargs)
156 self._on_after_step()
158 return step_output
File [~/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/pytorch_lightning/strategies/strategy.py:239](http://127.0.0.1:41299/home/zby/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/pytorch_lightning/strategies/strategy.py#line=238), in Strategy.optimizer_step(self, optimizer, closure, model, **kwargs)
237 # TODO(fabric): remove assertion once strategy's optimizer_step typing is fixed
238 assert isinstance(model, pl.LightningModule)
--> 239 return self.precision_plugin.optimizer_step(optimizer, model=model, closure=closure, **kwargs)
File [~/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/pytorch_lightning/plugins/precision/precision.py:123](http://127.0.0.1:41299/home/zby/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/pytorch_lightning/plugins/precision/precision.py#line=122), in Precision.optimizer_step(self, optimizer, model, closure, **kwargs)
121 """Hook to run the optimizer step."""
122 closure = partial(self._wrap_closure, model, optimizer, closure)
--> 123 return optimizer.step(closure=closure, **kwargs)
File [~/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/torch/optim/optimizer.py:493](http://127.0.0.1:41299/home/zby/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/torch/optim/optimizer.py#line=492), in Optimizer.profile_hook_step.<locals>.wrapper(*args, **kwargs)
488 else:
489 raise RuntimeError(
490 f"{func} must return None or a tuple of (new_args, new_kwargs), but got {result}."
491 )
--> 493 out = func(*args, **kwargs)
494 self._optimizer_step_code()
496 # call optimizer step post hooks
File [~/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/torch/optim/optimizer.py:91](http://127.0.0.1:41299/home/zby/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/torch/optim/optimizer.py#line=90), in _use_grad_for_differentiable.<locals>._use_grad(self, *args, **kwargs)
89 torch.set_grad_enabled(self.defaults["differentiable"])
90 torch._dynamo.graph_break()
---> 91 ret = func(self, *args, **kwargs)
92 finally:
93 torch._dynamo.graph_break()
File [~/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/torch/optim/adam.py:223](http://127.0.0.1:41299/home/zby/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/torch/optim/adam.py#line=222), in Adam.step(self, closure)
221 if closure is not None:
222 with torch.enable_grad():
--> 223 loss = closure()
225 for group in self.param_groups:
226 params_with_grad: List[Tensor] = []
File [~/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/pytorch_lightning/plugins/precision/precision.py:109](http://127.0.0.1:41299/home/zby/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/pytorch_lightning/plugins/precision/precision.py#line=108), in Precision._wrap_closure(self, model, optimizer, closure)
96 def _wrap_closure(
97 self,
98 model: "pl.LightningModule",
99 optimizer: Steppable,
100 closure: Callable[[], Any],
101 ) -> Any:
102 """This double-closure allows makes sure the ``closure`` is executed before the ``on_before_optimizer_step``
103 hook is called.
104
(...)
107
108 """
--> 109 closure_result = closure()
110 self._after_closure(model, optimizer)
111 return closure_result
File [~/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/pytorch_lightning/loops/optimization/automatic.py:146](http://127.0.0.1:41299/home/zby/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/pytorch_lightning/loops/optimization/automatic.py#line=145), in Closure.__call__(self, *args, **kwargs)
144 @override
145 def __call__(self, *args: Any, **kwargs: Any) -> Optional[Tensor]:
--> 146 self._result = self.closure(*args, **kwargs)
147 return self._result.loss
File [~/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/torch/utils/_contextlib.py:116](http://127.0.0.1:41299/home/zby/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/torch/utils/_contextlib.py#line=115), in context_decorator.<locals>.decorate_context(*args, **kwargs)
113 @functools.wraps(func)
114 def decorate_context(*args, **kwargs):
115 with ctx_factory():
--> 116 return func(*args, **kwargs)
File [~/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/pytorch_lightning/loops/optimization/automatic.py:131](http://127.0.0.1:41299/home/zby/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/pytorch_lightning/loops/optimization/automatic.py#line=130), in Closure.closure(self, *args, **kwargs)
128 @override
129 @torch.enable_grad()
130 def closure(self, *args: Any, **kwargs: Any) -> ClosureResult:
--> 131 step_output = self._step_fn()
133 if step_output.closure_loss is None:
134 self.warning_cache.warn("`training_step` returned `None`. If this was on purpose, ignore this warning...")
File [~/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/pytorch_lightning/loops/optimization/automatic.py:319](http://127.0.0.1:41299/home/zby/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/pytorch_lightning/loops/optimization/automatic.py#line=318), in _AutomaticOptimization._training_step(self, kwargs)
308 """Performs the actual train step with the tied hooks.
309
310 Args:
(...)
315
316 """
317 trainer = self.trainer
--> 319 training_step_output = call._call_strategy_hook(trainer, "training_step", *kwargs.values())
320 self.trainer.strategy.post_training_step() # unused hook - call anyway for backward compatibility
322 if training_step_output is None and trainer.world_size > 1:
File [~/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/pytorch_lightning/trainer/call.py:328](http://127.0.0.1:41299/home/zby/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/pytorch_lightning/trainer/call.py#line=327), in _call_strategy_hook(trainer, hook_name, *args, **kwargs)
325 return None
327 with trainer.profiler.profile(f"[Strategy]{trainer.strategy.__class__.__name__}.{hook_name}"):
--> 328 output = fn(*args, **kwargs)
330 # restore current_fx when nested context
331 pl_module._current_fx_name = prev_fx_name
File [~/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/pytorch_lightning/strategies/strategy.py:391](http://127.0.0.1:41299/home/zby/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/pytorch_lightning/strategies/strategy.py#line=390), in Strategy.training_step(self, *args, **kwargs)
389 if self.model != self.lightning_module:
390 return self._forward_redirection(self.model, self.lightning_module, "training_step", *args, **kwargs)
--> 391 return self.lightning_module.training_step(*args, **kwargs)
File [~/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/dowhy/causal_prediction/algorithms/cacm.py:110](http://127.0.0.1:41299/home/zby/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/dowhy/causal_prediction/algorithms/cacm.py#line=109), in CACM.training_step(self, train_batch, batch_idx)
108 # Acause regularization
109 if attr_type == "causal":
--> 110 penalty_causal += self.CACMRegularizer.conditional_reg(
111 classifs, [a[:, attr_type_idx] for a in attribute_labels], [targets], nmb, E_eq_A_attr
112 )
114 # Aconf regularization
115 elif attr_type == "conf":
File [~/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/dowhy/causal_prediction/algorithms/regularization.py:209](http://127.0.0.1:41299/home/zby/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/dowhy/causal_prediction/algorithms/regularization.py#line=208), in Regularizer.conditional_reg(self, classifs, attribute_labels, conditioning_subset, num_envs, E_eq_A)
207 cumprod = torch.cumprod(cardinality, dim=0)
208 n_groups = cumprod[-1].item()
--> 209 factors_np = np.concatenate(([1], cumprod[:-1]))
210 factors = torch.from_numpy(factors_np)
211 group_indices = grouping_data @ factors
File [~/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/torch/_tensor.py:1194](http://127.0.0.1:41299/home/zby/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/torch/_tensor.py#line=1193), in Tensor.__array__(self, dtype)
1192 return handle_torch_function(Tensor.__array__, (self,), self, dtype=dtype)
1193 if dtype is None:
-> 1194 return self.numpy()
1195 else:
1196 return self.numpy().astype(dtype, copy=False)
TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.
Steps to reproduce the behavior
Run all cells in dowhy_causal_prediction_demo.ipynb.
Expected behavior
The program runs normally and produces results.
Version information:
- DoWhy: 0.12
- pytorch_lightning: 2.5.1
- torch: 2.6.0
- python: 3.12.9
Additional context
There was no issue when running the ERM algorithm in the previous cell.
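For reference, the root cause can be reproduced outside the notebook: np.concatenate(([1], cumprod[:-1])) makes NumPy call Tensor.__array__(), which calls Tensor.numpy() and raises for tensors that live on the GPU. A minimal sketch (assuming a CUDA device is available; the tensor values are arbitrary):

import numpy as np
import torch

# cumprod ends up on the GPU because the attribute labels are on the GPU
cumprod = torch.cumprod(torch.tensor([2, 3, 4], device="cuda"), dim=0)

try:
    np.concatenate(([1], cumprod[:-1]))  # NumPy calls Tensor.__array__ -> Tensor.numpy()
except TypeError as e:
    print(e)  # can't convert cuda:0 device type tensor to numpy ...

# copying the tensor to host memory first avoids the error
print(np.concatenate(([1], cumprod[:-1].cpu().numpy())))  # [1 2 6]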
Here is a temporary workaround: change lines 199-211 of dowhy/causal_prediction/algorithms/regularization.py to:
for i in range(num_envs):
    conditioning_subset_i = [subset_var[i] for subset_var in conditioning_subset]
    conditioning_subset_i_uniform = [
        ele.unsqueeze(1) if ele.dim() == 1 else ele for ele in conditioning_subset_i
    ]
    grouping_data = torch.cat(conditioning_subset_i_uniform, 1).to(device='cpu')  # changed: move to CPU
    assert grouping_data.min() >= 0, "Group numbers cannot be negative."
    cardinality = 1 + torch.max(grouping_data, dim=0)[0]
    cumprod = torch.cumprod(cardinality, dim=0)
    n_groups = cumprod[-1].item()
    factors_np = np.concatenate(([1], cumprod[:-1].cpu().numpy()))  # changed: copy to host before np.concatenate
    factors = torch.from_numpy(factors_np)
    group_indices = grouping_data @ factors
This works on my device. The changes are adding .to(device='cpu') when building grouping_data, and calling cumprod[:-1].cpu().numpy() before np.concatenate.
To compute grouping_data @ factors, both tensors must be on the same device. In addition, on my CUDA device (a 2080S), tensors with dtype=torch.int64 apparently cannot be used in the @ operation: the error message is "addmv_impl_cuda" not implemented for 'Long'.
Converting with .to(device="cuda", dtype=torch.float64) instead seems to be slower than simply moving the data to the CPU with .to(device='cpu').
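A small sketch of that observation (the exact behavior likely depends on the GPU and PyTorch build; the shapes and values here are made up):

import torch

grouping_data = torch.randint(0, 3, (8, 2), dtype=torch.int64)
factors = torch.tensor([1, 3], dtype=torch.int64)

print(grouping_data @ factors)  # integer (Long) matrix-vector product works on the CPU

try:
    print(grouping_data.cuda() @ factors.cuda())  # may fail for Long tensors on CUDA
except RuntimeError as e:
    print(e)  # e.g. "addmv_impl_cuda" not implemented for 'Long'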
Fixed in #1341. That change also appears to fix the problem where devices in pl.Trainer could not be set to more than 1.