
Error when calling `torch_tensor_from_array` on GPU

Open TomMelt opened this issue 8 months ago • 1 comments

A new error was introduced in commit 864b0ebaada. It occurs when calling torch_tensor_from_array on GPU, e.g.:

call torch_tensor_from_array(input_tensor(1), input_data_2d, layout_2d, torch_kCUDA)

We get the following error:

[ERROR]: The specified pointer resides on host memory and is not registered with any CUDA device.

It appears to be linked to the following lines in src/ctorch.cpp: https://github.com/Cambridge-ICCS/FTorch/blob/df8cfd634de6f2792a59531c97bb5c72ea3deed7/src/ctorch.cpp#L275-L279

It will most likely also affect torch_empty, torch_zeros and torch_ones.

TomMelt avatar Apr 24 '25 14:04 TomMelt

Thanks for raising this @TomMelt, and even more thanks for tracking down the commit and lines!!

Do you have any idea as to what the cause/fix might be? I see that that commit started using tensorOptions rather than a chained call to .to(). I have not been able to look too deeply, but came across this in the C++ docs: https://github.com/pytorch/pytorch/blob/main/c10/core/TensorOptions.h#L100-L127 which may or may not be relevant.

I guess a quick fix might be to revert to using .to()(?), though it would be good to understand why the device from the options is not working.

jatkinson1000 avatar Apr 29 '25 16:04 jatkinson1000

I think the problem lies in the fact that torch::from_blob(data, vshape, vstrides, options) expects data to already reside on the device passed as part of options. (See here).

Thus we cannot avoid calling to(get_libtorch_device(device_type, device_index))

The last working version was: https://github.com/Cambridge-ICCS/FTorch/blob/6deab9a58421aaebcb42d50127f8906128314cdd/src/ctorch.cpp#L229-L231

niccolozanotti avatar May 21 '25 14:05 niccolozanotti