Implement Clone for Tensor?
I am interested in the reasons for / against implementing `Clone` for `Tensor`. The only existing discussion I'm aware of is part of #281.

To start, it would be nice to have `Clone` implemented for `Tensor`. I'd like to be able to easily clone structures that store tensors. There is a C++ clone function that could be used. I see that this function is in `Declarations-*.yaml` but it does not appear in the generated API (at least for tch-rs; it is in the API for ocaml-torch). ~~Even if the decision is not to implement `Clone` for `Tensor`, it would be nice to expose this function as `g_clone` or something.~~ [Edit: Oops, I forgot about `Tensor::copy`. Why is that done in Rust with `copy_` instead of with the C++ `clone()`?]
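For reference, a minimal sketch of what the impl could look like, assuming it simply delegates to the existing deep-copying `Tensor::copy`; this is hypothetical, not the actual tch-rs code, and it would have to live inside the tch crate because of the orphan rule:

```rust
// Hypothetical sketch: Clone delegating to the existing Tensor::copy,
// which performs a deep copy via copy_ under the hood.
impl Clone for Tensor {
    fn clone(&self) -> Tensor {
        self.copy()
    }
}
```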
There is possibly an issue with whether `Tensor::clone` should have a gradient back to the source. From the torch perspective you would expect a `clone()` method to behave the same as it does in C++ and Python, which is to be differentiable with respect to the source. From the Rust side that might be unexpected: if I clone a parameter tensor then I don't want the new parameter tensor to have gradients back to the original. I'm not sure that detaching is the solution either. If you have
```rust
let x = Tensor::ones(&[0], ...).requires_grad_(true);
let y = 2 * x;
let z = y.clone();
```
then should `z` be differentiable with respect to `x` as though it were `y`, or simply be detached? From a Rust perspective I'd kind of expect `z` to be exactly like `y`, which means being differentiable with respect to `x`.
The more I think about it, though, the more I think it's fine for `clone()` to be differentiable. In the above example, `dy/dx = 2` and `dy/dy = 1`, and `z` would behave the same when substituting it for `y`: `dz/dx = 2`, `dz/dy = 1`, `dz/dz = 1`. As for differentiably cloning parameter tensors, I think it's unlikely that you would differentiate a function of the clone with respect to the original, and if you do, the fact that the C++/PyTorch clone is differentiable ought to be enough of a warning to consider that the tch version might be too.
Another risk with implementing `Clone` is that the new variable isn't tracked by a `VarStore`, but I don't think that should be a problem either. If you are calling `clone()` manually then it should be expected that the clone won't be in a `VarStore`. The risk would be deriving `Clone` for a struct that stores Tensors and has those Tensors in a `VarStore`, except you can't do that because `VarStore` doesn't implement `Clone`.
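To illustrate the deriving scenario (hypothetical struct and field names; this only compiles if `Tensor: Clone`, and a struct that also holds a `VarStore` still couldn't derive it):

```rust
use tch::Tensor;

// Hypothetical example: with Tensor: Clone, a plain bag of tensors can
// derive Clone; the cloned tensors are simply not registered in any VarStore.
#[derive(Clone)]
struct LinearWeights {
    ws: Tensor,
    bs: Tensor,
}
```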
Any other reasons for / against? At this point I am in favour of implementing `Clone` for `Tensor` using the differentiable C++ `clone()` function.
Replying to https://github.com/LaurentMazare/tch-rs/issues/281#issuecomment-855287935 from @jerry73204
> Another concern is that whether `.clone()` is shallow or deep affects the semantics of ownership. Currently, the self-modifying methods like `.absolute_(&mut self)` expect unique ownership. A shallow copy shares the underlying data, making the ownership unsound. If libtorch allows concurrent self-modifying operations, just like many other concurrent data structures such as `dashmap`, it should expect shared ownership.
I agree that `clone()` should be a deep copy. One reason is that the Python and C++ `clone()` methods perform a deep copy, so any tch-rs `clone()` should do the same. Another reason is that, as a user, I think of `tch::Tensor` as owning the data (even though that's not quite true), so I expect `clone()` to copy the data. The tch-rs API doesn't really advertise the fact that `tch::Tensor` is essentially `Rc<RefCell<Data>>` (on the C++ side); it's called `Tensor`, not `TensorHandle`, after all, so I think it would be surprising to experience a shallow copy on `clone()`.
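For comparison, here is roughly how the two behaviours differ with today's API (`copy` is the deep copy, `shallow_clone` shares storage); the values are just an illustration:

```rust
use tch::{Device, Kind, Tensor};

fn main() {
    let mut a = Tensor::zeros(&[3], (Kind::Float, Device::Cpu));

    let deep = a.copy();             // new storage
    let shallow = a.shallow_clone(); // same storage, new handle

    let _ = a.fill_(1.0); // in-place modification of `a`

    // The deep copy is unaffected; the shallow clone sees the change.
    assert_eq!(deep.double_value(&[0]), 0.0);
    assert_eq!(shallow.double_value(&[0]), 1.0);
}
```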
As for `shallow_clone()`, are you sure it's unsound? I'm not an expert on Rust's soundness rules or anything, but in sound Rust code you can have two non-mut `Rc<RefCell<Data>>` pointing to the same data, modify the data with one, and then it will be modified when the other reads it. I assume that on the C++ side there is some sort of lock to prevent multiple concurrent modifications. If the C++ side is something like `Arc<RefCell<Mutex<Data>>>` then as far as I know it ought to be fine. The `RefCell` checks of only-one-`&mut` wouldn't be there in C++ (just the inner mutability), but since the tch-rs API doesn't let you safely get a pointer to the underlying data, I don't think you can end up with multiple mut references.
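A small plain-Rust analogy of that "shared handle" view (no tch involved):

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Sound Rust: two non-mut handles to the same data. Mutating through one is
// observed through the other; RefCell enforces the borrow rules at runtime.
fn main() {
    let a: Rc<RefCell<Vec<i32>>> = Rc::new(RefCell::new(vec![1, 2, 3]));
    let b = Rc::clone(&a); // the "shallow clone": shares the underlying data

    b.borrow_mut()[0] = 42;
    assert_eq!(a.borrow()[0], 42);
}
```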
The soundness point is that `&mut self` implies unique ownership, which does not respect the fact that the data is shared after a shallow copy. It should be `&self` instead, even for self-modifying methods. The implementation is not wrong; it's the API that does not respect the rule.
Data sharing is common in libtorch in various ways; for example, `Tensor::view()` leads to two owned instances with shared data. It could be explicit, like `Arc<RefCell<_>>`, or transparent and documented in terms of ownership. The point is that the ownership model of `Tensor` must be clearly defined. Consider the cases (sketched as signatures after the list):
- `Tensor` has unique ownership of its data: `.absolute_(&mut self)` takes `&mut self`, `.view(self, ...)` takes ownership of the value, and `.shallow_clone()` becomes unsound and should be removed.
- `Tensor` points to shared data: `.absolute_(&self)` takes shared ownership, `.view(&self, ...)` takes a reference and produces an owned value, and both a deep `.clone()` and `.shallow_clone()` can be defined.
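A signatures-only sketch of the two ownership models above (hypothetical traits, not the actual tch-rs API):

```rust
// Hypothetical traits contrasting the two ownership models; they only make
// the method signatures explicit.
trait UniquelyOwnedTensor: Sized {
    fn absolute_(&mut self);               // in-place op needs &mut self
    fn view(self, shape: &[i64]) -> Self;  // consumes the value it would alias
    // shallow_clone() cannot be offered soundly under this model.
}

trait SharedDataTensor: Sized {
    fn absolute_(&self);                    // interior mutability, shared ownership
    fn view(&self, shape: &[i64]) -> Self;  // borrows, returns a new owned handle
    fn deep_clone(&self) -> Self;           // copies the data
    fn shallow_clone(&self) -> Self;        // shares the data
}
```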
Taking a mutable pointer to the data can be marked `unsafe`. The practical use is to implement low-level operations with CUDA. An example can be found in my tch-nms (link).
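A hedged sketch of what that looks like with tch's `data_ptr` (the CUDA part is omitted; writing through the raw pointer is the part that must be `unsafe`):

```rust
use tch::{Device, Kind, Tensor};

fn main() {
    let t = Tensor::zeros(&[4], (Kind::Float, Device::Cpu));

    // data_ptr() itself is a safe call, but any dereference (or handing the
    // pointer to a CUDA kernel) bypasses the borrow checker and must be unsafe.
    let ptr = t.data_ptr() as *mut f32;
    unsafe {
        *ptr = 1.0;
    }
    t.print();
}
```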