Remove `to_(device)` API for all classes.
Moving data between different devices always allocates new memory, so the in-place version `to_(device)` does not avoid allocation. Python and C++ share the same syntax: use the `to(device)` method and reassign the result, which releases the memory used before the transfer.
my_unitensor = my_unitensor.to(Device.cuda)
Moreover, managed memory is allocated when the target device is a GPU. Both the CPU and GPUs can access managed memory, so switching devices may not require any allocation if all data is stored in managed memory. In that case, the remaining `to(device)` effectively becomes an "in-place" method anyway.
One argument for keeping `to_` is method chaining, e.g. `my_unitensor.to_(Device.cuda).other_().yetanother_()`. Is that reason enough to keep it?
Instead, we can use
my_unitensor = my_unitensor.to(Device.cuda).other_().yetanother_()
In the current implementation, `to_` must allocate memory when moving to a different device, so the code above is no slower. Providing an in-place `to_` may mislead users into thinking that `to_` avoids allocating new memory.
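To make the argument concrete, here is a minimal sketch (hypothetical `Device` and `UniTensor` stand-ins, not the real library API) of why a cross-device `to` must allocate, and how the reassignment pattern lets the old buffer be released:

```python
# Hypothetical stand-ins for illustration only; the real classes differ.
class Device:
    cpu = "cpu"
    cuda = "cuda"


class UniTensor:
    def __init__(self, data, device=Device.cpu):
        self.data = list(data)   # stand-in for a device-resident buffer
        self.device = device

    def to(self, device):
        if device == self.device:
            return self          # same device: no copy, no allocation
        # Different device: a new buffer must be allocated and copied;
        # an "in-place" to_ could not avoid this allocation either.
        return UniTensor(self.data, device)

    # Hypothetical chained in-place ops from the discussion above.
    def other_(self):
        return self

    def yetanother_(self):
        return self


t = UniTensor([1, 2, 3])
# Reassignment drops the last reference to the old CPU buffer,
# so it can be freed after the transfer.
t = t.to(Device.cuda).other_().yetanother_()
```

Under this model, `x = x.to(device)` expresses exactly what happens (allocate on the target, copy, release the source), whereas a `to_` spelling suggests no allocation occurs.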