ATen
Tensor factories (and functions) should accept Tensors/Scalars as arguments to reduce sync points (?)
Currently, if you want to create a new tensor whose size comes from data residing on the GPU, a sync is needed (imho). It would be great if something like the following could work:
```cpp
... my_func(...) {
    ...
    my_gpu_tensor = ...; // size N x 1
    // now we want to create a new M x 1 tensor where M = my_gpu_tensor[N-1][0]
    auto new_size = Scalar(my_gpu_tensor[N-1][0]);
    auto new_tensor = my_gpu_tensor.type().tensor({new_size});
    // or a tensor whose sizes are given by my_gpu_tensor.slice(0, N-2)
    auto new_tensor_2 = my_gpu_tensor.type().tensor(my_gpu_tensor.slice(0, N-2).squeeze());
    ...
}
```
Currently I first have to do something like `new_size = new_size.to<int>()` to make it work.
But from my understanding this introduces a device-to-host copy.
Hence it interrupts the asynchronous nature of the GPU calls, and it prevents me from launching
my_func on several streams and then waiting on all of them together.
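To make the sync point concrete, here is a minimal sketch of the workaround as I understand it (the exact `Scalar`/`Type` calls are my assumption from the ATen headers, not tested code):

```cpp
#include <ATen/ATen.h>

// Hedged sketch, assuming Scalar can be constructed from a one-element
// Tensor and that Type::tensor(IntList) is the factory entry point.
at::Tensor make_from_gpu_size(const at::Tensor& my_gpu_tensor) {
    // Scalar::to<T>() materializes the value on the host -- this is the
    // sync point: the pending GPU work must finish before the value exists.
    auto new_size = at::Scalar(my_gpu_tensor[0][0]).to<int64_t>();
    // Only now can the factory be called with a host-side size.
    return my_gpu_tensor.type().tensor({new_size});
}
```

If the factory could accept the Scalar (or a size Tensor) directly, the host round-trip above would disappear and the call could stay enqueued on the stream.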
As I am not deep enough into the ATen sources: is it technically possible, with reasonable effort, to make this work? Or is there actually a way to do this that I did not see?
regards c.hofer