What is the best way to get a sub-tensor without a data copy?
Before the attention operation, the q, k, and v tensors are packed into one big tensor `qkv`, and I would like to do some in-place operations on q and k only.
Currently I follow the code here and split the tensor with `query, key, value = split(qkv, [self.attention_hidden_size, kv_size, kv_size], dim=2)`, but I saw in the comment that this actually requires a memory copy:

> The slice layer selects for each dimension a start location from within the input tensor, and copies elements to the output tensor using a stride of 1 across the input tensor.

which is not ideal in my case.
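In context, it looks roughly like this (a minimal sketch assuming TensorRT-LLM's Python functional API; the scaling op and `scale` are just hypothetical placeholders for the in-place work I want to do on q and k):

```python
from tensorrt_llm.functional import split

# qkv: [batch, seq_len, attention_hidden_size + 2 * kv_size].
# Each output of split() is produced by a TensorRT slice layer,
# which copies its elements out of qkv into a new output tensor.
query, key, value = split(
    qkv, [self.attention_hidden_size, kv_size, kv_size], dim=2
)

# Hypothetical element-wise updates applied to q and k only; v is untouched.
query = query * scale
key = key * scale
```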
Wondering: is there a way to get the sub-tensors without a data copy, or will TensorRT-LLM optimize away the unnecessary data copies?
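For comparison, plain PyTorch has the behavior I am hoping for: `torch.split` returns views that share storage with the original tensor, so an in-place op on q writes straight into `qkv` with no copy (sizes below are made up for illustration):

```python
import torch

# qkv: [batch, seq_len, attention_hidden_size + 2 * kv_size]
qkv = torch.randn(2, 8, 4096 + 2 * 512)

# torch.split returns views sharing storage with qkv -- no data copy.
q, k, v = torch.split(qkv, [4096, 512, 512], dim=2)

# In-place update on the view modifies the q region of qkv directly.
q.mul_(0.5)
assert torch.equal(qkv[..., :4096], q)
```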