kompute
Add Tensor constructor that just requires a size instead of a full data array
Currently, Tensors have a constructor that allows them to be created from a full data array. This is sub-optimal when creating staging or output tensors, since the data would be copied only to infer the size of the buffer and then be replaced anyway (unless the staging tensor is being used to copy from host to device). This issue covers adding a constructor, or some other means, to create tensors without a data array.
So, I tried making this possible, and this is what I have done so far. Here is the constructor that I created:
Tensor::Tensor(int tensorSize, TensorTypes tensorType)
{
#if DEBUG
    SPDLOG_DEBUG("Kompute Tensor constructor data length: {}, and type: {}",
                 tensorSize,
                 tensorType);
#endif
    std::vector<float> data_raw;
    data_raw.reserve(tensorSize);
    this->mData = data_raw;
    this->mShape = { static_cast<uint32_t>(tensorSize) };
    this->mTensorType = tensorType;
}
I tackled the problem by reserving capacity for the data, so that it won't allocate more memory than intended. Also, as there are no floats inside the Tensor yet, this->mShape is set to { static_cast<uint32_t>(tensorSize) } explicitly, because data.size() would yield 0. However, when I tested it, it resulted in a segmentation fault, which I traced to:
commandBuffer->copyBuffer(
*copyFromTensor->mBuffer, *this->mBuffer, copyRegion);
(Tensor.cpp)
I couldn't figure out why this results in a segmentation fault. Any ideas? Is reserve() not a viable solution?
I don't know how to solve this, but here are some hints:
- That commandBuffer->copyBuffer line is in the function Tensor::recordCopyFrom().
- This function was most likely called from OpTensorCreate::record(), trying to copy from a staging Tensor (host) to your Tensor (device).
- The staging Tensor was created in OpTensorCreate::init() (line 47) with the parameter tensor->data(), which refers to your Tensor. tensor->data() returns your zero-sized std::vector, which results in a zero-sized staging Tensor.
- The segfault is then probably because copyBuffer tries to copy buffers of different sizes.
Having said that, I believe this issue is actually about avoiding this copyBuffer command completely, because host->device data transfer is slow.
Yeah I see, thanks. I dug deeper into tensor->data(). Although tensor->data() is zero-sized, the method Tensor::memorySize() returns the desired value, because we explicitly set mShape in the constructor, so it works in most cases (Tensor::memorySize() is what is used when creating buffers). One place that might have caused the problem is Tensor::mapDataIntoHostMemory() in Tensor.cpp, specifically the memcpy(mapped, this->mData.data(), bufferSize); call.
> Having said that, I believe this issue is actually about avoiding this copyBuffer command completely, because host->device data transfer is slow.
👍
@axsaucedo I've taken a short look at this and have some questions:
- What's the point of staging tensors? Especially, why give the user the option to create one manually? What would be the use case?
- How are eStorage tensors meant to be used? They seem currently unusable. I cannot use OpTensorCreate with them because this op tries to call mapDataIntoHostMemory() on them, which is only valid for staging tensors. I also cannot manually call init() because I cannot access the Manager's vk::PhysicalDevice and vk::Device, which are private.
So my suggestion would be to have only one type of tensor, which automatically creates a buffer on the device. If the user wants to transfer data to/from the device, a host-side vulkan buffer could be created on the fly and deleted after the transfer. Otherwise it acts as device-only memory. There is no need to keep a duplicate of the data on the host; let the user manage host-side memory themselves. This would also remove the need for #21
Hmm, you raise a good point. You are right that eStorage tensors are currently unusable, as the op tries to call mapDataIntoHostMemory; let me have a look at how this can be addressed. Storage tensors may still be useful when users want to make memory available within the GPU purely to store intermediate data while processing is being computed, without it ever being sent back to the host. I have opened #132 so we can explore this further.
In regards to your second point, it does still make sense for some situations where people may want to use a hostVisible tensor, which allows memory to be accessible from the CPU without needing an operation to copy it into GPU-only visible memory. I saw your PR that addresses the initial issue of data copying, so I will provide further thoughts there, and we can continue the discussion around the current Tensor functionality in issue #132.