kompute
Add Tensor constructor that just requires a size instead of a full data array
Currently, Tensors have a constructor that allows them to be created from a full data array. This is sub-optimal when creating staging or output tensors, since the data would be copied only to infer the size of the buffer and then be replaced anyway (unless the staging tensor is being used to copy from host to device). This issue covers adding a constructor, or some other means, to create tensors without a data array.
So, I tried making this possible, and this is what I have done so far. Here is the constructor that I created:
Tensor::Tensor(int tensorSize, TensorTypes tensorType)
{
#if DEBUG
    SPDLOG_DEBUG("Kompute Tensor constructor data length: {}, and type: {}",
                 tensorSize,
                 tensorType);
#endif
    std::vector<float> data_raw;
    data_raw.reserve(tensorSize);
    this->mData = data_raw;
    this->mShape = { static_cast<uint32_t>(tensorSize) };
    this->mTensorType = tensorType;
}
I tackled the problem by reserving capacity for the data, so that it won't allocate more memory than intended. Also, as there are no floats inside the Tensor yet, this->mShape is set to { static_cast<uint32_t>(tensorSize) } explicitly, because data.size() would yield 0. However, when I tested it, it resulted in a segmentation fault, which I traced to:
commandBuffer->copyBuffer(
*copyFromTensor->mBuffer, *this->mBuffer, copyRegion);
(Tensor.cpp)
I couldn't figure out why this results in a segmentation fault. Any ideas? Is reserve() not a viable solution?
I don't know how to solve this, but here are some hints:
- That commandBuffer->copyBuffer line is in the function Tensor::recordCopyFrom().
- This function was most likely called from OpTensorCreate::record(), trying to copy from a staging Tensor (host) to your Tensor (device).
- The staging Tensor was created in OpTensorCreate::init() (line 47) with the parameter tensor->data(), which refers to your Tensor. tensor->data() returns your zero-sized std::vector, which results in a zero-sized staging Tensor.
- The segfault is then probably because copyBuffer tries to copy buffers of different sizes.
Having said that, I believe this issue is actually about avoiding this copyBuffer command completely, because host->device data transfer is slow.
Yeah I see, thanks. I dug deeper into tensor->data(). Although tensor->data() is zero-sized, the method Tensor::memorySize() returns the desired value, because we explicitly set mShape in the constructor, so it works in most cases (Tensor::memorySize() is what is used when creating buffers). One place that might have caused the problem is Tensor::mapDataIntoHostMemory() in Tensor.cpp, specifically the memcpy(mapped, this->mData.data(), bufferSize); call.
> Having said that, I believe this issue is actually about avoiding this copyBuffer command completely, because host->device data transfer is slow.
👍
@axsaucedo I've taken a short look at this and have some questions:
- What's the point of staging tensors? Especially, why give the user the option to create one manually? What would be the use case?
- How are eStorage tensors meant to be used? They seem currently unusable. I cannot use OpTensorCreate with them because this op tries to call mapDataIntoHostMemory() on them, which is only valid for staging tensors. I also cannot manually call init() because I cannot access the Manager's vk::PhysicalDevice and vk::Device, which are private.
So my suggestion would be to have only one type of tensor, which automatically creates a buffer on the device. If the user wants to transfer data to/from the device, a host-side vulkan buffer could be created on the fly and deleted after the transfer. Otherwise it acts as device-only memory. There is no need to keep a duplicate of the data on the host; let the user manage host-side memory themselves. This would also remove the need for #21
Hmm, you raise a good point. You are right that eStorage tensors are currently unusable, as the op tries to call mapDataIntoHostMemory; let me have a look at how this can be addressed. Storage tensors may still be useful when users want to make memory available within the GPU purely to store intermediate data while processing is being computed, without it ever being sent back to the host. I have opened #132 so we can explore this further.
In regards to your second point, it does still make sense for some situations where people may want to use a hostVisible tensor, which allows memory to be accessible from the CPU without needing an operation to copy it into GPU-only visible memory. I saw your PR that addresses the initial issue of data copying, so I will provide further thoughts there, and we can continue the discussion around the current Tensor functionality in issue #132.