cppflow icon indicating copy to clipboard operation
cppflow copied to clipboard

import span to speed up

Open ivanallen opened this issue 4 years ago • 26 comments

Data copy is bad for performance. I import span(c++20) for speeding up data access.

ivanallen avatar Nov 04 '20 03:11 ivanallen

https://en.cppreference.com/w/cpp/language/extending_std It is undefined behavior to add declarations or definitions to namespace std or to any namespace nested within std, with a few exceptions noted below

I don't think defining std::span satisfies any of the exceptions. (Please point it out if I am wrong here.) It may cause name conflict with some compilers.

With that being said, I do agree to avoid data copy to improve performance. One way is to define span in another namespace, e.g. cppflow::span and uses std::span (typedef) when possible. The other way would be defining access functions on cppflow::tensor, e.g. operator[] and iterators.

ljn917 avatar Nov 04 '20 04:11 ljn917

Yes, you are right. std::span defined in c++20. Maybe we can use the standard span if using compiler supported c++20.

ivanallen avatar Nov 04 '20 05:11 ivanallen

https://en.cppreference.com/w/cpp/utility/feature_test

https://gcc.gnu.org/onlinedocs/libstdc++/manual/status.html

Maybe check __cpp_lib_span, and use std::span if it is available, otherwise use the self defined version.

On Wed, Nov 4, 2020, 00:33 Allen [email protected] wrote:

Yes, you are right. std::span defined in c++20. Maybe we can use the standard span if using compiler supported c++20.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/serizba/cppflow/pull/68#issuecomment-721521263, or unsubscribe https://github.com/notifications/unsubscribe-auth/AALDTHSCRT3UCRFAXPJQ7D3SODRSRANCNFSM4TJQ7XVA .

ljn917 avatar Nov 04 '20 06:11 ljn917

https://en.cppreference.com/w/cpp/utility/feature_test https://gcc.gnu.org/onlinedocs/libstdc++/manual/status.html Maybe check __cpp_lib_span, and use std::span if it is available, otherwise use the self defined version. On Wed, Nov 4, 2020, 00:33 Allen @.***> wrote: Yes, you are right. std::span defined in c++20. Maybe we can use the standard span if using compiler supported c++20. — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#68 (comment)>, or unsubscribe https://github.com/notifications/unsubscribe-auth/AALDTHSCRT3UCRFAXPJQ7D3SODRSRANCNFSM4TJQ7XVA .

Okay, I have updated my code to support c++20 feature :)

ivanallen avatar Nov 05 '20 02:11 ivanallen

Add test for span in c++17 and c++20 and test passed.

ivanallen avatar Nov 05 '20 07:11 ivanallen

You're really serious. LOL. I got an `std::runtime_error' undefined when i remove . Fixed it by the way.

ivanallen avatar Nov 05 '20 07:11 ivanallen

You're really serious. LOL. I got an `std::runtime_error' undefined when i remove . Fixed it by the way.

LOL. Thanks for the fix BTW. I am really looking forward to using this lib in my code, so I just want to make it pretty and clean. Thanks for bearing with me. We still need to look at the behavior of res_tensor though...

ljn917 avatar Nov 05 '20 07:11 ljn917

I guess I need to go through the source code to see how TF manages memory internally. My first speculation is that TF_Tensor is RefCounted, so we should be safe to delete it after obtaining an eager handle. (pure speculation)

On Thu, Nov 5, 2020, 2:52 AM Allen [email protected] wrote:

@ivanallen commented on this pull request.

In include/cppflow/tensor.h https://github.com/serizba/cppflow/pull/68#discussion_r517849884:

@@ -205,7 +206,7 @@ namespace cppflow {

     // Convert to correct type
     const auto T_data = static_cast<T*>(raw_data);

Yeah, you are right! It indeed confuses people. Actually, res_tensor and this->tf_tensor hold the same buffer, but sometimes res_tensor uses a new buffer.

I think we should use the raw data of this->tf_tensor but not res_tensor.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/serizba/cppflow/pull/68#discussion_r517849884, or unsubscribe https://github.com/notifications/unsubscribe-auth/AALDTHXJAUCQZ6CGTWNBJ4DSOJKSHANCNFSM4TJQ7XVA .

ljn917 avatar Nov 05 '20 08:11 ljn917

I guess I need to go through the source code to see how TF manages memory internally. My first speculation is that TF_Tensor is RefCounted, so we should be safe to delete it after obtaining an eager handle. (pure speculation) On Thu, Nov 5, 2020, 2:52 AM Allen @.> wrote: @.* commented on this pull request. ------------------------------ In include/cppflow/tensor.h <#68 (comment)>: > @@ -205,7 +206,7 @@ namespace cppflow { // Convert to correct type const auto T_data = static_cast<T*>(raw_data); Yeah, you are right! It indeed confuses people. Actually, res_tensor and this->tf_tensor hold the same buffer, but sometimes res_tensor uses a new buffer. I think we should use the raw data of this->tf_tensor but not res_tensor. — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#68 (comment)>, or unsubscribe https://github.com/notifications/unsubscribe-auth/AALDTHXJAUCQZ6CGTWNBJ4DSOJKSHANCNFSM4TJQ7XVA .

I looked tensorflow source:

TF_Tensor* TF_TensorFromTensor(const tensorflow::Tensor& src,
                               TF_Status* status) {
  if (src.dtype() == DT_RESOURCE) {
    // ...
    // This branch is unsafe.
    const string str = src.scalar<ResourceHandle>()().SerializeAsString();
    TF_Tensor* t = TF_AllocateTensor(TF_RESOURCE, {}, 0, str.size());
    std::memcpy(TF_TensorData(t), str.c_str(), str.size());
    return t;
  }
  if (src.dtype() != DT_STRING) {
    // This branch is safe.
    TensorBuffer* buf = TensorCApi::Buffer(src);
    buf->Ref();
    return new TF_Tensor{static_cast<TF_DataType>(src.dtype()), src.shape(),
                         buf};
  }

So the result is up to dtype.

ivanallen avatar Nov 05 '20 08:11 ivanallen

if (src.dtype() == DT_RESOURCE) {
    // ...
    // This branch is unsafe.
    const string str = src.scalar<ResourceHandle>()().SerializeAsString();
    TF_Tensor* t = TF_AllocateTensor(TF_RESOURCE, {}, 0, str.size());
    std::memcpy(TF_TensorData(t), str.c_str(), str.size());
    return t;
  }

Maybe it leads to a memory leak.

ivanallen avatar Nov 05 '20 08:11 ivanallen

if (src.dtype() == DT_RESOURCE) {
    // ...
    // This branch is unsafe.
    const string str = src.scalar<ResourceHandle>()().SerializeAsString();
    TF_Tensor* t = TF_AllocateTensor(TF_RESOURCE, {}, 0, str.size());
    std::memcpy(TF_TensorData(t), str.c_str(), str.size());
    return t;
  }

Maybe it leads to a memory leak.

Yes, I agree res_tensor leaks in our original code. Thanks for spotting it!

ljn917 avatar Nov 05 '20 12:11 ljn917

@ivanallen @serizba

I think it is safe to store the resolved TF_Tensor* res_tensor into this->tf_tensor in tensor::get_data(). So we can avoid the memory leak, and also make sure res_tensor won't be deleted while we are accessing the returned data through span.

What happens under the hood is the following. (1) When we call TFE_NewTensorHandle, a new copy of the source tensor is created and moved (std::move) into a new TensorHandle. This makes sure the source tensor can be deleted after a TensorHandle is obtained. Edit: The underlying data of a tensorflow::Tensor is managed by a reference counted buffer. Copying the tensor here simply copies the buffer pointer, so we may see the same address with TFE_TensorHandleResolve.

(2) When we call TFE_TensorHandleResolve, the underlying TensorHandle::Resolve() is invoked. Depending on the context, a new concrete tensor may be copied from a device or a local copy/mirror is returned. We can just store this tensor into this->tf_tensor. In the original code, we definitely need to delete res_tensor at the end of tensor::get_data() to avoid memory leak.

@ivanallen For TF_TensorFromTensor, I think the dtype() dependent behavior is due to their different data representations, please see here. I don't know DT_RESOURCE, but DT_STRING may contain pointers to heap allocated buffer. Other data type should be just plain memory buffer.

ljn917 avatar Nov 05 '20 13:11 ljn917

I use AddressSanitizer detected some memory leak and fixed it. The method get_data really has a memory leak. But we can directly use this->tf_tensor obtaining raw data. It should be safe and no memory leak.

ivanallen avatar Nov 06 '20 10:11 ivanallen

@ivanallen I noticed the leak messages with gcc's sanitizer before but did not get the time to trace it down. I think the bug fix should be in a separate PR for easier testing.

We cannot get data from this->tf_tensor, and calling TFE_TensorHandleResolve is a must. After TFE_TensorHandle is obtained, there is no guarantee that this->tf_tensor and this->tfe_handle share the same memory, e.g. the data can be placed on GPU and a new fetch is needed after some computations. All the eager ops are using this->tfe_handle. Please refer to my previous comment for more details. In addition, some of the constructor of tensor class (tensor::tensor(TFE_TensorHandle* handle)) does not even set a valid this->tf_tensor

What I meant was to change auto res_tensor = TFE_TensorHandleResolve(this->tfe_handle.get(), context::get_status()); to this->tf_tensor = TFE_TensorHandleResolve(this->tfe_handle.get(), context::get_status()); There should be no data corruption/leaking issue and all lifetime is managed correctly. However, we still need more testing on this.

ljn917 avatar Nov 06 '20 12:11 ljn917

You know, the get_data method's type is const-qualified, so modify this->tf_tensor seems unreasonable unless you redesign the method sematic to non-const or letthis->tf_tensor mutable.

ivanallen avatar Nov 07 '20 07:11 ivanallen

But we really don't have other choices to keep its ownership, and resolving the tensor does change the underlying data, though not the pointer itself.

An alternative is to return std::span<const T>.

On Sat, Nov 7, 2020, 02:09 Allen [email protected] wrote:

You know, the get_data method's type is const-qualified, so modify this->tf_tensor seems unreasonable unless you redesign the method sematic to non-const or letthis->tf_tensor mutable.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/serizba/cppflow/pull/68#issuecomment-723402793, or unsubscribe https://github.com/notifications/unsubscribe-auth/AALDTHUZTMRPW3EBKGV6BPDSOTXDFANCNFSM4TJQ7XVA .

ljn917 avatar Nov 07 '20 08:11 ljn917

@ivanallen Why is this closed? I think it is still worth discussing at least.

@serizba I would also like to know your opinion on this before I started to test it.

ljn917 avatar Nov 26 '20 03:11 ljn917

@ivanallen Why is this closed? I think it is still worth discussing at least.

@serizba I would also like to know your opinion on this before I started to test it.

sorry, I just update cppflow2 in my github. I will reopen this PR.

ivanallen avatar Nov 26 '20 04:11 ivanallen

@ivanallen @ljn917

Sorry for not attending this PR before.

Regarding std::span sounds like a good idea, but what about not c++20 compilers? For the res_tensor thing, I also think using a new res_tensor was a bad idea from my side, using this->tf_tensor seems more reasonable.

The PR is interesting, so no need for closing it :)

serizba avatar Nov 27 '20 15:11 serizba

@serizba

Regarding std::span sounds like a good idea, but what about not c++20 compilers?

We are adding a standard compatible span.h for pre-C++20 compilers.

For the res_tensor thing, I also think using a new res_tensor was a bad idea from my side, using this->tf_tensor seems more reasonable.

We were discussing converting this->tf_tensor to a private member which would serve as a local cache of the resolved tensor. Perhaps, we also need to wrap TFE_TensorHandleResolve in a get method.

ljn917 avatar Nov 28 '20 01:11 ljn917

@ivanallen Finally, we are ready to work on this PR :)

First, could you clean up this PR a bit? Include the following:

  1. Rebase onto the updated cppflow2 branch. This should address the change in include/cppflow/context.h and include/cppflow/tensor.h
  2. Remove changes to the following files as they are not relevant: examples/eager_op_multithread/CMakeLists.txt include/cppflow/model.h

After that, I may push some tests to your branch.

ljn917 avatar Dec 01 '20 20:12 ljn917

@ivanallen Finally, we are ready to work on this PR :)

First, could you clean up this PR a bit? Include the following:

  1. Rebase onto the updated cppflow2 branch. This should address the change in include/cppflow/context.h and include/cppflow/tensor.h
  2. Remove changes to the following files as they are not relevant: examples/eager_op_multithread/CMakeLists.txt include/cppflow/model.h

After that, I may push some tests to your branch.

Great! I have rebased my code.

ivanallen avatar Dec 02 '20 05:12 ivanallen

@ivanallen Thanks a lot, but I think we have more changes to make...

  1. The point (2) above has not been addressed, i.e. remove changes to the following files as they are not relevant: examples/eager_op_multithread/CMakeLists.txt include/cppflow/model.h Those changes can still be seen at https://github.com/serizba/cppflow/pull/68/files

  2. Make a directory named third_party in the include directory, and place the orignial Tristan Brindle's span implementation into this directory, i.e. the unmodified version. In this way, we can separate the third party libs and update them easier.

  3. Create a header named compat_span.h in the include directory, and put our switch logic here. Macros, e.g. TCB_SPAN_NAMESPACE_NAME should also be defined here.

  4. Define TCB_SPAN_NAMESPACE_NAME as cppflow::tcb to avoid namespace pollution, e.g. TCB_SPAN_NAMESPACE_NAME::byte.

  5. As we discussed before, __cpp_lib_span is only defined in <version> or <span> header. We need to use __has_include feature test macro.

  6. Use cppflow::span<const T> as the return type of tensor::get_data() function.

I believe you can also mark all the previous comments as resolved because most of them will be outdated.

ljn917 avatar Dec 04 '20 13:12 ljn917

The objective of this PR is to avoid copying the data when retrieving the data with get_data. But, isn't #202 a simpler way of achieving this, and thus avoiding also the inconveniences of std::vector and std::span. Besides, if you need a std::span later on, you can also use the obtained raw_data for this.

What do you think?

serizba avatar Sep 23 '22 16:09 serizba

In my opinion, they are just two different styles of doing it, ie. the C style and the C++ style. The efficiency/convenience of getting the raw data has improved a lot since this PR, so I don't think its priority is as high as before. In addition, the key issue of this PR is importing a third party std::span. To be honest, I am not a big fan of it. I would rather wait until C++20 becomes more accessible.

ljn917 avatar Sep 28 '22 03:09 ljn917

Yes, that's also my impression. Perhaps for the moment I will include this with a precompiler directive when the C++20 is enabled. And I won't include the span implementation.

serizba avatar Sep 29 '22 08:09 serizba