cppflow
cppflow copied to clipboard
import span to speed up
Data copy is bad for performance. I import span(c++20) for speeding up data access.
https://en.cppreference.com/w/cpp/language/extending_std
It is undefined behavior to add declarations or definitions to namespace std or to any namespace nested within std, with a few exceptions noted below
I don't think defining std::span
satisfies any of the exceptions. (Please point it out if I am wrong here.) It may cause name conflict with some compilers.
With that being said, I do agree to avoid data copy to improve performance. One way is to define span
in another namespace, e.g. cppflow::span
and uses std::span
(typedef) when possible. The other way would be defining access functions on cppflow::tensor
, e.g. operator[]
and iterators.
Yes, you are right. std::span defined in c++20. Maybe we can use the standard span if using compiler supported c++20.
https://en.cppreference.com/w/cpp/utility/feature_test
https://gcc.gnu.org/onlinedocs/libstdc++/manual/status.html
Maybe check __cpp_lib_span, and use std::span if it is available, otherwise use the self defined version.
On Wed, Nov 4, 2020, 00:33 Allen [email protected] wrote:
Yes, you are right. std::span defined in c++20. Maybe we can use the standard span if using compiler supported c++20.
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/serizba/cppflow/pull/68#issuecomment-721521263, or unsubscribe https://github.com/notifications/unsubscribe-auth/AALDTHSCRT3UCRFAXPJQ7D3SODRSRANCNFSM4TJQ7XVA .
https://en.cppreference.com/w/cpp/utility/feature_test https://gcc.gnu.org/onlinedocs/libstdc++/manual/status.html Maybe check __cpp_lib_span, and use std::span if it is available, otherwise use the self defined version. … On Wed, Nov 4, 2020, 00:33 Allen @.***> wrote: Yes, you are right. std::span defined in c++20. Maybe we can use the standard span if using compiler supported c++20. — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#68 (comment)>, or unsubscribe https://github.com/notifications/unsubscribe-auth/AALDTHSCRT3UCRFAXPJQ7D3SODRSRANCNFSM4TJQ7XVA .
Okay, I have updated my code to support c++20 feature :)
Add test for span in c++17 and c++20 and test passed.
You're really serious. LOL. I got an `std::runtime_error' undefined when i remove
You're really serious. LOL. I got an `std::runtime_error' undefined when i remove . Fixed it by the way.
LOL. Thanks for the fix BTW. I am really looking forward to using this lib in my code, so I just want to make it pretty and clean. Thanks for bearing with me. We still need to look at the behavior of res_tensor
though...
I guess I need to go through the source code to see how TF manages memory internally. My first speculation is that TF_Tensor is RefCounted, so we should be safe to delete it after obtaining an eager handle. (pure speculation)
On Thu, Nov 5, 2020, 2:52 AM Allen [email protected] wrote:
@ivanallen commented on this pull request.
In include/cppflow/tensor.h https://github.com/serizba/cppflow/pull/68#discussion_r517849884:
@@ -205,7 +206,7 @@ namespace cppflow {
// Convert to correct type const auto T_data = static_cast<T*>(raw_data);
Yeah, you are right! It indeed confuses people. Actually, res_tensor and this->tf_tensor hold the same buffer, but sometimes res_tensor uses a new buffer.
I think we should use the raw data of this->tf_tensor but not res_tensor.
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/serizba/cppflow/pull/68#discussion_r517849884, or unsubscribe https://github.com/notifications/unsubscribe-auth/AALDTHXJAUCQZ6CGTWNBJ4DSOJKSHANCNFSM4TJQ7XVA .
I guess I need to go through the source code to see how TF manages memory internally. My first speculation is that TF_Tensor is RefCounted, so we should be safe to delete it after obtaining an eager handle. (pure speculation) … On Thu, Nov 5, 2020, 2:52 AM Allen @.> wrote: @.* commented on this pull request. ------------------------------ In include/cppflow/tensor.h <#68 (comment)>: > @@ -205,7 +206,7 @@ namespace cppflow { // Convert to correct type const auto T_data = static_cast<T*>(raw_data); Yeah, you are right! It indeed confuses people. Actually, res_tensor and this->tf_tensor hold the same buffer, but sometimes res_tensor uses a new buffer. I think we should use the raw data of this->tf_tensor but not res_tensor. — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#68 (comment)>, or unsubscribe https://github.com/notifications/unsubscribe-auth/AALDTHXJAUCQZ6CGTWNBJ4DSOJKSHANCNFSM4TJQ7XVA .
I looked tensorflow source:
TF_Tensor* TF_TensorFromTensor(const tensorflow::Tensor& src,
TF_Status* status) {
if (src.dtype() == DT_RESOURCE) {
// ...
// This branch is unsafe.
const string str = src.scalar<ResourceHandle>()().SerializeAsString();
TF_Tensor* t = TF_AllocateTensor(TF_RESOURCE, {}, 0, str.size());
std::memcpy(TF_TensorData(t), str.c_str(), str.size());
return t;
}
if (src.dtype() != DT_STRING) {
// This branch is safe.
TensorBuffer* buf = TensorCApi::Buffer(src);
buf->Ref();
return new TF_Tensor{static_cast<TF_DataType>(src.dtype()), src.shape(),
buf};
}
So the result is up to dtype.
if (src.dtype() == DT_RESOURCE) {
// ...
// This branch is unsafe.
const string str = src.scalar<ResourceHandle>()().SerializeAsString();
TF_Tensor* t = TF_AllocateTensor(TF_RESOURCE, {}, 0, str.size());
std::memcpy(TF_TensorData(t), str.c_str(), str.size());
return t;
}
Maybe it leads to a memory leak.
if (src.dtype() == DT_RESOURCE) { // ... // This branch is unsafe. const string str = src.scalar<ResourceHandle>()().SerializeAsString(); TF_Tensor* t = TF_AllocateTensor(TF_RESOURCE, {}, 0, str.size()); std::memcpy(TF_TensorData(t), str.c_str(), str.size()); return t; }
Maybe it leads to a memory leak.
Yes, I agree res_tensor
leaks in our original code. Thanks for spotting it!
@ivanallen @serizba
I think it is safe to store the resolved TF_Tensor* res_tensor
into this->tf_tensor
in tensor::get_data()
. So we can avoid the memory leak, and also make sure res_tensor
won't be deleted while we are accessing the returned data through span.
What happens under the hood is the following.
(1) When we call TFE_NewTensorHandle
, a new copy of the source tensor is created and moved (std::move) into a new TensorHandle
. This makes sure the source tensor can be deleted after a TensorHandle
is obtained.
Edit: The underlying data of a tensorflow::Tensor
is managed by a reference counted buffer. Copying the tensor here simply copies the buffer pointer, so we may see the same address with TFE_TensorHandleResolve
.
(2) When we call TFE_TensorHandleResolve
, the underlying TensorHandle::Resolve()
is invoked. Depending on the context, a new concrete tensor may be copied from a device or a local copy/mirror is returned. We can just store this tensor into this->tf_tensor
. In the original code, we definitely need to delete res_tensor
at the end of tensor::get_data()
to avoid memory leak.
@ivanallen For TF_TensorFromTensor
, I think the dtype()
dependent behavior is due to their different data representations, please see here. I don't know DT_RESOURCE
, but DT_STRING
may contain pointers to heap allocated buffer. Other data type should be just plain memory buffer.
I use AddressSanitizer detected some memory leak and fixed it. The method get_data really has a memory leak. But we can directly use this->tf_tensor obtaining raw data. It should be safe and no memory leak.
@ivanallen I noticed the leak messages with gcc's sanitizer before but did not get the time to trace it down. I think the bug fix should be in a separate PR for easier testing.
We cannot get data from this->tf_tensor
, and calling TFE_TensorHandleResolve
is a must. After TFE_TensorHandle
is obtained, there is no guarantee that this->tf_tensor
and this->tfe_handle
share the same memory, e.g. the data can be placed on GPU and a new fetch is needed after some computations. All the eager ops are using this->tfe_handle
. Please refer to my previous comment for more details. In addition, some of the constructor of tensor
class (tensor::tensor(TFE_TensorHandle* handle)
) does not even set a valid this->tf_tensor
What I meant was to change
auto res_tensor = TFE_TensorHandleResolve(this->tfe_handle.get(), context::get_status());
to
this->tf_tensor = TFE_TensorHandleResolve(this->tfe_handle.get(), context::get_status());
There should be no data corruption/leaking issue and all lifetime is managed correctly. However, we still need more testing on this.
You know, the get_data
method's type is const-qualified, so modify this->tf_tensor seems unreasonable unless you redesign the method sematic to non-const or letthis->tf_tensor
mutable.
But we really don't have other choices to keep its ownership, and resolving the tensor does change the underlying data, though not the pointer itself.
An alternative is to return std::span<const T>.
On Sat, Nov 7, 2020, 02:09 Allen [email protected] wrote:
You know, the get_data method's type is const-qualified, so modify this->tf_tensor seems unreasonable unless you redesign the method sematic to non-const or letthis->tf_tensor mutable.
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/serizba/cppflow/pull/68#issuecomment-723402793, or unsubscribe https://github.com/notifications/unsubscribe-auth/AALDTHUZTMRPW3EBKGV6BPDSOTXDFANCNFSM4TJQ7XVA .
@ivanallen Why is this closed? I think it is still worth discussing at least.
@serizba I would also like to know your opinion on this before I started to test it.
@ivanallen Why is this closed? I think it is still worth discussing at least.
@serizba I would also like to know your opinion on this before I started to test it.
sorry, I just update cppflow2 in my github. I will reopen this PR.
@ivanallen @ljn917
Sorry for not attending this PR before.
Regarding std::span
sounds like a good idea, but what about not c++20 compilers?
For the res_tensor
thing, I also think using a new res_tensor
was a bad idea from my side, using this->tf_tensor
seems more reasonable.
The PR is interesting, so no need for closing it :)
@serizba
Regarding
std::span
sounds like a good idea, but what about not c++20 compilers?
We are adding a standard compatible span.h
for pre-C++20 compilers.
For the
res_tensor
thing, I also think using a newres_tensor
was a bad idea from my side, usingthis->tf_tensor
seems more reasonable.
We were discussing converting this->tf_tensor
to a private member which would serve as a local cache of the resolved tensor. Perhaps, we also need to wrap TFE_TensorHandleResolve
in a get method.
@ivanallen Finally, we are ready to work on this PR :)
First, could you clean up this PR a bit? Include the following:
- Rebase onto the updated
cppflow2
branch. This should address the change ininclude/cppflow/context.h
andinclude/cppflow/tensor.h
- Remove changes to the following files as they are not relevant:
examples/eager_op_multithread/CMakeLists.txt
include/cppflow/model.h
After that, I may push some tests to your branch.
@ivanallen Finally, we are ready to work on this PR :)
First, could you clean up this PR a bit? Include the following:
- Rebase onto the updated
cppflow2
branch. This should address the change ininclude/cppflow/context.h
andinclude/cppflow/tensor.h
- Remove changes to the following files as they are not relevant:
examples/eager_op_multithread/CMakeLists.txt
include/cppflow/model.h
After that, I may push some tests to your branch.
Great! I have rebased my code.
@ivanallen Thanks a lot, but I think we have more changes to make...
-
The point (2) above has not been addressed, i.e. remove changes to the following files as they are not relevant:
examples/eager_op_multithread/CMakeLists.txt
include/cppflow/model.h
Those changes can still be seen at https://github.com/serizba/cppflow/pull/68/files -
Make a directory named
third_party
in theinclude
directory, and place the orignial Tristan Brindle's span implementation into this directory, i.e. the unmodified version. In this way, we can separate the third party libs and update them easier. -
Create a header named
compat_span.h
in theinclude
directory, and put our switch logic here. Macros, e.g.TCB_SPAN_NAMESPACE_NAME
should also be defined here. -
Define
TCB_SPAN_NAMESPACE_NAME
ascppflow::tcb
to avoid namespace pollution, e.g.TCB_SPAN_NAMESPACE_NAME::byte
. -
As we discussed before,
__cpp_lib_span
is only defined in<version>
or<span>
header. We need to use__has_include
feature test macro. -
Use
cppflow::span<const T>
as the return type oftensor::get_data()
function.
I believe you can also mark all the previous comments as resolved because most of them will be outdated.
The objective of this PR is to avoid copying the data when retrieving the data with get_data. But, isn't #202 a simpler way of achieving this, and thus avoiding also the inconveniences of std::vector and std::span. Besides, if you need a std::span later on, you can also use the obtained raw_data for this.
What do you think?
In my opinion, they are just two different styles of doing it, ie. the C style and the C++ style. The efficiency/convenience of getting the raw data has improved a lot since this PR, so I don't think its priority is as high as before. In addition, the key issue of this PR is importing a third party std::span
. To be honest, I am not a big fan of it. I would rather wait until C++20 becomes more accessible.
Yes, that's also my impression. Perhaps for the moment I will include this with a precompiler directive when the C++20 is enabled. And I won't include the span implementation.