yaml-cpp
yaml-cpp copied to clipboard
AddressSanitizer reports stack-use-after-scope
Environment
- CentOS 7
- yaml-cpp 0.6
- LLVM clang++ 8.0.1
MWE
hello:
- world
hi:
- world
#include <yaml-cpp/yaml.h>
#include <iostream>
int main() {
YAML::Node config = YAML::LoadFile("./demo.yaml");
for (auto iter = config.begin(); iter != config.end(); ++iter) {
auto& list = iter->second;
const size_t sz = list.size();
std::cout << list << std::endl;
}
return 0;
}
When build with -fsanitize=address,bounds
, binary aborts with the following error message:
=================================================================
==218037==ERROR: AddressSanitizer: stack-use-after-scope on address 0x7ffd0b1c90b0 at pc 0x000000501b4f bp 0x7ffd0b1c8df0 sp 0x7ffd0b1c8de8
READ of size 1 at 0x7ffd0b1c90b0 thread T0
#0 0x501b4e in YAML::Node::size() const /tmp/third_party/yaml-cpp-0.6/include/yaml-cpp/node/impl.h:278:8
#1 0x501b4e in main /tmp/sandbox/liam/mwe/yaml-cpp-stack-use-after-scope/mwe.cc:9
#2 0x7fae5aff2554 in __libc_start_main (/lib64/libc.so.6+0x22554)
#3 0x422408 in _start (/home/liam/svn/ks/.build/diag_opt/targets/sandbox/liam/mwe/yaml-cpp-stack-use-after-scope/yaml-cpp-stack-use-after-scope+0x422408)
Address 0x7ffd0b1c90b0 is located in stack of thread T0 at offset 688 in frame
#0 0x5000df in main /tmp/sandbox/liam/mwe/yaml-cpp-stack-use-after-scope/mwe.cc:5
This frame has 10 object(s):
[32, 40) 'ref.tmp.i172'
[64, 76) 'ref.tmp.i164'
[96, 120) 'ref.tmp2.i'
[160, 256) 'ref.tmp.i'
[288, 320) 'agg.tmp.i40'
[352, 384) 'agg.tmp.i'
[416, 448) 'config' (line 6)
[480, 504) 'ref.tmp' (line 6)
[544, 592) 'iter' (line 7)
[624, 720) 'ref.tmp8' (line 8) <== Memory access at offset 688 is inside this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork
(longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-use-after-scope /tmp/third_party/yaml-cpp-0.6/include/yaml-cpp/node/impl.h:278:8 in YAML::Node::size() const
Shadow bytes around the buggy address:
0x1000216311c0: f1 f1 f1 f1 f8 f2 f2 f2 f8 f8 f2 f2 f8 f8 f8 f2
0x1000216311d0: f2 f2 f2 f2 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
0x1000216311e0: f2 f2 f2 f2 f8 f8 f8 f8 f2 f2 f2 f2 f8 f8 f8 f8
0x1000216311f0: f2 f2 f2 f2 00 00 00 00 f2 f2 f2 f2 f8 f8 f8 f2
0x100021631200: f2 f2 f2 f2 00 00 00 00 00 00 f2 f2 f2 f2 f8 f8
=>0x100021631210: f8 f8 f8 f8 f8 f8[f8]f8 f8 f8 f3 f3 f3 f3 f3 f3
0x100021631220: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x100021631230: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x100021631240: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x100021631250: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x100021631260: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
Shadow gap: cc
==218037==ABORTING
Well, after diving into source of the project, it seems to be a bug of iterator.
When calling member access operator of iterator, it constructs a temporary object of proxy
. See:
https://github.com/jbeder/yaml-cpp/blob/98acc5a8874faab28b82c28936f4b400b389f5d6/include/yaml-cpp/node/detail/iterator.h#L87
However, constructing proxy
makes a copy of the object pointed by the iterator. See the constructor of proxy
.
https://github.com/jbeder/yaml-cpp/blob/98acc5a8874faab28b82c28936f4b400b389f5d6/include/yaml-cpp/node/detail/iterator.h#L31-L37
This leads to the fact that auto& list = iter->second;
refers to the member data of a temporary proxy
object. And after the very sentence is executed, the temporary proxy
object is destroyed. From then on, any use of list
might cause a stack-use-after-scope
error.
Hmmm, it looks like yaml-cpp itself has reference semantics.
Something looks like a copy, internally it works like a C++ reference.
auto list = iter->second;
list[0] = "Liam"; // <-- this will actually modify the original list.
Something looks like a reference, internally it causes a UB.
auto& list = iter->second;
list[0] = "Liam"; // <-- undefined behavior, stack-use-after-scope.
Would @jbeder kindly consider to fix this?
It is not possible current design, but would the author @jbeder kindly think of a new design?
@sunmy2019
What you've said does not change the fact that in yaml-cpp, copy works like a reference, but reference results in UB.
It is a compromise in the current design.
Why reference is UB here.
The proxy type exists because it needs to return a pointer to the dereference result, thus making iter->second
behaves alike to the (*iter).second
. Otherwise you need to write something like iter.operator->().second
.
But since the derefence result is not stored in the iterator itself, sturct proxy
is just a work around. It is hard to return a pointer and decide when to destroy that object. You need to choose between a possible dangling reference (incorrect use of user) or a memory leak.
The behavior is expected to be the same as (*iter).second
. But (*iter).second
is a rvalue, the compiler won't allow auto&
. That's not the case in the iter->second
, because the compiler thinks iter->second
as a lvalue.
C++ standards makes p->m
, the built-in member of pointer expression, a lvalue. So there's no way to forbid you binding it to reference.
So the behavior is correct when
V list = iter->second; // correct
V &list = iter->second; // wrong
V &&list = iter->second; // COMPILE TIME ERROR
V &&list = std::move(iter->second); // still wrong
auto list = iter -> second; // correct, list is V
auto &list = iter -> second; // wrong, list is V&
auto &&list = iter -> second; // wrong, list is V&
V list = (*iter).second; // correct
V &list = (*iter).second; // COMPILE TIME ERROR
V &&list = (*iter).second; // correct
auto list = (*iter).second; // correct, list is V
auto &list = (*iter).second; // COMPILE TIME ERROR
auto &&list = (*iter).second; // correct, list is V&&
Here are some possible methods:
- Make
detail::iterator_value
object long living or just simply leak the memory. - Remove
operator->()
. - Write into the document. Telling everyone to use
auto
.
Why copy works like a reference here.
- In current design, every change went through a
Node
.Node
is the public interface of the underlying memory. It is designed to become polymorphism. And the real type of the underlyding data is decided during running, which is unknown in the compile time. Soauto& list = iter->second
is impossible to bind to the correct type, which in your case isstd::string
.auto
as well. This could be done with some efforts. It will be something like
auto& list = iter->second.data_as_ref<std::string>();
It seems not that convenient to use.
- In the current design, you can get a copy by means of
auto list = iter->second.as<std::string>();
You cannot bind to it. It only allows you to copy, not write.
-
That's because
Node
is an interface to assign the value. It handles more than simply assign the data. -
Also your ref could be dangling because a memory reallocation is possible.
Future
A new design could be made with an object will generic type generic_data
. generic_data
implements all interface and stores a copy of the data. Then a expose to that generic_data
object would be the correct C++ semantics.
@sunmy2019 TL;DR.
You do not need to explain to me why we got UB as a result, 'cause I had already known, before your first reply. What I wonder is that the internal reason behind this weird design.
I would love to see this fixed since I always have to suppress ASAN/UBSAN warnings from yaml-cpp.
How do I suppress these ASAN errors? I could never get the suppression file to work. Can someone elaborate?
For me I see the error in this code in Node
impl by the way:
inline iterator Node::begin() {
if (!m_isValid)
return iterator();
return m_pNode ? iterator(m_pNode->begin(), m_pMemory) : iterator();
}
(version 0.6.2).
Is this not something to be concerned about, considering the issue is a year old?
@themightyoarfish I had to use a blacklist. This is how I do it https://github.com/Exawind/build-test/blob/9332a23324b2b2a98184f3be20a4f22b8db57939/test-scripts/test-nalu-wind.sh#L304
Then use -fsanitize-blacklist=blacklist.txt
or whatever.
Apparently still an issue with the current master.