yaml-cpp icon indicating copy to clipboard operation
yaml-cpp copied to clipboard

AddressSanitizer reports stack-use-after-scope

Open Liam0205 opened this issue 4 years ago • 21 comments

Environment

  • CentOS 7
  • yaml-cpp 0.6
  • LLVM clang++ 8.0.1

MWE

hello:
  - world
hi:
  - world
#include <yaml-cpp/yaml.h>

#include <iostream>

int main() {
  YAML::Node config = YAML::LoadFile("./demo.yaml");
  for (auto iter = config.begin(); iter != config.end(); ++iter) {
    auto& list = iter->second;
    const size_t sz = list.size();
    std::cout << list << std::endl;
  }
  return 0;
}

When build with -fsanitize=address,bounds, binary aborts with the following error message:

=================================================================
==218037==ERROR: AddressSanitizer: stack-use-after-scope on address 0x7ffd0b1c90b0 at pc 0x000000501b4f bp 0x7ffd0b1c8df0 sp 0x7ffd0b1c8de8
READ of size 1 at 0x7ffd0b1c90b0 thread T0                                                                          
    #0 0x501b4e in YAML::Node::size() const /tmp/third_party/yaml-cpp-0.6/include/yaml-cpp/node/impl.h:278:8        
    #1 0x501b4e in main /tmp/sandbox/liam/mwe/yaml-cpp-stack-use-after-scope/mwe.cc:9                                                                                                                                         
    #2 0x7fae5aff2554 in __libc_start_main (/lib64/libc.so.6+0x22554)
    #3 0x422408 in _start (/home/liam/svn/ks/.build/diag_opt/targets/sandbox/liam/mwe/yaml-cpp-stack-use-after-scope/yaml-cpp-stack-use-after-scope+0x422408)

Address 0x7ffd0b1c90b0 is located in stack of thread T0 at offset 688 in frame
    #0 0x5000df in main /tmp/sandbox/liam/mwe/yaml-cpp-stack-use-after-scope/mwe.cc:5

  This frame has 10 object(s):
    [32, 40) 'ref.tmp.i172'
    [64, 76) 'ref.tmp.i164'
    [96, 120) 'ref.tmp2.i'
    [160, 256) 'ref.tmp.i'
    [288, 320) 'agg.tmp.i40'
    [352, 384) 'agg.tmp.i'
    [416, 448) 'config' (line 6)
    [480, 504) 'ref.tmp' (line 6)
    [544, 592) 'iter' (line 7)
    [624, 720) 'ref.tmp8' (line 8) <== Memory access at offset 688 is inside this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork
      (longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-use-after-scope /tmp/third_party/yaml-cpp-0.6/include/yaml-cpp/node/impl.h:278:8 in YAML::Node::size() const
Shadow bytes around the buggy address:
  0x1000216311c0: f1 f1 f1 f1 f8 f2 f2 f2 f8 f8 f2 f2 f8 f8 f8 f2
  0x1000216311d0: f2 f2 f2 f2 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
  0x1000216311e0: f2 f2 f2 f2 f8 f8 f8 f8 f2 f2 f2 f2 f8 f8 f8 f8
  0x1000216311f0: f2 f2 f2 f2 00 00 00 00 f2 f2 f2 f2 f8 f8 f8 f2
  0x100021631200: f2 f2 f2 f2 00 00 00 00 00 00 f2 f2 f2 f2 f8 f8
=>0x100021631210: f8 f8 f8 f8 f8 f8[f8]f8 f8 f8 f3 f3 f3 f3 f3 f3
  0x100021631220: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x100021631230: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x100021631240: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x100021631250: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x100021631260: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
  Shadow gap:              cc
==218037==ABORTING

Liam0205 avatar Jan 31 '21 03:01 Liam0205

Well, after diving into source of the project, it seems to be a bug of iterator.

When calling member access operator of iterator, it constructs a temporary object of proxy. See: https://github.com/jbeder/yaml-cpp/blob/98acc5a8874faab28b82c28936f4b400b389f5d6/include/yaml-cpp/node/detail/iterator.h#L87

However, constructing proxy makes a copy of the object pointed by the iterator. See the constructor of proxy. https://github.com/jbeder/yaml-cpp/blob/98acc5a8874faab28b82c28936f4b400b389f5d6/include/yaml-cpp/node/detail/iterator.h#L31-L37

This leads to the fact that auto& list = iter->second; refers to the member data of a temporary proxy object. And after the very sentence is executed, the temporary proxy object is destroyed. From then on, any use of list might cause a stack-use-after-scope error.

Liam0205 avatar Jan 31 '21 09:01 Liam0205

Hmmm, it looks like yaml-cpp itself has reference semantics.

Something looks like a copy, internally it works like a C++ reference.

auto list = iter->second;
list[0] = "Liam";  // <-- this will actually modify the original list.

Something looks like a reference, internally it causes a UB.

auto& list = iter->second;
list[0] = "Liam";  // <-- undefined behavior, stack-use-after-scope.

Would @jbeder kindly consider to fix this?

Liam0205 avatar Jan 31 '21 10:01 Liam0205

It is not possible current design, but would the author @jbeder kindly think of a new design?

sunmy2019 avatar Jan 31 '21 17:01 sunmy2019

@sunmy2019

What you've said does not change the fact that in yaml-cpp, copy works like a reference, but reference results in UB.

Liam0205 avatar Feb 01 '21 01:02 Liam0205

It is a compromise in the current design.

Why reference is UB here.

The proxy type exists because it needs to return a pointer to the dereference result, thus making iter->second behaves alike to the (*iter).second. Otherwise you need to write something like iter.operator->().second.

But since the derefence result is not stored in the iterator itself, sturct proxy is just a work around. It is hard to return a pointer and decide when to destroy that object. You need to choose between a possible dangling reference (incorrect use of user) or a memory leak.

The behavior is expected to be the same as (*iter).second. But (*iter).second is a rvalue, the compiler won't allow auto&. That's not the case in the iter->second, because the compiler thinks iter->second as a lvalue.

C++ standards makes p->m, the built-in member of pointer expression, a lvalue. So there's no way to forbid you binding it to reference.

So the behavior is correct when

V list = iter->second;   // correct
V &list = iter->second;  // wrong
V &&list = iter->second; // COMPILE TIME ERROR
V &&list = std::move(iter->second); // still wrong

auto list = iter -> second;   // correct, list is V
auto &list = iter -> second;  // wrong,   list is V&
auto &&list = iter -> second; // wrong,   list is V&

V list = (*iter).second;   // correct
V &list = (*iter).second;  // COMPILE TIME ERROR
V &&list = (*iter).second; // correct

auto list = (*iter).second;   // correct, list is V
auto &list = (*iter).second;  // COMPILE TIME ERROR
auto &&list = (*iter).second; // correct, list is V&&

Here are some possible methods:

  1. Make detail::iterator_value object long living or just simply leak the memory.
  2. Remove operator->().
  3. Write into the document. Telling everyone to use auto.

Why copy works like a reference here.

  1. In current design, every change went through a Node. Node is the public interface of the underlying memory. It is designed to become polymorphism. And the real type of the underlyding data is decided during running, which is unknown in the compile time. So auto& list = iter->second is impossible to bind to the correct type, which in your case is std::string. auto as well. This could be done with some efforts. It will be something like
auto& list = iter->second.data_as_ref<std::string>();

It seems not that convenient to use.

  1. In the current design, you can get a copy by means of
auto list = iter->second.as<std::string>();

You cannot bind to it. It only allows you to copy, not write.

  1. That's because Node is an interface to assign the value. It handles more than simply assign the data.

  2. Also your ref could be dangling because a memory reallocation is possible.

Future

A new design could be made with an object will generic type generic_data. generic_data implements all interface and stores a copy of the data. Then a expose to that generic_data object would be the correct C++ semantics.

sunmy2019 avatar Feb 01 '21 17:02 sunmy2019

@sunmy2019 TL;DR.

You do not need to explain to me why we got UB as a result, 'cause I had already known, before your first reply. What I wonder is that the internal reason behind this weird design.

Liam0205 avatar Feb 02 '21 03:02 Liam0205

I would love to see this fixed since I always have to suppress ASAN/UBSAN warnings from yaml-cpp.

jrood-nrel avatar Feb 12 '21 23:02 jrood-nrel

How do I suppress these ASAN errors? I could never get the suppression file to work. Can someone elaborate?

themightyoarfish avatar Dec 16 '21 22:12 themightyoarfish

For me I see the error in this code in Node impl by the way:

inline iterator Node::begin() {
  if (!m_isValid)
    return iterator();
  return m_pNode ? iterator(m_pNode->begin(), m_pMemory) : iterator();
}

(version 0.6.2).

Is this not something to be concerned about, considering the issue is a year old?

themightyoarfish avatar Dec 16 '21 22:12 themightyoarfish

@themightyoarfish I had to use a blacklist. This is how I do it https://github.com/Exawind/build-test/blob/9332a23324b2b2a98184f3be20a4f22b8db57939/test-scripts/test-nalu-wind.sh#L304

Then use -fsanitize-blacklist=blacklist.txt or whatever.

jrood-nrel avatar Dec 16 '21 22:12 jrood-nrel

Apparently still an issue with the current master.

WilliamTambellini avatar Jul 12 '22 16:07 WilliamTambellini