CppCoreGuidelines icon indicating copy to clipboard operation
CppCoreGuidelines copied to clipboard

std::string_view encourages use-after-free; the Core Guidelines Checker doesn't complain

Open kcc opened this issue 6 years ago • 31 comments

[please excuse me if this has been discussed before]

#include <iostream>
#include <string>
#include <string_view>

int main() {
  std::string s = "Hellooooooooooooooo ";
  std::string_view sv = s + "World\n";
  std::cout << sv;
}

Here we have a heap-use-after-free bug which is easy to spot if you know what to look for, but the Core Guidelines Checker in VS++17 is silent (confirmed by @AndrewPardoe). This might be something missing in the checker, but I suspect that this is actually missing in the guidelines themselves. Moreover, I don't see how we can reject code like this w/o rejecting lots of other safe C++17 code.

Thoughts?

See also: https://bugs.llvm.org/show_bug.cgi?id=34729

kcc avatar Sep 26 '17 16:09 kcc

It's been discussed. The obvious solution of deleting string_view(string&&) breaks the primary use of string_view as a function argument.

@kcc, you primarily work in an enormous codebase that calls this type StringPiece. Do you feel like its existence is a good tradeoff there?

jyasskin avatar Sep 26 '17 17:09 jyasskin

See also: https://bugs.llvm.org/show_bug.cgi?id=34729

This is a feature, not a bug.

However, I've just skimmed the guidelines and don't see anything specifically about avoiding dangling views. It should be advised against, even if it's made possible by the API and can't be enforced by the checkers.

jwakely avatar Sep 26 '17 17:09 jwakely

Supposedly, the lifetime profile should catch any cases of dangling pointers (wasn't that the claim at cppcon 15?). So msvc should be able to catch them, but I don't remember if that function ever left the beta stage.

MikeGitb avatar Sep 26 '17 18:09 MikeGitb

calls this type StringPiece

It's been renamed to absl::string_view. We've being finding many misuses of this class (causing heap-use-after-free and stack-use-after-return) since 2011. My POV is heavily biased due to countless hours we've spent debugging those, even with ASAN. IMHO string_view is a disaster. Of course, removing this class does not solve the problem of the language and I was hoping that at least the core guidelines would forbid this type of code (not just string_view).

can't be enforced by the checkers.

If a guideline can't be enforced, does it still have any value?

kcc avatar Sep 27 '17 04:09 kcc

@kcc Have you done anything to get it removed from the codebases you've found the misuses in? What blocked you? Remember, I asked if it was a good tradeoff, not if it had no issues.

jyasskin avatar Sep 27 '17 04:09 jyasskin

I haven't -- it's used everywhere, and when used carefully it brings lots of performance benefits. So, let me correct myself: it's a stability/security disaster and a performance win. tradeoffs are hard here.

kcc avatar Sep 27 '17 04:09 kcc

And again, my complaint is not so much about std::string_view, but more about the core guidelines checker not reporting this and similar code as a problem (non-core).

kcc avatar Sep 27 '17 04:09 kcc

Out of interest: could we find patterns of the misuse? Clang-tidy could catch the string_view sv = s + "string";. So even if we can't find all for now, we could find some, maybe even most cases.

JonasToth avatar Sep 27 '17 04:09 JonasToth

In the range-v3 library it is explicitly disallowed to create a range (which is a view) from an rvalue container. Wondering if something similar could be done for string_view. https://ericniebler.github.io/std/wg21/D4128.html#ranges-cannot-own-elements

tamaskenez avatar Sep 27 '17 07:09 tamaskenez

@tamaskenez: As @jyasskin said, that would eliminate a few useful use cases when it is used as a function parameter. You also can't enforce this in string_view's api itself, as the conversion string-> view is part of std::string's interface.

I'm not saying that it might not be worth to make this trade-off, but Imho this is not a guideline/interface issue, but a tool issue. For all intents and purposes std::string_view should be treated like a glorified raw pointer or iterator (except when used as a parameter). And imho a proper static analysis tool (including the compiler) should be able to see that I'm creating a pointer to a temporary here.

MikeGitb avatar Sep 27 '17 07:09 MikeGitb

a proper static analysis tool

Are you aware of any such tool? (and to make things more realistic, it should catch a general case, not just the patter where the literal std::string_view is used.

kcc avatar Sep 27 '17 14:09 kcc

See https://bugs.llvm.org/show_bug.cgi?id=34500

The problematic cases should be caught by static analyzers, but are not, because the tools are buggy.

villevoutilainen avatar Sep 27 '17 16:09 villevoutilainen

@kcc so if I understand correctly, you are just complaining about tool implementation, rather than the guideline itself? Why would you not raise an issue directly with your tool vendor?

neilmacintosh avatar Sep 27 '17 16:09 neilmacintosh

@kcc: Sorry, I expressed myself poorly. What I meant was std::string_view should be treated like a glorified pointer (potentially by special logic) and IF that is the case, then any tool should be able to detect that you are creating a std::string_view (aka pointer) to a temporary (not saying any tool is actually doing it).

Apparently this is not where we are, but it wouldn't require black magic to do so. This is why I call it a tool issue.

Now, my understanding of the demos @neilmacintosh gave at cppcon 2015 during Herb's talk (Neil, please correct me if I'm wrong) was that the goal for Microsoft's analyser was to do even better and actually detect that string_view has an internal pointer, that points to data, whose lifetime is coupled to that of the temporary string. Obviously, they didn't demo string_view back then, but std::vector and iterators, which should be a very similar problem. Afaik they did demo more a proof of concept instead of a released product back then, but it shows that my expectations that your case should be detectable by a tool are not unreasonable.

Btw.: Of course that there also has to be a solution for custom types, but actually, I would be very happy, if tools knew about std::string_view and gsl::span and checked my code against their specification and not just a particular implementation thereof.

MikeGitb avatar Sep 28 '17 07:09 MikeGitb

Making tools do something special for std::string_view or other hand-picked types is counter-productive, imho. Users will start to rely on checker to find misuses of std::string_view and won't realize that a similar situation in their classes is not reported.

Not sure if we want to discuss this here, but there is also another related problem in the current checkers: they work on a single translation unit (right?). I.e. if the method that leaks internal state is declared in another TU, the checker is blind.

kcc avatar Sep 28 '17 17:09 kcc

@tamaskenez I have been pushing for something exactly like that. @kcc Is it possible that assessment is too colored by frustration?

gdr-at-ms avatar Sep 28 '17 18:09 gdr-at-ms

@MikeGitb The effort should be an increasing function of the risk in that case. Not something easy by default.

gdr-at-ms avatar Sep 28 '17 18:09 gdr-at-ms

@villevoutilainen Fully agreed.

gdr-at-ms avatar Sep 28 '17 18:09 gdr-at-ms

Is it possible that assessment is too colored by frustration?

It is colored by 6+ years of data-driven frustration. Here is one of the recent discussions that I can give a link to, but we find these with asan all the time. Note that here we have llvm::StringRef

The problematic cases should be caught by static analyzers, but are not, because the tools are buggy.

If the guidelines are such that the tool vendors can't implement them, the guilt is shared.

kcc avatar Sep 28 '17 18:09 kcc

Heh, some things never change, I reported an identical bug years ago: https://bugs.llvm.org/show_bug.cgi?id=22625

jwakely avatar Sep 28 '17 19:09 jwakely

@kcc: I think it is not that easy: Strictly speaking, "correct" usage of std::string_view is not defined by its implementation anyway, but by it's specification (the standard). So this is not about having some handpicked types, but about types for which there exists a spec the tool developer has access to.

It also is about separation of concerns: You test the internals of a class against it's spec and you check uses of the class, assuming that it's implementation adheres to that spec. That has the nice side effect that you don't need inter-TR analysis (as string_view is a template, that isn't necessary anyway).

MikeGitb avatar Sep 28 '17 20:09 MikeGitb

@MikeGitb - You're correct that it'd be difficult to catch this in the general sense. There isn't a simple mechanism yet to identify "Does this type have reference semantics? (or otherwise have lifetime ties to its c'tor paramters)"

That said, if we specifically handle the case for std::string_view, absl::string_view, and llvm::string_ref that are assigned to a temporary locally, rather than constructed as a function argument, we'll significantly squash the number of such issues. We shouldn't let perfect be the enemy of good.

tituswinters avatar Sep 29 '17 16:09 tituswinters

For what it's worth, valgrind and asan catch these misuses when the string is dynamically allocated. When SSO kicks in, they by and large don't, but I have had some amounts of trouble to get code to misbehave with an SSO string. The misc-dangling-handle check in clang-tidy relies on the user to tell it which types have reference semantics; that is indeed not perfect but it's significantly better than not catching the errors.

villevoutilainen avatar Sep 29 '17 17:09 villevoutilainen

The misc-dangling-handle check in clang-tidy relies on the user to tell it which types have reference semantics; that is indeed not perfect but it's significantly better than not catching the errors.

It currently has a bug from conversion operator. But I agree with you and it would be an easy one to default configure this check with gsl::span<>,... for everything that is common.

JonasToth avatar Sep 29 '17 17:09 JonasToth

Yes, I am quite aware of that bug, I reported it. :)

villevoutilainen avatar Sep 29 '17 17:09 villevoutilainen

"correct" usage of std::string_view is not defined by its implementation anyway, but by it's specification (the standard).

@MikeGitb, well, ok, agree. If we catch this and other common cases as static checks this will improve things. We'll need to make sure that the checks are extensible to handle user-defined types with the same semantics (e.g. llvm::StringRef). But then, I guess it should be explicitly documented that the guidelines and the checkers do not provide any protection against the general case, only against the known classes.

kcc avatar Sep 29 '17 17:09 kcc

Sounds reasonable

But then, I guess it should be explicitly documented that the guidelines and the checkers do not provide any protection against the general case, only against the known classes.

Although it doesn't really matter, I think one could distinguish here: The guidelines* afaik don't rely on special knowledge about std::string_view. My understanding is that the technique is supposed to work universally (might require some annotations though) but apparently it is quite difficult to fully implement the rules in a tool (considering that MS is now working at least for 2 years on it). Therfore, I'd be ok if the situation can be improved in the short term by "hardcoding" knowledge about some well known reference types.

*) by guidelines I actually mean the Lifetime Paper you can find in the docs folder, whose contend was presented in Herbs talk at cppcon 2015 (https://www.youtube.com/watch?v=hEx5DNLWGgA) - In the guideliens themselves I could hardly find anything about how to avoid dangling pointers.

MikeGitb avatar Sep 29 '17 19:09 MikeGitb

@kcc to address the original statement at the top of this issue: I suspect that in time, static analysis tools will do a better job of flagging mistakes like this. I'll add @annagrin to this thread for visibility as she now owns development of CppCoreCheck.

To address your question about whether the Guidelines say anything specific here: there is work going on in the Guidelines to describe a method for identifying this type of error. The Lifetime profile describes how to identify the problem by looking at the lifetime of the Pointer (string_view in this case) and seeing if it exceeds the lifetime of the Owner of the memory (a temporary string here). The paper in the docs folder does provide an initial presentation of that thinking. More of that work should surface in the future.

I saw a number of people complaining on this thread that static (and even dynamic) tools don't always catch these problems. I agree, it is frustrating when tools have bugs and limitations. The standard, and most effective recommendation I have for you is to let the people who build the tools know what is important to you in terms of missing coverage/features. They are generally pretty reasonable people and do try to fix most commonly received feedback.

neilmacintosh avatar Feb 12 '18 18:02 neilmacintosh

@neilmacintosh Thanks for forwarding! @kcc Incidentally, we have a check for usage of span on temporaries that is scheduled to release in Visual Studio 15.7 (or, if you are using previews, should be available in one of the first two previews of Visual Studio 15.7)

annagrin avatar Feb 12 '18 20:02 annagrin

Since this issue is still open, the SaferCPlusPlus library, for example, has some safer alternatives to std::string_view that safely support referencing temporaries, using either run-time safety mechanisms or (largely enforced) compile-time restrictions.

@kcc wrote:

Making tools do something special for std::string_view or other hand-picked types is counter-productive, imho. Users will start to rely on checker to find misuses of std::string_view and won't realize that a similar situation in their classes is not reported.

And not just hand-picked types, but also hand-picked memory access bug scenarios.

Use-after-free/scope is the heart of what's dangerous about C++, right? A real solution is the holy grail. And a real solution in the form of a static checker that requires no extra effort on the part of the programmer is a rainbow unicorn. While we wait for the latter, how 'bout we contemplate the former?

TL;DR: The SaferCPlusPlus library is (a not yet completely implemented version of) a complete solution to memory safety in C++. I'm going to give some examples of scenarios that intrinsically can't be determined to be safe (or not) at compile-time and demonstrate the elements that you're going to want/need (and that are currently missing from the standard library and gsl) to make those scenarios run-time safe.

Ok, let's consider three ways to ensure a pointer/reference does not access invalid memory: i) Constrain the lifetimes of the pointer/reference and the target objects, at compile-time, in such a way that the pointer/reference never points to invalid memory. ii) The same, but allow the constraints to be enforced at run-time. And iii) Track and check the validity of any dereference of the (constraint-free) pointer/reference (at run-time).

The SaferCPlusPlus library provides all three options. The second option is provided using reference-counting pointer/references (like std::shared_ptr, but with less overhead). The third option is provided by what are called "registered" pointer/references. They are non-owning and, notably, can point to stack allocated objects as well as heap allocated objects.

The first option is provided by what are called "scope" pointer/references. They have no run-time overhead, but are the most constrained as they can only point to objects that can be guaranteed to outlive them, and must themselves be allocated on the stack.

Note that the Rust language achieves memory safety in a similar way, but provides only the first two options. (And imposes an "exclusive write reference" constraint.)

So now, let's look at three representative examples of use-after-free/scope in C++ using string_view:

    {
        std::string s = "Hello ";
        std::string_view sv = s;
        {
            std::string s2 = "World";
            if (foo1()) { sv = s2; }
        }
        if (foo2()) { std::cout << sv; }
    }
    {
        auto s_shptr = std::make_shared<std::string>("Hello ");
        std::string_view sv = *s_shptr;
        {
            if (foo1()) { s_shptr = std::make_shared<std::string>("World"); }
        }
        if (foo2()) { std::cout << sv; }
    }
    {
        std::vector<std::string> vec = { "Hello " };
        std::string_view sv = vec[0];
        {
            if (foo1()) { vec.clear(); }
        }
        if (foo2()) { std::cout << sv; }
    }

Can a static checker correctly determine if these are safe? Well, there exist some foo1() and foo2() for which the answer is no. Right?

So what do we want to do in these cases? What do we want to do if memory safety is non-negotiable?

Well, if memory safety is a given, then the choices are basically either to disallow this code (via compile error or whatever), or use a run-time safety mechanism.

Ok, so here are the examples made safe using reference counting pointers (the second option).

    {
        /* refcounting pointers are just like std::shared_ptr<>s, but smaller, faster and safer. */
        auto s_rcptr = mse::make_refcounting<std::string>("Hello ");
        
        /* String sections are like string_views, except that, instead of a raw pointer iterator they store a "smart" iterator
            that holds a copy of the given (smart) pointer. So in this case the string section is an "owning" reference. */
        auto sv = mse::make_nrp_string_section(s_rcptr);
        {
            auto s2_rcptr = mse::make_refcounting<std::string>("World");
            if (foo1()) { sv = mse::make_nrp_string_section(s2_rcptr); }
        }
        if (foo2()) { std::cout << sv; }
    }
    {
        auto s_shptr = std::make_shared<std::string>("Hello ");
        auto sv = mse::make_nrp_string_section(s_shptr);
        {
            if (foo1()) { s_shptr = std::make_shared<std::string>("World"); }
        }
        if (foo2()) {
            /* This is safe because sv holds a copy of the shared_ptr to the "Hello " string. */
            std::cout << sv;
        }
    }
    {
        std::vector<std::shared_ptr<std::string>> vec;
        vec.push_back(std::make_shared<std::string>("Hello "));
        auto sv = mse::make_nrp_string_section(vec[0]);
        {
            if (foo1()) { vec.clear(); }
        }
        if (foo2()) { std::cout << sv; }
    }

All three examples made safe from use-after-free/scope. The only innovation here is that string sections can hold any kind of smart iterator, which in turn can hold any kind of smart pointer, including strong pointers like std::shared_ptr. Since string sections are templates, they do not add any run-time overhead versus std::string_view.

Ok, but what if you want to avoid extra heap allocations? That's where registered pointers come in (option three).

    {
        /* TRegisteredObj<> is a transparent wrapper that overloads the '&' operator. */
        mse::TRegisteredObj< std::string > s("Hello ");
        
        /* &s is not a raw pointer. It's a "registered" pointer. */
        auto sv = mse::make_nrp_string_section(&s);
        {
            mse::TRegisteredObj< std::string > s2("World");
            if (foo1()) { sv = mse::make_nrp_string_section(&s2); }
        }
        if (foo2()) {
            /* If sv refers to the deallocated s2 string, the next line will throw an exception. It is not undefined 
                behavior .*/
            std::cout << sv;
        }
    }
    {
        std::vector<mse::TRegisteredObj< std::string >> vec = { "Hello " };

        /* Again, &(vec[0]) is not a raw pointer. It's a "registered" pointer. */
        auto sv = mse::make_nrp_string_section(&(vec[0]));
        {
            if (foo1()) { vec.clear(); }
        }
        if (foo2()) {
            /* Again, if sv refers to a now deallocated string, the next line will throw an exception. */
            std::cout << sv;
        }
    }

Registered pointers do have a run-time cost, but that cost is usually significantly less than the extra heap allocations you'd otherwise incur.

And finally, if you want to avoid any extra run-time cost, you'd use scope pointers (option one). While scope pointers are safe (when used as intended), they are not as flexible as registered pointers, so some scenarios just won't compile, like the first example.

    {
        /* TXScopeObj<> is a transparent wrapper that overloads the '&' operator. */
        mse::TXScopeObj< std::string > xscope_s("Hello ");
        
        /* &xscope_s is not a raw pointer. It's a "scope" pointer. */
        auto xscope_sv = mse::make_xscope_nrp_string_section(&xscope_s);
        {
            mse::TXScopeObj< std::string > xscope_s2("World");
            if (foo1()) {
                /* The scope versions of string sections do not support the assignment operator. */
                xscope_sv = mse::make_nrp_string_section(&xscope_s2); } // <-- compile error
            }
        }
        if (foo2()) { std::cout << xscope_sv; }
    }
    {
        auto s_shptr = std::make_shared<std::string>("Hello ");
        
        /* An xscope_strong_pointer_store stores a copy of a strong/owner pointer to ensure that the target object survives to
            the end of the scope. */
        auto xscope_store = mse::make_xscope_strong_pointer_store(s_shptr);
        
        /* This enables us to obtain a scope pointer to the object. */
        auto xscope_ptr = xscope_store.xscope_ptr();
        
        auto xscope_sv = mse::make_xscope_nrp_string_section(xscope_ptr);
        {
            if (foo1()) { s_shptr = std::make_shared<std::string>("World"); }
        }
        if (foo2()) {
            /* This is safe because xscope_store holds a copy of the shared_ptr to the "Hello " string and xscope_sv refers to
                that string. */
            std::cout << xscope_sv;
        }
    }
    {
        /* mse::mstd::vector<> is just a safe implementation of std::vector<>. */
        mse::TXScopeObj< mse::mstd::vector< std::string > > xscope_vec = mse::mstd::vector< std::string >{ "Hello " };
        
        /* Ok, here we're obtaining a "lock" object that ensures the vector cannot change size. */
        auto xscope_vector_lock = mse::mstd::make_xscope_vector_size_change_lock_guard(&xscope_vec);
        
        /* This enables us to obtain a scope pointer to any element of the vector. */
        auto xscope_ptr = xscope_vector_lock.xscope_ptr_to_element(0);
        
        auto xscope_sv = mse::make_xscope_nrp_string_section(xscope_ptr);
        {
            if (foo1()) {
                /* Because the vector is "locked", the next line will trigger an exception. The size of the vector will not be
                    changed. */
                xscope_vec.clear();
            }
        }
        if (foo2()) { std::cout << xscope_sv; }
    }

In the first example, we attempted to target a string_view/section at a string object that will not outlive it. Since this cannot, in general, be determined to be safe (without run-time overhead), you get a (desired) compile error.

In the second and third examples, the target string object has dynamic (flexible) lifetime. So we can add a constraint that ensures that the string object will outlive the string view/section reference without adding run-time overhead to the reference.

So there you have it. Use-after-free/scope "solved" in C++.

It's not magic, the Rust language uses a similar solution (but with the arguably over-kill "exclusive write reference" restriction).

Ok, so let's so let's see how this addresses the original problematic example of this thread:

   {
        std::string s = "Hellooooooooooooooo ";
        std::string_view sv = s + "World\n";
        std::cout << sv;
   }

First, remember that string sections aren't created with a string parameter, but rather a (safe) pointer to a string. (Or, of course, a (safe) iterator and length.)

   {
        std::string s = "Hellooooooooooooooo ";
        auto sv = mse::make_nrp_string_const_section(&(s + "World\n")); // <-- (desired) compile error
        std::cout << sv;
   }

Even if you can obtain a raw pointer to a temporary, these ("nrp") string sections don't support raw pointers because they are unsafe. An std::shared_ptr would work just fine though:

   {
        std::string s = "Hellooooooooooooooo ";
        auto sv = mse::make_nrp_string_const_section(std::make_shared<std::string>(s + "World\n"));
        std::cout << sv;
   }

Or the corresponding reference counting pointer from the SaferCPlusPlus library:

   {
        std::string s = "Hellooooooooooooooo ";
        auto sv = mse::make_nrp_string_const_section(mse::make_refcounting<std::string>(s + "World\n"));
        std::cout << sv;
   }

This might look like too much typing for some people, but you can use the existing, or your own, shorter aliases:

   {
        std::string s = "Hellooooooooooooooo ";
        auto sv = mse::mknscs(mse::mkrclp(s + "World\n"));
        std::cout << sv;
   }

And if you think (mistakenly in this case) that you don't need the extra heap allocation, you can use a registered pointer:

   {
        std::string s = "Hellooooooooooooooo ";
        auto sv = mse::make_nrp_string_const_section(&mse::TRegisteredObj<std::string>(s + "World\n"));
        try {
            std::cout << sv;
        }
        catch (...) { std::cout << "expected exception\n"; }
   }

Or with shorter aliases

   {
        std::string s = "Hellooooooooooooooo ";
        auto sv = mse::mknscs(&mse::mkrolp(s + "World\n"));
        try {
            std::cout << sv;
        }
        catch (...) { std::cout << "expected exception\n"; }
   }

And if you try it with (zero-overhead) scope string sections:

   {
        std::string s = "Hellooooooooooooooo ";
        auto sv = mse::make_xscope_nrp_string_const_section(&mse::TXScopeObj<std::string>(s + "World\n"));
        std::cout << sv; // <-- (desired) compile error
   }

Scope string sections referencing temporary strings are actually a (functionally inert) distinct type from regular scope string sections and can only be converted to a regular (functioning) scope string section when received as a function parameter.

So I think we really can eliminate use-after-free/scope bugs in our C++ code today. But we have to abandon raw pointers and raw pointer wrappers like std::string_view (and span<>), or have the static checker treat them like scope references and impose the associated restrictions.

In practice, you're probably going to need something like registered pointers as well. The Rust language gets away without them, but they have more draconian restrictions on references, a non-trivial "borrow checker", and sometimes require the programmer to add "lifetime annotations" to their references.

You can argue that (in the future) static checkers will be able to recognize many uses of pointer/reference types that are safe without the need for run-time overhead, and that are not covered by (zero run-time overhead) scope pointer/reference types. Sure, but I'd argue that anyway that ability is more appropriately placed in the optimizer rather than a static checker, so using run-time safety mechanisms in your code today doesn't condemn it to being sub-optimal/obsolete in the future.

duneroadrunner avatar May 17 '18 21:05 duneroadrunner

So I think that after the lengthy discussion here, this issue can now be closed. In summary: the use-after-free potential with string_view is accounted for in the Lifetime profile, even if there may be bugs or limitations in preliminary implementations that mean it is not diagnosed.

neilmacintosh avatar Aug 13 '18 05:08 neilmacintosh