CppCoreGuidelines icon indicating copy to clipboard operation
CppCoreGuidelines copied to clipboard

F.16 F.18 Consume and input parameters

Open adarnon opened this issue 8 years ago • 14 comments

Rule F.18 suggests you should always pass parameters by X&& and std::move them. However, in the case where you have a function that receives multiple parameters and it keeps a copy of all of them (a constructor, maybe), I believe the better way is to pass them by value and then std::move them. This also saves you the trouble of adding a Cartesian product of overloads with const X&.

For example:

class Foo {
    Foo(std::vector<int> a, std::vector<int> b): _a(std::move(a)), _b(std::move(b)) {}
private:
    std::vector<int> _a;
    std::vector<int> _b;
};

adarnon avatar Feb 14 '16 18:02 adarnon

Related note: I'm a little hesitant about recommending move-into-place by default until we've got better tooling for detecting use-after-move. The (usually minor) gains in most places can be easily wiped out by the productivity loss in tracking down use-after-move problems.

I suspect that in 3-5 years this will be much clearer and better understood, but the transition period is going to be painful for a lot of people and organizations, and I think we should tread carefully.

tituswinters avatar Feb 17 '16 15:02 tituswinters

+1 @tituswinters

LegalizeAdulthood avatar Feb 18 '16 20:02 LegalizeAdulthood

Maybe I'm missing something, but the function copies them anyway (since they are received by value), you don't have a possibility of use-after-move bug. The only pitfall is that you might forget to use std::move in the initialization list and then you'll cause two copies if your compiler doesn't optimize them.

adarnon avatar Feb 20 '16 20:02 adarnon

If we're telling everyone to move into sink functions (those that store or modify their inputs), there'll be more calls to move, and more use-after-move. We're already seeing it for teams that have decided that this is a great pattern.

tituswinters avatar Feb 22 '16 14:02 tituswinters

That rule should define 'consume', 'forward', 'in', 'out'. Then for each of the following rules, there should be a clarifying note referring back to that rule.

gdr-at-ms avatar May 15 '17 18:05 gdr-at-ms

I am not sure I understand what additional problem "use after move" presents than any other operation performed on the vector. For example:

std::vector<int> v {1, 2, 3};

// ... do stuff

func(std::move(v)); // dump its contents here

for(auto i: v)
{
    // oops, undetermined contents
}

But how is that different from any use of a vector where the programmer "forgets" what they previously just did to it?

For example what about "use after clear" problems?

std::vector<int> v {1, 2, 3};

// ... do stuff

v.clear(); // empty the vector here

for(auto i: v)
{
    // oops, why does this code make sense?
}

If the programmer "forgets" or otherwise fails to take into consideration the immediate history of the object it will always be a problem. I am not sure why "use after move" is any different from "use after assignment", "use after passing by reference" or "use after whatever else I might do to a vector" problem?

I mean are "use after move" occurrences more prevalent than any other inappropriate use of an object given its immediate history?

Thinking about it would it not be fairly easy to detect when the programmer has used the indeterminate value of an object after it has been moved using static checking tools?

galik avatar May 15 '17 19:05 galik

I'll claim that most containers (counted by instantiation or CPU usage or both) are built up and then used in a mostly read-only fashion. That means that although there are existing concerns about operations invalidating iterators or pointers or the like, they don't come up much in practice. Use-after-move changes the norms: containers are destroyed or invalidated in more ways than before, and more commonly.

It's clearly a programmer error, but it's more subtle/harder to spot than use-after-free, but just as nefarious.

(And don't get me started on the issues of sometimes-ok-use-after-move as a result of small-string-optimization.)

tituswinters avatar May 15 '17 19:05 tituswinters

It's been a while. Anything changed about detecting "use after move"?

Anyway, I came here to give a comment about sinking arguments into a constructor.

As @adarnon wrote in the first comment, the consensus about writing constructors and avoiding overload explosion, is to take all arguments by value and move from them.

I wanted to propose a different approach, that is a bit more explicit:

// this function might become part of GSL or even standard
template <typename T>
T copy(const T& obj)
{
    return obj;
}

class Foo
{
private:
    std::vector<int> m_v1, m_v2;
public:
    // take arguments as rvalue refs, as all sink functions should
    Foo(std::vector<int>&& v1, std::vector<int>&& v2)
        : m_v1{std::move(v1)}
        , m_v2{std::move(v2)}
    {}
};

int main()
{
    { // move arguments, works as expected
        auto v1 = std::vector<int>{1, 2, 3};
        auto v2 = std::vector<int>{4, 5, 6};
        auto foo = Foo{std::move(v1), std::move(v2)};
    }
    { // no move, fails to compile
        auto v1 = std::vector<int>{1, 2, 3};
        auto v2 = std::vector<int>{4, 5, 6};
        auto foo = Foo{v1, v2};
    }
    { // you can't move const objects, fails to compile
        const auto v1 = std::vector<int>{1, 2, 3};
        const auto v2 = std::vector<int>{4, 5, 6};
        auto foo = Foo{std::move(v1), std::move(v2)};
    }
    { // if you have object that cannot be moved, then copy it explicitly
        const auto v1 = std::vector<int>{1, 2, 3};
        const auto v2 = std::vector<int>{4, 5, 6};
        auto foo = Foo{copy(v1), copy(v2)};
    }
}

etam avatar Dec 14 '18 09:12 etam

Is there any disadvantage of "pass by value and move" over "overload the function for const T& and T&&" at all? It seems to mostly lead to code duplication.

Until today, my "rule" for input parameters of any kind was:

  • pass by value, if T is cheep to copy, or a copy would need to be made inside the function anyways
  • pass as const T& otherwise
  • use T&& for (rare) special cases when special behavior for rvalues is needed for some reason

Made sense to me and is easy to apply and teach. Now after reading the guidelines I am quite confused, as it does not seem to be the recommended way or at least was not mentioned at all. I can not come up with an example where taking a const T& and then copying manually inside the function would be better. That should be true, even if the function does not "retain" or "consume" (which are not clearly defined anyway) the copy but simply operates on it.

Now writing this I realize my confusion might come from the fact, that I do not understand what problem the "in & will-move-from" case and the "in & retain copy" case is trying to solve. How are they semantically different from the simple "input" case and why is that distinction a useful one to make?

hschwane avatar Feb 16 '21 01:02 hschwane

The policy I've been advocating is roughly this (for cases where the parameter is being sinked/consumed):

If the body of the function is doing work that is comparable to the cost of a move-construction, then do the const T& + T&& overload dance (with the potential code duplication). You see this for containers in particular: something like push_back is specifically doing work comparable to a move, so we don't want to force an extra move.

On the other hand, if the amount of inherent work in the function is already large (or the type is known to be a cheap/free move) then just do T by value, it's much simpler.

(The semantic distinction is whether the function being called needs a T to modify or just needs a T to read. Storing a copy for yourself is a modification because move is potentially destructive. In cases where we only need one to read, const T& by itself.)

tituswinters avatar Feb 16 '21 12:02 tituswinters

So that would give us:

in x

  • read only from x
  • if cheap to copy: T x, else: const T& x

copy x

  • perform operations on a copy of x (including moving it out of the function)
  • if O(cost of function) <= O(cost of moving T) AND T is not known to be cheap to move: const T& x + T&& x, else: T x
  • If the caller cares about performance and wants to avoid a copy, it is the callers responsibility to call std::move() when putting parameters into the function. The default behavior should be a copy as this produces correct code most of the time. Requiring the caller to explicitly copy (eg by only providing T&& x) would be unnecessarily complex when that is the most likely use-case and violate F.15: Prefer simple and conventional ways of passing information.

move x

From the function author point of view, that still seems to be the same as "copy" to me. From the point of view where the function is called we have the option to simply pass our variable potentially copying it, or using std::move to give up the ownership. The "copy" and the "move" cases seem to only be different from the function call point of view, not the function author point of view. And all the other rules (in,out,inout,forward) are defined from the function author point of view so making this distinction here is confusing to me.

inout x

  • read from and write to x
  • T& x

out x

  • parameter that is only written to
  • see rules in F.20

forward x

  • parameter is used like an input parameter (only read from) and then later passed on to another function
  • see rules in F.19

Do I understand this correctly now?

I also found #1407 so clearly I am not the first person confused with those guidelines. Unfortunately it seems like the concerns discussed there were never actually included. But the fact that this is coming up again and again shows that the guidelines are not clear here.

[edit] remove 'his' from copy description

hschwane avatar Feb 16 '21 15:02 hschwane

Nit: In "Copy" you're saying "his", but programming isn't gendered.

I'm not an owner for core guidelines, but your summary mostly matches my understanding.

I don't know that there is a "move" case, so much as the previously-discussed "sink" case, with optimization.

There is a "maybe move" case, which is the one case that tends to use T&& by itself (not as an overload set). We see that very rarely. There's a case coming out of WG21/SG1 for RCU where a unique_ptr is passed as T&& - if the operation is successful, ownership is taken, if the operation fails then ownership stays with the caller. It's unusual enough to not necessarily need to be documented, BUT people asking about T&& as a "force callers to move" comes up often enough that it may be worth describing what those semantics actually look like.

tituswinters avatar Feb 16 '21 16:02 tituswinters

Thanks for the feedback!

"sink", "move", "copy", "consume", "retain" all seem to refer to the case "I need input data which I can freely modify (including moving from) without effecting the caller". For me the clearest way to communicate that semantic is T x with const T&x + T&& as a performance optimization. When T is not copy-able T x does the right thing and forces the caller to move. I have not jet seen an example where forcing the caller to move a copy-able type would be useful. Seems like an unnecessary restriction / complication. If you have one that would be very interesting. We could put it as a special exception to this case.

I will also think about the "maybe move" case, that sounds a bit complicated.

hschwane avatar Feb 16 '21 17:02 hschwane

Yep, that all sounds accurate.

tituswinters avatar Feb 16 '21 17:02 tituswinters