
[SUGGESTION] Implicitly pass variables as inout and out parameters where appropriate, possibly REMOVE inout and out parameters entirely

Open Waffl3x opened this issue 1 year ago • 0 comments

Out and inout (not referring to cppfront) parameters were born out of legacy requirements as far as I can tell. The problems out parameters are a solution for are much better solved with new features; structured bindings for multiple returns (and of course cppfront's solution for multiple returns) and return value optimization. I'm not entirely certain on the history given I am a relatively young guy, but for these reasons I see out and inout parameters as a relic of the past that we haven't been able to eliminate the technical need for. I believe we can eliminate the technical need.

When to use out or inout parameters:
- returning multiple variables
- optimization concerns

As noted, the first problem is already solved by structured bindings. The second is not. As far as I know, RVO can match an out parameter in most (all?) cases, although NRVO isn't guaranteed, so the use for out parameters is nearly eliminated. There is still plenty of reason to use inout parameters, though. Their use case is a little more justified: mutating an object without having to copy it on the way in and again on the way out, for example. But wait, isn't that a bit strange? If it's just about optimization, why can't we extend RVO to this as well? Then we can narrow mutation down to only happen when the assignment operator is involved. (Or member functions, I'm not coming after those, yet.)
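To make the first bullet concrete, here is a minimal sketch in standard C++17 comparing an out-parameter function with one that returns multiple values and is unpacked with structured bindings. The names `split_legacy`, `split`, and `demo_split` are made up for illustration and are not part of cppfront.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Legacy style: both results are delivered through out parameters.
void split_legacy(const std::string& input, std::string& head, std::string& tail) {
    const auto pos = input.find(':');
    head = input.substr(0, pos);  // substr(0, npos) yields the whole string
    tail = (pos == std::string::npos) ? std::string{} : input.substr(pos + 1);
}

// Return style: both results come back through the return value.
std::pair<std::string, std::string> split(const std::string& input) {
    const auto pos = input.find(':');
    if (pos == std::string::npos) {
        return {input, std::string{}};
    }
    return {input.substr(0, pos), input.substr(pos + 1)};
}

// Hypothetical usage: both styles produce the same values.
void demo_split() {
    std::string h, t;  // must be declared before the call
    split_legacy("key:value", h, t);

    auto [head, tail] = split("key:value");  // declared and initialized together
    assert(head == h);
    assert(tail == t);
}
```

With the return style, the caller never has to declare a variable before it has a meaningful value, which is the property the rest of this proposal leans on.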

This would be a big simplification of the language: there would no longer be any question of whether a function mutates a parameter. Parameters would never be mutated by the function, except when it takes ownership, in which case the caller has to explicitly provide a variable that is movable. In every other case, there is no question about the state of a variable that is passed as an argument: it will be unmodified. (There is one exception, copy constructors that have side effects.)

There are already parts of the language that work like this as well, such as the addition assignment operator for std::string. Granted, it is a member function so it makes sense, but I still believe it demonstrates the appeal. With that said, if I had my way, addition assignment (str1 += str2) would be equivalent to assigning from the result of an addition whose left operand is the variable being assigned to (str1 = str1 + str2).
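For reference, here is a small sketch of how these forms behave in standard C++ today. All three produce the same value, but they do different work to get there. The function names are hypothetical, and the third form is just my hand-written approximation of what an automatic rewrite might produce, not something cppfront does.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Form 1: compound assignment appends to str1 in place
// (reallocating only if its capacity runs out).
std::string concat_compound(std::string str1, const std::string& str2) {
    str1 += str2;
    return str1;
}

// Form 2: operator+ builds a new temporary string from two lvalues,
// and that temporary is then move-assigned into str1.
std::string concat_reassign(std::string str1, const std::string& str2) {
    str1 = str1 + str2;
    return str1;
}

// Form 3: moving the left operand selects the rvalue overload of operator+,
// which can reuse str1's storage instead of building a fresh string.
std::string concat_moved(std::string str1, const std::string& str2) {
    str1 = std::move(str1) + str2;
    return str1;
}
```

The idea in the paragraph above amounts to the compiler treating form 2 as if the programmer had written form 1 or form 3.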

As far as my intermediate knowledge of C++ goes, I believe what I'm describing should be possible even in the standard language itself as a guaranteed optimization, but it would be incredibly difficult to do. In cppfront not only would it be easier to do so, but we can do better than what is possible in the standardized language.

There are 2 different options that I have come up with. Originally I was only going to pitch one of them, but as I was writing this I realized that there are fewer problems with the second option than I originally thought. With that said, my guess is that the first will be more widely preferred, while the second is the one I prefer.

Option one

The first option is to keep out and inout parameters and build upon them. The cppfront compiler will mostly only need to rearrange the arguments when translating to standard cpp, and the programmer will still need to explicitly choose which parameters are inout and out parameters. I imagine the simplest way to do this would be to require the programmer to give the parameter variables and the return variables the same names, so the cppfront compiler knows where to direct the out parameters.

// cppfront
add_bar: (inout s: std::string) -> (s: std::string) = {
    s.append("bar");
    return;
}

main: () -> int = {
    a: std::string = "foo";
    a = add_bar(a);
}
// transpiled cpp
auto add_bar(std::string& s) -> void {
    s.append("bar");
    return;
}
auto main() -> int {
    std::string a = "foo";
    add_bar(a);
} 
// cppfront
set_to_bar: (out s: std::string) -> (s: std::string) = {
    s = "bar";
    return;
}

main: () -> int = {
    x: std::string;
    x = set_to_bar();
    y: std::string = set_to_bar();
    z: const std::string = set_to_bar();
}
// transpiled cpp
auto set_to_bar(std::string& s) -> void {
    s = "bar";
    return;
}
auto main() -> int {
    std::string x;
    set_to_bar(x);
    std::string y; // has to declare variable for the user in this case
    set_to_bar(y);
    // if we initialize a const variable when implicitly using an out param
    // we need to generate a lambda expression as well
    // the other option is to just disallow initializing const variables by out params, which might be more correct
    const std::string z = [](){
        std::string r;
        set_to_bar(r);
        return r;
    }();
}

As I don't like this option as much, I haven't thought too hard about the syntax. Perhaps it would be better to omit the return variables, or perhaps the return variables could instead be declared as out. The order is also a question, but the obvious answer is to just go by the return variable declaration order.

Option two, my preferred choice

The second option is to completely remove out and inout parameters and have the cppfront compiler do the heavy lifting. This is by far the option I prefer as I am quite adamant that inout parameters are an optimization, and as an optimization, they aren't something the programmer should have to think about.

// cppfront
set_to_bar: () -> (s: std::string) = {
    s = "bar";
    return;
}

main: () -> int = {
    a: std::string;
    a = set_to_bar();
    b: = set_to_bar();
}
// transpiled cpp
auto set_to_bar(std::string& s) -> void {
    s = "bar";
    return;
}
// or maybe the compiler should generate this?
auto set_to_bar() -> std::string {
    return {"bar"};
}
// the above isn't very realistic to be doable most of the time though
// so realistically it generates this if it decides that an out parameter is not an optimization
auto set_to_bar() -> std::string {
    std::string s = "bar";
    return s;
}
auto main() -> int {
    std::string a;
    set_to_bar(a);
    // the cppfront compiler needs to choose one of the following for this case
    // don't use out param
    std::string b = set_to_bar();
    // do use out param
    std::string b;
    set_to_bar(b);
} 

Obviously there are some problems to solve here, and I don't know where to even begin with this kind of decision making. However, I would argue that it's unclear what the compiler should do only because it's unclear whether an out param has any benefit here. Even this simple situation gives the compiler things to think about, but not the programmer. Ultimately, I will continue to argue that out parameters should not be necessary, not in cppfront and not in standard cpp.

Thankfully, it is a lot clearer how to identify an inout parameter optimization. Take an in parameter, modify it, return it: that's the most basic criterion for inout optimization. Granted, there are some caveats here; you will need to return the same variable no matter the branch. On the other hand, I might be mistaken there: if a different variable gets returned, then on the generated side it can be represented with an assignment.
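To illustrate that last point, here is a rough sketch in plain C++ of a function that returns a different variable on one branch, next to an in-place form a compiler could hypothetically generate for calls of the shape `a = pick(a, flag)`. Both `pick` and `pick_inplace` are invented names, and this is only my guess at what such a lowering might look like.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Source-level view: takes a value, modifies it, and returns it,
// except on one branch where a different variable is returned.
std::string pick(std::string s, bool use_fallback) {
    std::string fallback = "fallback";
    if (use_fallback) {
        return fallback;
    }
    s.append("bar");
    return s;
}

// Possible generated form for `a = pick(a, flag)`: the parameter becomes
// a reference, and returning a different variable becomes an assignment.
void pick_inplace(std::string& s, bool use_fallback) {
    std::string fallback = "fallback";
    if (use_fallback) {
        s = std::move(fallback);
        return;
    }
    s.append("bar");
}
```

For both branches, the caller's variable ends up with the same value it would have after `a = pick(a, flag)`.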

// cppfront
add_bar: (in s: std::string) -> std::string = {
    s.append("bar");
    return s;
}

main: () -> int = {
    a: std::string = "foo";
    a = add_bar(a);
    b: = add_bar(std::string{"baz"});
    c: std::string;
    c = add_bar(b);
}
// transpiled cpp
auto add_bar(std::string& s) -> void {
    s.append("bar");
    return;
}
// the cppfront compiler will need to generate another version of the function
// if it gets used in a manner that can't be inout optimized
auto add_bar_takes_copy(std::string s) -> std::string {
    s.append("bar");
    return s;
}
auto main() -> int {
    std::string a = "foo";
    add_bar(a);
    // calls the by copy version of add_bar()
    std::string b = add_bar_takes_copy(std::string{"baz"});
    // also calls the by copy version of add_bar()
    // because the in argument is a different variable than is being assigned to
    std::string c; // not required, but the original code was like this, changing it for no reason would be bad
    c = add_bar_takes_copy(b);
}

Clearly there would be a lot of work to get this functioning, but I did say this option would be the more difficult to implement. Something to note: in the case that you absolutely don't want this optimization to take place, it's as simple as taking the parameter by copy. However, I can't think of a good reason to do so right now, aside from move/copy assignment functions with bad side effects, and I don't think that qualifies as a good reason.

As you can see, another problem is the cppfront compiler might have to generate more than one version of the function. I would argue this is a benefit rather than a detriment though. We went from needing to declare a function parameter as inout to gain the extra performance of avoiding an extra allocation, to now having the optimization be applied automatically where it can be, while also leaving the function general enough to be used with literals.
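For comparison, here is a sketch of how close standard C++ gets today with a single by-value function, assuming the caller writes std::move by hand. The explicit std::move at the call site is exactly the part this proposal would want the compiler to work out on its own. `demo_add_bar` is a hypothetical driver for illustration.

```cpp
#include <cassert>
#include <string>
#include <utility>

// One general function: takes its argument by value and returns it.
// A by-value parameter named in a return statement is moved, not copied.
std::string add_bar(std::string s) {
    s.append("bar");
    return s;
}

void demo_add_bar() {
    // Same variable in and out: moved in, moved back out, no copy requested.
    std::string a = "foo";
    a = add_bar(std::move(a));
    assert(a == "foobar");

    // Works with temporaries and literals too.
    std::string b = add_bar("baz");
    assert(b == "bazbar");

    // Different variable in than out: b is copied in and left unmodified.
    std::string c = add_bar(b);
    assert(c == "bazbarbar");
    assert(b == "bazbar");
}
```

This keeps the function general, but it still puts the burden on the programmer to remember the std::move in the first case.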

There's likely a lot more to consider than this, and while this is not the first time I've thought about something like this to eliminate out and inout parameters from being required in standard cpp, this is the first draft of an idea that is even remotely workable. I'm still quite confident that this is viable.

Option three...

Do nothing. :( I do acknowledge that the work that's been done here gets us a lot of the way there. While I think that it would be a huge improvement to completely eliminate the need for out and inout parameters, the demos have already demonstrated that many of the benefits have been reaped. That doesn't mean we can't do better.

Before I wrap up, here are direct responses to the questions posed for suggestions.

Will your feature suggestion eliminate X% of security vulnerabilities of a given kind in current C++ code? None directly, however both options I presented will work with the use-before-initialization checking. Other than that, there will never be a question of whether a parameter is mutated or not. Beyond clarity of code, I'm not sure what impact, if any, this has on security though.

Will your feature suggestion eliminate X% of current C++ guidance literature? Yes, I strongly believe it would, albeit to different degrees depending on the selected method, but I will comment on my preferred method as that should have the higher impact. This will reduce the number of ways to mutate a variable. There will no longer be any uncertainty about how to "properly" change the value of something. It will always be assignment, regardless of triviality, size, complexity, or anything else. We will no longer have to prematurely explain allocations or moves; the compiler will just do the right thing automatically. Eliminating out and inout parameters reduces the available choices, and also increases the frequency with which the default choice, in, is the correct choice for a given scenario. The number of ways to correctly get values out of functions also drops to a single one: through the function's return. I'm also pretty sure it completely eliminates the necessity for uninitialized variables, as the only time I remember having them recently was when calling functions with out parameters. I can't be certain about that claim, but if I'm correct, then we can teach that they aren't required and to just not ever have them. This would require teaching the use of immediately invoked lambda expressions for complex initialization, but I don't consider that a big deal. Finally, passing uninitialized variables to functions would always be incorrect, and could be disallowed by the initialization checker indiscriminately.
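As a quick illustration of the immediately invoked lambda idiom mentioned above, here is a minimal standard C++ sketch. The function `describe` is a made-up example; the point is only that `label` is const and is never observable in an uninitialized state, even though its value depends on branching logic.

```cpp
#include <cassert>
#include <string>

std::string describe(int n) {
    // The lambda is defined and called in one expression, so the
    // branching logic runs as part of label's initializer.
    const std::string label = [&] {
        if (n < 0) {
            return std::string{"negative"};
        }
        if (n == 0) {
            return std::string{"zero"};
        }
        return std::string{"positive"};
    }();
    return label;
}
```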

Describe alternatives you've considered. As an alternative, change nothing; the parameter types as you've designed them get us quite far. I described 2 options above. To sum them up again, the first builds off of out and inout parameters instead of eliminating them, while the second eliminates them. I highly prefer the second, but I believe either one would be an improvement.

In conclusion, I want to say that I was initially very against the proposal for parameter passing, but I've warmed up to it a lot. What I haven't warmed up to are out and inout parameters. I remain convinced that they have no place in a modern language and only still exist in cpp because people are used to them, or because they are a required optimization in a lot of cases. I've spent a lot of time in the last 6 months reading an old C codebase, and it's basically impossible to get by without out and inout parameters in C. Figuring out that code is not very pleasant, and there's always a question of whether something is being modified somewhere. I think the claim that was made in your talk was incorrect: out parameters are not the most important one, it's merely (well, perhaps not merely but massively) the good static analysis. I think it is incorrect to attribute that victory to out parameters, especially since assignment would also satisfy the initialization checker, so why does out get the credit? If we go my route, it gets even simpler, as there would be no situations where it is valid to pass uninitialized variables to functions, ever.

There are still a few problems, I reckon, but I don't believe I'm experienced enough to solve them all on my own. Thanks for reading my proposal, not just Herb, but everyone else who happens to swing by.

Waffl3x, Nov 17 '22 01:11