[SUGGESTION] Ability to disable UFCS both via cppfront and for regions in source code
I'd like the ability to disable UFCS with 2 mechanisms:
- A new switch on cppfront to control whether UFCS is enabled (defaults to
true) for a whole Cpp2 file - Some kind of
#pragma push/popor[[attribute]]-style code that cppfront would read when parsing the Cpp2 source code to control whether UFCS function calls are enabled for a region of code inside a Cpp2 file
Motivations:
- Hopefully a temporary motivation, but there are some UFCS bugs which remain to be fixed, and I find myself having to add workarounds in the code when I encounter them, but it'd be easier to just disable UFCS.
- The debugging experience with UFCS is not ideal (at least for me in VS) because it doesn't step over the function calls like I'd expect. See also #844
- I have to admit that I'm now on the fence as to the usefulness of UFCS. I started out firmly in the positive camp but have slowly moved to neutral. Right now I'd prefer an explicit opt-in type of UFCS (more like C# extension methods) rather than working globally, silently, and implicitly.
Will your feature suggestion eliminate X% of security vulnerabilities of a given kind in current C++ code? No
Will your feature suggestion automate or eliminate X% of current C++ guidance literature? No
I can second this. Also the error messages if the lookup fails are usually not very intuitive.
Thanks for raising this. I appreciate the feedback. The three motivations I see above are: implementation bugs, debugging experience, and error message readability.
A fourth one I'm curious about: Have folks encountered significant problems with actually invoking an unintended function?
That last one would especially be motivation to make UFCS be opt-in in the language via a distinct third function call syntax, which doesn't need to be too heavyweight... One simple candidate I've considered would be .., as in vec..ssize() for example:
-
..is similar to., to connote what it tries first -
..is easy to type, and probably not easy to type accidentally (I don't remember writing it as a typo) -
..keeps the object on the lhs for better IDE support and arguably -
..might connote an "extended." -
..might connote a "look outside the object's scope" similarly to..in file systems (this last one might be a stretch).
And I don't think .. would necessarily get in the way of future evolution, because if there were a "ranges operator" I think ... still works well for that (e.g., 1...10, begin()...end()). I think ... as a ranges operator doesn't collide with pack expansions because it occurs in a different place in the grammar (it would be a binary expression, whereas so far pack expansion ... in an expression can only occur as a unary operator or in the place where an identifier would appear, if I recall right).
I think I'd prefer an opt-out syntax as opposed to an opt-in syntax. Debugging/error messages can be improved over time as tools become cpp2 aware (I hope 🤞.)
Something like using this to say "I don't want to use UFCS" feels somewhat intuitive to me.
// cpp2
vec.this.size();
// lowers to cpp1 as (no UFCS)
vec.size();
To me vec.this reads as "vec access as this" only because this is a keyword.
Another opt-in syntax is .:.
See https://github.com/hsutter/cppfront/issues/741#issuecomment-1806845249.
A fourth one I'm curious about: Have folks encountered significant problems with actually invoking an unintended function?
No, I haven't experienced that. I like how it dispatches currently especially with inheritance (code below). A scenario I think has been discussed elsewhere is that adding a member function to a type could change existing code's behaviour if the existing code is depending on UFCS calling a non-member function (of the same name), but that hasn't happened to me so far (though perhaps would only happen with enough elapsed time and code refactoring / library upgrades).
UFCS with inheritance
my_string1: type = {
this: std::string;
using std::string::string;
}
my_string2: type = {
this: std::string;
using std::string::string;
}
// Special behaviour for `my_string1`
trim: (inout s : my_string1) = {
s.clear();
}
// Default behaviour
trim: (inout s : std::string) = {
s.erase(
s.begin(),
std::find_if(s.begin(), s.end(), :(ch: char) -> bool = !std::isspace(ch)));
s.erase(
std::find_if(s.rbegin(), s.rend(), :(ch: char) -> bool = !std::isspace(ch)).base(),
s.end());
}
main: () -> int = {
str1: my_string1 = (" hello, world ");
str1.trim();
std::cout << "Expect this to be empty: [(str1)$]\n";
str2: my_string2 = (" hello, world ");
str2.trim();
std::cout << "Expect this to be trimmed: [(str2)$]\n";
return 0;
}
Thanks! So the issues really are: active bugs, debugging experience, and error message readability.
I think I can do something about at least two of those, and maybe all three, in the next month.
For example, for error message readability I was considering something like changing the UFCS implementation from
if constexpr (requires{ CPP2_FORWARD(obj).CPP2_UFCS_REMPARENS QUALID TEMPKW __VA_ARGS__(CPP2_FORWARD(params)...); }) { \
return CPP2_FORWARD(obj).CPP2_UFCS_REMPARENS QUALID TEMPKW __VA_ARGS__(CPP2_FORWARD(params)...); \
} else { \
return MVFWD(CPP2_UFCS_REMPARENS QUALID __VA_ARGS__)(CPP2_FORWARD(obj), CPP2_FORWARD(params)...); \
} \
to make the else into an else if to get the constraint that the fallback is actually legal, and if neither is legal have a final else with a nice diagnostic that the call is invalid. That final else could so test whether the result would have been valid if obj had been an lvalue, and if so give a nicer error that it was the definite last use that caused the issue.
That's a sketch... how does that sound?
Edited to add: Something like this...
if constexpr (requires{ CPP2_FORWARD(obj).CPP2_UFCS_REMPARENS QUALID TEMPKW __VA_ARGS__(CPP2_FORWARD(params)...); }) { \
return CPP2_FORWARD(obj).CPP2_UFCS_REMPARENS QUALID TEMPKW __VA_ARGS__(CPP2_FORWARD(params)...); \
} else if constexpr (requires{ MVFWD(CPP2_UFCS_REMPARENS QUALID __VA_ARGS__)(CPP2_FORWARD(obj), CPP2_FORWARD(params)...); }) { \
return MVFWD(CPP2_UFCS_REMPARENS QUALID __VA_ARGS__)(CPP2_FORWARD(obj), CPP2_FORWARD(params)...); \
} else { \
if constexpr (requires{ obj.CPP2_UFCS_REMPARENS QUALID TEMPKW __VA_ARGS__(CPP2_FORWARD(params)...); }) { \
static_assert( false, "member function call would have succeeded if obj were an lvalue - is this illegal because this is a definite last use of obj?" ); \
} else if constexpr (requires{ MVFWD(CPP2_UFCS_REMPARENS QUALID __VA_ARGS__)(obj, CPP2_FORWARD(params)...); }) { \
static_assert( false, "free function call would have succeeded if obj were an lvalue - is this illegal because this is a definite last use of obj?" ); \
} \
} else { \
static_assert( false, "attempts to call obj.f() and f(obj) are both invalid - did you spell the function name correctly?" ); \
} \
} \
That would really improve the situation.
A side note: Allowing to disable UFCS on a call by call basis, could help to find bugs in the implementation.
Sounds good, I look forward to testing with that new version!
Adding emphasis:
Thanks! So the issues really are: active bugs, debugging experience, and error message readability.
I think I can do something about at least two of those, and maybe all three, in the next month.
Update: I think I've improved the third one sufficiently (error message readability) with today's commit. Thanks for pushing on this!
Next up, the other two:
What UFCS bugs do you most commonly encounter? Just an example here in this thread would be fine, or an issue number link if it's already reported.
For the debugging experience, you wrote "because it doesn't step over the function calls like I'd expect" -- can you give an example? (I saw a reference to #844 but that seems to be a different case where the generated function doesn't have a source location in the original code at all, whereas with UFCS the call site is in the original code... unless it's in generated code of course but then that's the proximate issue rather than the UFCS?)
What UFCS bugs do you most commonly encounter?
My wish list would be #999 and #1002 -- I think those are the ones I see most often.
because it doesn't step over the function calls like I'd expect
I'll make a GIF/video of this. The problem was that the debugger required 2 steps to step over a UFCS call, instead of 1 like you'd expect. Almost as if the first step was invisibly stepping into the macro.
Thanks, a video would be good.
I recall that it's stepping into the IILE first.
I recall that it's stepping into the IILE
Yes, I think that's it. Here's a video showing the issue in VS 2022.
https://github.com/hsutter/cppfront/assets/2060747/21b1985f-47ed-44c7-aa06-59bf62b2cebf
I would suggest using the option to turn off line numbers, and then debug the generated code to see exactly what's happening. I suspect that it's multiple lines of code in cpp that happen to map to the same line in the cpp2.
As shown in the comment https://github.com/hsutter/cppfront/pull/904#issuecomment-2150622367 UFCS macro/template can have a real impact on compile times.
| name | compile | compile adjusted | compile relative |
|---|---|---|---|
| base_03_cpp2_struct-compile | 14.14 | 0 | 0 |
| cleanup_00_base-compile | 25.48 | 11.35 | 1 |
| cleanup_01_remove_UFCS-compile | 19.49 | 5.36 | 0.47 |
In this thread there have been sugestions for a new syntax:
-
..to opt-in or opt-out -
vec.this.size()to opt-out -
.:to opt-in. Nearly, all can be used to either opt-in or opt-out.
One idea of mine would be to use metafunctions to change the state of the cppfront compiler. e.g.
test: @UFCS_NO namespace = {
// UFCS disabled for the namespace
func: () = {
@UFCS_YES {
// UFCS enabled in this scope.
}
}
}
- The same technique could be used to disable e.g. bounds checking in loops.
Thanks! Since even after the changes already made above UFCS still continues to be an issue, including that we now have more data UFCS is a compile-time performance issue, I'm trying out adding .. as a "really use only a member" opt-out. That leaves UFCS as the default, but provides a way to not use it when it's not wanted.
Note: If in the future we find that UFCS should not be the default, we can make . find members only, and .. be the UFCS syntax. That would be a breaking change, but a mechanical one.
Xref: #1101
With 9648191ecf0000696024bb54ecb8202373a922e9 we have the ability to use .. to select only a member (e.g., str..append("xyzzy");. And we also have the other improvements prompted by this issue, including removing some bugs and improving diagnostics. I don't like regions where the same code means different things based on language pragmas... so I'd like to avoid having x.f() sometimes be UFCS and sometimes not, and you have to look around in the context to know.
My impression is that the changes now done probably do address the motivation behind the request to have such regions, so I'll close this as completed. But feel free to reopen if you think there's a strong reason to reconsider more in this area, and even a lively debate about regions with different meanings for the same code!