cppfront
[SUGGESTION] Implement the pipeline operator from P2011 & P2672
Suggestion
Cpp2 could support the pipeline operator |> as proposed in P2011 and further explored in P2672.
Specifically, the pipeline operator with the "placeholder model" and a mandatory placeholder (e.g. $), as described in P2672 section 6, "Disposition". Both papers explain the problem and motivation for the new operator, and discuss options for the placeholder token.
Circle uses $ as its token.
The proposed operator enables a simpler left-to-right style as opposed to an inside-out style.
Conor Hoekstra (code_report) has various talks about ranges and pipelines and explains how the pipeline operator can make the code simpler and more readable. The following is one of his examples:
Without the operator:
auto filter_out_html_tags(std::string_view sv) {
auto angle_bracket_mask =
sv | rv::transform([](auto e) { return e == '<' or e == '>'; });
return rv::zip(rv::zip_with(std::logical_or{},
angle_bracket_mask,
angle_bracket_mask | rv::partial_sum(std::not_equal_to{})), sv)
| rv::filter([](auto t) { return not std::get<0>(t); })
| rv::transform([](auto t) { return std::get<1>(t); })
| ranges::to<std::string>;
}
With the operator:
auto filter_out_html_tags(std::string_view sv) {
return sv
|> transform($, [](auto e) { return e == '<' or e == '>'; })
|> zip_transform(std::logical_or{}, $, scan_left($, true, std::not_equal_to{}))
|> zip($, sv)
|> filter($, [](auto t) { return not std::get<0>(t); })
|> values($)
|> ranges::to<std::string>($);
}
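For readers without range-v3 or Circle at hand, the data flow of the first version can be followed in an eager C++17 sketch. This is not Conor's code: the name filter_out_html_tags_eager and the intermediate vectors are assumptions made for illustration, and std::partial_sum stands in for rv::partial_sum.

```cpp
#include <cstddef>
#include <functional>
#include <numeric>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical eager rewrite of the "without the operator" version.
std::string filter_out_html_tags_eager(std::string_view sv) {
    // 1 where the character is an angle bracket, 0 elsewhere.
    std::vector<int> angle_bracket_mask(sv.size());
    for (std::size_t i = 0; i < sv.size(); ++i)
        angle_bracket_mask[i] = (sv[i] == '<' || sv[i] == '>');

    // Running "not equal" over the mask: the value flips at every bracket,
    // so it is 1 from an opening '<' up to (but excluding) the closing '>'.
    std::vector<int> running(sv.size());
    std::partial_sum(angle_bracket_mask.begin(), angle_bracket_mask.end(),
                     running.begin(), std::not_equal_to<>{});

    // Keep only characters that are neither a bracket nor inside a tag.
    std::string out;
    for (std::size_t i = 0; i < sv.size(); ++i)
        if (!(angle_bracket_mask[i] || running[i]))
            out += sv[i];
    return out;
}
```

Under these assumptions, filter_out_html_tags_eager("<p>Hello</p>") yields "Hello".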
Will your feature suggestion eliminate X% of security vulnerabilities of a given kind in current C++ code? No
Will your feature suggestion automate or eliminate X% of current C++ guidance literature? No
Describe alternatives you've considered. Alternatives are discussed at length in the two papers referenced above.
~~The talk: "New Algorithms in C++23 - Conor Hoekstra - C++ on Sea 2023".~~ The recommended version: "Guide to the New Algorithms in C++23 - Conor Hoekstra - CppNorth 2023".
The proposed operator enables a simpler left-to-right style as opposed to an inside-out style.
As soon as I see this, I think "hmm, like UFCS does?", and then I saw the example which looks suspiciously like UFCS for most of the cases.
Clarifying questions, just so I understand the question (I haven't had time to watch the talk).
Is the $ the placeholder for where to put the left-hand side of the operator, so that this code (ignoring for now the zip_transform one where it's not in the first location):
return sv
|> transform($, [](auto e) { return e == '<' or e == '>'; })
|> zip($, sv)
|> filter($, [](auto t) { return not std::get<0>(t); })
|> values($)
|> ranges::to<std::string>($);
would be the same as this using UFCS, which I think would work in Cpp2 now:
return sv
.transform(:(e) e == '<' || e == '>';)
.zip(sv)
.filter(:(t) !std::get<0>(t);)
.values()
.ranges::to<std::string>();
or something like that, modulo any late-night typos I wrote?
That's right.
Including zip_transform, I have confirmed that this translates correctly (https://cpp2.godbolt.org/z/7Wdbf5o1Y, https://compiler-explorer.com/z/hcx9ex4j4 [formatted]):
#include <algorithm>
#include <ranges>
using namespace std::views;
// auto filter_out_html_tags(std::string_view sv) {
// return sv
// |> transform($, [](auto e) { return e == '<' or e == '>'; })
// |> zip_transform(std::logical_or{}, $, scan_left($, true, std::not_equal_to{}))
// |> zip($, sv)
// |> filter($, [](auto t) { return not std::get<0>(t); })
// |> values($)
// |> ranges::to<std::string>($);
// }
filter_out_html_tags_cpp2: (sv: std::string_view) -> _ = {
(a := sv
.transform(:(e) e == '<' || e == '>';))
return zip_transform(std::logical_or(), a, scan_left(a, true, std::not_equal_to()))
.zip(sv)
.filter(:(t) !std::get<0>(t);)
.values()
.to<std::string>();
}
main: () = { }
Can you remind me where scan_left comes from?
scan_left is from code_report's example.
auto scan_left(auto rng, auto init, auto op) {
return transform(rng, [first = true, acc = init, op](auto e) mutable {
if (first) first = false;
else acc = op(acc, e);
return acc;
});
}
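The snippet as posted keeps its state in a mutable lambda inside transform. To make its stepping explicit, here is an eager transcription in plain C++17. The name scan_left_eager and the use of std::vector are assumptions for illustration; the loop body mirrors the lambda above line for line. Note that the first element is not folded in: the first output is init itself. A later comment in this thread refers to a fixed implementation.

```cpp
#include <functional>
#include <vector>

// Eager transcription (hypothetical helper) of scan_left as posted above.
template <class T, class Op>
std::vector<T> scan_left_eager(const std::vector<T>& rng, T init, Op op) {
    std::vector<T> out;
    out.reserve(rng.size());
    bool first = true;
    T acc = init;
    for (const T& e : rng) {
        if (first) first = false;  // first element: emit init unchanged
        else acc = op(acc, e);     // afterwards: fold the element into acc
        out.push_back(acc);
    }
    return out;
}
```

For example, scan_left_eager(std::vector<int>{1, 2, 3}, 10, std::plus<>{}) produces 10, 12, 15.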
Wow, that was fast. I was just trying to write the code for zip_transform a different way, but I see a statement parameter worked.
What about expressing this
|> zip_transform(std::logical_or{}, $, scan_left($, true, std::not_equal_to{}))
as this
. :(x) zip_transform(std::logical_or(), x, scan_left(x, true, std::not_equal_to())) ()
? Then it's all "."?
(I haven't tried to compile the code though.)
There's some overlap between UFCS and this proposed pipeline operator.
Conor's presentation shows off various range pipeline examples, but the P2011 and P2672 papers discuss the motivation and problem space.
P2011 also discusses how it's different to UFCS. I think the key difference for Cpp2 is allowing the placeholder to appear in different argument positions in order to compose the range algorithms and views.
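The positional freedom described here can be loosely approximated in today's C++ with std::bind placeholders. This is only an analogy to the $ of P2672, not the proposed semantics, and the function names ten_minus and squared are hypothetical.

```cpp
#include <functional>

// Placeholder in the second argument position: computes 10 - x.
inline int ten_minus(int x) {
    using namespace std::placeholders;
    auto f = std::bind(std::minus<>{}, 10, _1);
    return f(x);
}

// The same placeholder used twice in one call: computes x * x.
inline int squared(int x) {
    using namespace std::placeholders;
    auto f = std::bind(std::multiplies<>{}, _1, _1);
    return f(x);
}
```

Plain x.f(args) UFCS covers only the case where the piped value is the first argument; both functions above fall outside it.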
Do you have a test? This compiles: https://cpp2.godbolt.org/z/n8673xofn.
I had to add to, because Libstdc++ doesn't have ranges::to.
Also, Libc++ doesn't implement zip_transform (https://en.cppreference.com/w/cpp/compiler_support).
By the way, I just took the parameter of to by in.
The error message was hideous.
I compiled locally with #506 (and some other things) and quickly found out
x.cpp2:4:104: error: no match for call to ‘(to<std::__cxx11::basic_string<char>, std::ranges::elements_view<…
(… inserted by me).
These are the sizes of the error outputs:
$ ls out-* -lh
-rw-r--r-- 1 johel johel 60K Oct 9 21:28 out-main
-rw-r--r-- 1 johel johel 23K Oct 9 21:27 out-waarudo
Do you have a test?
I added one. It seems to output characters rather than strings. https://cpp2.godbolt.org/z/s5crPezaj.
. :(x) zip_transform(std::logical_or(), x, scan_left(x, true, std::not_equal_to())) ()
That's not valid grammar (https://cpp2.godbolt.org/z/9avjYh6sb):
main.cpp2...
main.cpp2(25,10): error: '.' must be followed by a valid member name (at '(')
Yes -- in a racing update I was updating the comment to say the following, but I'll make it a separate reply instead:
Right, that code is currently rejected because . must be followed by a name.
So perhaps have a general helper like call:(f) :(x) f(x); to enable writing
.call(:(x) zip_transform(std::logical_or(), x, scan_left(x, true, std::not_equal_to()));)
modulo typos and bugs? Or maybe name it curry? Anyway, signing off for tonight, but a very interesting question! Thanks.
You're right that UFCS is very close already, and arguably the dot syntax is just as nice as the new operator, but with UFCS I'd still prefer a token to make the syntax simpler. (Perhaps _ is better than Circle's choice of $ since there's already precedent in Cpp2.)
.zip_transform(std::logical_or{}, _, scan_left(_, true, std::not_equal_to{}))
Excellent.
It works again with call:(forward o, forward f) f(o); (https://cpp2.godbolt.org/z/evj8PnzE7).
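A rough Cpp1 counterpart of that helper, as a sketch: call takes an object and a callable and applies the callable to the object, so the object can then be used in any argument position, and more than once, inside the callable. The forwarding-reference signature is my assumption about what the two forward parameters correspond to, and twice_plus_square is a hypothetical usage example. Without UFCS, Cpp1 has to spell it call(x, f) rather than x.call(f).

```cpp
#include <utility>

// Applies f to o; the C++ analogue of call:(forward o, forward f) f(o);
template <class O, class F>
decltype(auto) call(O&& o, F&& f) {
    return std::forward<F>(f)(std::forward<O>(o));
}

// Hypothetical usage: the piped value appears three times in the body.
inline int twice_plus_square(int v) {
    return call(v, [](int x) { return x + x + x * x; });
}
```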
Circle:
auto filter_out_html_tags(std::string_view sv) {
return sv
|> transform($, [](auto e) { return e == '<' or e == '>'; })
|> zip_transform(std::logical_or{}, $, scan_left($, true, std::not_equal_to{}))
|> zip($, sv)
|> filter($, [](auto t) { return not std::get<0>(t); })
|> values($)
|> ranges::to<std::string>($);
}
Cpp2 (colored):
filter_out_html_tags_cpp2: (sv: std::string_view) //
sv.transform(:(e) e == '<' || e == '>')
.call(:(x) zip_transform(std::logical_or(), x, scan_left(x, true, std::not_equal_to())))
.zip(sv)
.filter(:(t) !t.get<0>())
.values()
.to<std::string>();
IIRC, that proposal has seen push back due to having to specify the semantics of pipe arguments as non-first argument and of multiple pipe arguments in the same function call.
Since there's talk about P2672, is there any interest for placeholder lambdas to replace the recently added :(x) x syntax?
You're right that UFCS is very close already, and arguably the dot syntax is just as nice as the new operator, but with UFCS I'd still prefer a token to make the syntax simpler. (Perhaps _ is better than Circle's choice of $ since there's already precedent in Cpp2.)
.zip_transform(std::logical_or{}, _, scan_left(_, true, std::not_equal_to{}))
Standalone $, especially as an argument, has no meaning in Cpp2.
It could mean "capture the object argument here" in a function call expression.
And so the default becomes today's "capture the object argument as the first argument".
"Object argument" means the expression before the ".".
So when you write x.f(args), you get the default for x.f($, args).
UFCS is then defined to apply when the first argument is $.
We still have the problem of having to specify the behavior when
- the object argument appears more than once, or
- it appears as more than a simple $, e.g., $.first.
Anyways, I think a simple $ argument plays well with
https://github.com/hsutter/cppfront/wiki/Design-note%3A-Capture and
https://github.com/hsutter/cppfront/wiki/Design-note%3A-Defaults-are-one-way-to-say-the-same-thing.
Do you have a test?
I added one. It seems to output characters rather than strings. https://cpp2.godbolt.org/z/s5crPezaj.
Looks like it works on characters, indeed. Using the fixed implementation from https://github.com/hsutter/cppfront/issues/741#issuecomment-1754194152, https://cpp2.godbolt.org/z/qjvonb8s3, it prints the same as the CE link from the talk at codereport/Content: https://godbolt.org/z/on5xMG5ax.
@codereport FYI.
I wish I could use the terse function syntax, but
main.cpp2: error: unexpected end of source file.
Thanks! It's rare these days that I find a bug in the very first "load" step that tags which code is Cpp1 vs Cpp2, but this was one. I think it's fixed in this commit: 789cd382ed4c2fb1a9e306e73b6876228d22207d
is there any interest for placeholder lambdas to replace the recently added :(x) x syntax?
Do you mean like Boost.Lambda's _1 + f()? If so...
My concerns with that are:
- Would it be a special feature that works only in anonymous function bodies?
- Would it be allowing a second way to say the same thing (not just defaulting) -- a competing syntax to teach, and one that meets overlapping needs so we would have to teach which to use when? Whereas the current syntax for lambdas is still a single syntax with optional parts you can omit when you're not using them.
I could be persuaded to like _1-style placeholders, though, if these two things could be addressed:
- If they could serve a general purpose in the language beyond anonymous function placeholder parameters, just like $ for capture works for "capture value" semantics everywhere (not just in anonymous function captures, but also postconditions and string interpolation).
- If that general use were allowed in ordinary named functions in a way that still naturally lets us omit unused parts of the general function syntax to get down to anonymous functions, so we still have a single function syntax.
Does that make sense?
I don't seem to have a C++ compiler installed on this machine that supports all of the new range/view things used in this example, because I'm mainly testing with a-few-years-old compilers to ensure compatibility.
But if I understand correctly, the original example of this:
auto filter_out_html_tags(std::string_view sv) {
auto angle_bracket_mask =
sv | rv::transform([](auto e) { return e == '<' or e == '>'; });
return rv::zip(rv::zip_with(std::logical_or{},
angle_bracket_mask,
angle_bracket_mask | rv::partial_sum(std::not_equal_to{})), sv)
| rv::filter([](auto t) { return not std::get<0>(t); })
| rv::transform([](auto t) { return std::get<1>(t); })
| ranges::to<std::string>;
}
which could be written more simply using the proposed |> operator like this:
auto filter_out_html_tags(std::string_view sv) {
return sv
|> transform($, [](auto e) { return e == '<' or e == '>'; })
|> zip_transform(std::logical_or{}, $, scan_left($, true, std::not_equal_to{}))
|> zip($, sv)
|> filter($, [](auto t) { return not std::get<0>(t); })
|> values($)
|> ranges::to<std::string>($);
}
works in Cpp2/cppfront today using just UFCS like this (with a helper call:(forward o, forward f) f(o);):
filter_out_html_tags_cpp2: (sv: std::string_view) -> _ = {
return sv
.transform(:(e) e == '<' || e == '>';)
.call(:(x) zip_transform(std::logical_or(), x, scan_left(x, true, std::not_equal_to()));)
.zip(sv)
.filter(:(t) !t.get<0>();)
.values()
.to<std::string>();
}
... Is that correct?
That's right.
I wish I could use the terse function syntax, but
main.cpp2: error: unexpected end of source file.
Thanks! It's rare these days that I find a bug in the very first "load" step that tags which code is Cpp1 vs Cpp2, but this was one. I think it's fixed in this commit: 789cd38
Yes, this works now.
filter_out_html_tags_cpp2: (sv: std::string_view) //
sv.transform(:(e) e == '<' || e == '>';)
.call(:(x) zip_transform(std::logical_or(), x, scan_left(x, true, std::not_equal_to()));)
.zip(sv)
.filter(:(t) !t.get<0>();)
.values()
.to<std::string>();
[Edited to add that this also helps reduce need for library techniques like overloading |]
Groovy, thanks.
I've learned two major things from this thread:
- UFCS is even more useful than I thought. It isn't just good for enabling generic code (to be able to call functions whether they're members or nonmembers, which today we can only do with operators) and good for enabling IDEs (autocomplete), but it can help reduce the pressure to add special-purpose language features like |> and reduce pressure to use special-purpose library techniques like overloading |.
- The brand-new very terse syntax has surprised me with how immediately useful it has been, including in this thread. It's a smallish change that really feels game-changing, because it shortens just enough of the ceremony of declaring the anonymous function so that it goes from feeling like "an ordinary function that has this expression body" to "an ordinary expression that we use as a function just by prefixing :(x)" (both are valid, but the latter feels powerful to me somehow).
Opened #746 for this.
Something I've mentioned before is that UFCS on a qualified name doesn't work with GCC: https://compiler-explorer.com/z/qb19TEGv1.
And Cpp2 doesn't have using declarations: #559.
So we'd have to put using ranges::to at global scope, outside any namespace, possibly far from its use.
I can do that now that I realized that the ranges::to in the code was range-v3's and not std's.
But maybe it's just a GCC bug.
is there any interest for placeholder lambdas to replace the recently added :(x) x syntax?
Do you mean like Boost.Lambda's _1 + f()? If so...
My concern with that is, would it be:
- Would it be a special feature that works only in anonymous function bodies?
- Would it be allowing a second way to say the same thing (not just defaulting) -- a competing syntax to teach, and one that meets overlapping needs so we would have to teach which to use when? Whereas the current syntax for lambdas is still a single syntax with optional parts you can omit when you're not using them.
I could be persuaded to like _1-style placeholders, though, if these two things could be addressed:
- If they could serve a general purpose in the language beyond anonymous function placeholder parameters, just like $ for capture works for "capture value" semantics everywhere (not just in anonymous function captures, but also postconditions and string interpolation).
- If that general use were allowed in ordinary named functions in a way that still naturally lets us omit unused parts of the general function syntax to get down to anonymous functions, so we still have a single function syntax.
Does that make sense?
Well, my thinking was that the current new syntax was already kind of divergent. Yes, it is obtained by omitting parts, but that omission is conflicting (omitting -> Type means void return type while omitting -> _ = means deduced return type). Furthermore, this syntax would make its way to full (named) function declarations, where it's desirable for all parts to be present. That conflicts with your 2nd point, but that's how I see it. Most languages which have a short lambda syntax do not allow omitting parts from full functions.
As for the more general presence of placeholders, if the pipeline operator is present, that'd be one place for them in the language, but I've got nothing else on this front. Maybe some places in the language can be tweaked for this (like it's done with string interpolation: it easily could've been done with a format library, but $ was used because it was present more generally in Cpp2).
It wouldn't be competing because it'd only work for anonymous functions; maybe there's no need to start them with :.
Well, my thinking was that the current new syntax was already kind of divergent. Yes, it is obtained by omitting parts, but that omission is conflicting (omitting -> Type means void return type while omitting -> _ = means deduced return type).
That's because when you're using the terse syntax, the default you're writing for is -> _ =.
See https://github.com/hsutter/cppfront/wiki/Design-note%3A-Defaults-are-one-way-to-say-the-same-thing.
To me, allowing a generic function f:(i:_) -> _ = { return i+1; } to be spelled f:(i) i+1; is like that... there's only one way to spell it, but you get to omit parts where you're happy with the defaults.
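As a loose Cpp1 analogy (not a claim about the code cppfront actually emits), the same function can be written with every part spelled out or with the deducible parts left to defaults. The names f_full and f_terse are hypothetical.

```cpp
// Every part explicit: template parameter, parameter type, return type.
template <class T>
auto f_full(const T& i) -> decltype(i + 1) {
    return i + 1;
}

// Deducible parts omitted: a generic lambda with a deduced return type.
inline constexpr auto f_terse = [](const auto& i) { return i + 1; };
```

Both accept any argument type for which i + 1 is valid. The difference from Cpp2 is that Cpp1 needs two different syntaxes here, whereas the point above is that Cpp2 reaches the short form by omitting parts of a single syntax.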
You may be interested in reading
WG21 Number | Title | Author |
---|---|---|
P3021 | Unified function call syntax (UFCS) | Herb Sutter |
which mentions
This paper was motivated by [cppfront #741]
After reading the paper above, I once again thought of this:
. :(x) zip_transform(std::logical_or(), x, scan_left(x, true, std::not_equal_to())) ()
That's not valid grammar (https://cpp2.godbolt.org/z/9avjYh6sb):
main.cpp2... main.cpp2(25,10): error: '.' must be followed by a valid member name (at '(')
We have UFCS with semantics obj.func() that, if not valid, is rewritten as func(obj).
So now it makes sense for func to also be a function expression.
Without UFCS, obj.:(x) x;() makes no sense (or obj.[](auto x) { return x; }() in Cpp1).
But with UFCS, that has a meaning, except that it's not valid grammar.
I find it very interesting that it takes only a single sentence in standardese to enable UFCS for x.f(...) to call f(x, ...).
If E2 is not found as a postfix function call expression using the dot operator, and E1.E2 is followed by a function argument list (args), treat the postfix expression E1.E2(args) as a postfix function call expression E2(E1,args).
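The "member first, else non-member" order in that sentence can be imitated for one fixed name in C++17 with the detection idiom. This is only a sketch of the lookup order, not of the proposed language rule; has_member_size, Legacy, and ufcs_size are hypothetical names introduced for the example.

```cpp
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

// Detects whether obj.size() is a valid expression for an lvalue of type T.
template <class T, class = void>
struct has_member_size : std::false_type {};
template <class T>
struct has_member_size<T, std::void_t<decltype(std::declval<T&>().size())>>
    : std::true_type {};

// Hypothetical type that offers only a non-member size().
struct Legacy { std::size_t count; };
inline std::size_t size(const Legacy& l) { return l.count; }

// Try obj.size() first; if there is no such member, fall back to size(obj).
template <class T>
std::size_t ufcs_size(T& obj) {
    if constexpr (has_member_size<T>::value) return obj.size();
    else return size(obj);
}
```

With this, ufcs_size works on a std::vector<int> through the member and on a Legacy through the free function.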
. :(x) zip_transform(std::logical_or(), x, scan_left(x, true, std::not_equal_to())) ()
That's not valid grammar (https://cpp2.godbolt.org/z/9avjYh6sb):
Right, I like the call helper well enough that I'm waiting to see if there's really a need to write a new function expression in the middle of a postfix-expression... it's doable but is it needed?
I find it very interesting that it takes only a single sentence in standardese to enable UFCS for x.f(...) to call f(x, ...).
Once you have both things specified, it's actually fairly easy in the first thing's specification to turn "else it's ill-formed" into "else try [second existing thing]"... great example of reuse. Same thing in cppfront, when all I have to do for a new feature is enable also looking at another existing thing (e.g., grammar production) it tends to be a few-line tactical change rather than a big surgery.
That said, note that this is only my own draft standardese wording; I still need a card-carrying Core language working group expert to check it 😏. It is, however, based on language that was Core-reviewed when a variation of this proposal last made it to plenary in 2016; it was just going the other (IMO wrong) way, having f(x) fall back to x.f().
With regards to https://www.reddit.com/r/cpp/comments/17h7trm/p3027r0_ufcs_is_a_breaking_change_of_the/, how about changing the UFCS syntax from using . to using .:?
E.g., obj.:func(args).
The : in .: comes from the ::s in the (implicit) name of a chosen non-member func (e.g., could be ::func, ns::func when within ns, or the one in obj.:base::func(args)).
I think the ADL woes, regardless of UFCS, are an orthogonal issue, best solved separately.