cppfront [SUGGESTION] Implement local functions (as CPP1 lambdas)

Background:

This request stems from #1234. I am not sure whether #714 also addresses this, but since @hsutter said he would like to do it, I assume it is separate and am filing this just to keep track of the matter.

Current situation:

Currently, local functions (including ones with captures) must be declared as:

func := :() = { std::cout << "Price = " << price$ << "\n"; };

And this is rewritten into the following CPP1:

auto func {[_0 = price]() mutable -> void{
    std::cout << "Price = " << _0 << "\n";
}};
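To make the semantics of price$ concrete: the lowering above captures price by value (_0 = price) at the moment the lambda is created, so later changes to price are not seen by func. A minimal self-contained sketch, assuming a hypothetical describe_price helper that writes into a string stream (captured by reference as an extra, illustrative capture) instead of std::cout so the result can be checked:

```cpp
#include <sstream>
#include <string>

// Hypothetical helper mirroring the lowered code: the lambda copies
// `price` into `_0` at the point where the lambda is created.
std::string describe_price() {
    double price = 10.5;
    std::ostringstream out;

    auto func{[_0 = price, &out]() mutable -> void {
        out << "Price = " << _0 << "\n";
    }};

    price = 99.0;  // does not affect `_0`, which was copied above
    func();        // writes "Price = 10.5\n"
    return out.str();
}
```

Calling describe_price() yields the price as it was when func was defined, not the later value.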

Current difficulty:

However, :=:()= is quite abstruse and hard to decipher, which is why I had to ask #1234. IIANM, CPP2 aims to eliminate such abstruseness from the language as far as possible.

So I tried to just write a local function, but:

func : () = std::cout << "Price = " << price$ << "\n";

gives:

test4b.cpp2(4,41): error: $ (capture) cannot appear here - it must appear in an anonymous expression function, a postcondition, or an interpolated string literal (at '$')

and even without the capture:

greet : () = std::cout << "Hello\n";

gives:

test4c.cpp2(2,5): error: (temporary alpha limitation) local functions like 'greet: (/*params*/) = {/*body*/}' are not currently supported - write a local variable initialized with an unnamed function like 'greet := :(/*params*/) = {/*body*/};' instead (add '=' and ';')

Request:

I am not an expert, but I am not sure there is any effective difference between a local function and a named lambda.

Given that CPP2 already rewrites code to overcome CPP1 limitations, I feel it should simply rewrite a local function as a named lambda automatically. The recommendation to “add = and ;” (which seems to be missing the :) should then no longer be made, and cppfront should silently do this rewrite for us instead.

This would also adhere to the one-declaration-syntax principle, instead of requiring the user to write :=:()= rather than the normal :()= for effectively the same purpose.

IIANM, this would also mean that the final ; would no longer be needed, since it is there only because this is currently treated as a lambda variable definition rather than a function definition.
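A sketch of what the requested rewrite could produce, assuming cppfront lowered a local function such as greet: () -> std::string = "Hello\n"; to a named Cpp1 lambda (the exact generated code is an assumption, and run_greet and count_down are hypothetical names). One practical difference from a true local function: a lambda cannot refer to itself by name inside its own initializer, so a recursive local function would need a different lowering, for example passing the lambda to itself:

```cpp
#include <string>

// Assumed lowering of the Cpp2 local function
//     greet: () -> std::string = "Hello\n";
// into a Cpp1 named lambda.
std::string run_greet() {
    auto greet = []() -> std::string { return "Hello\n"; };
    return greet();
}

// A lambda cannot name itself in its own initializer, so a recursive
// local function needs a workaround, e.g. passing the lambda to itself.
int count_down(int n) {
    auto steps = [](auto&& self, int k) -> int {
        return k <= 0 ? 0 : 1 + self(self, k - 1);
    };
    return steps(steps, n);
}
```

The non-recursive case maps directly onto a lambda; only recursion (and overloading, which lambdas also lack) would need extra care in the rewrite.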

Aug 19 '24 11:08 jamadagni

I agree with the baby step of simply lowering a regular function to a lambda in C++ so we can write things close to where they are used. It helps a lot with refactoring and evolving the code, something that comes up naturally while you design a solution to your problem.

Word of caution, though: supporting functions written like this might seem controversial to others. At work, where we mostly use Python today, I have heard that nested functions (not lambdas) are the devil and should never have been supported. The perceived problem, which to be honest I have seen at its worst in a legacy project, so I can understand where they come from, is that they encourage putting all the logic inside a single massive function, hurting readability, and also encourage duplicated logic (since a function nested inside another function can't be called from anywhere else). In my experience, this is an organizational/architectural problem more than a tooling, language, or developer issue.

On the other hand, I know functional programming folks like to write functions within other functions, since this naturally provides encapsulation. I can appreciate that, but to be honest I never thought about it too hard, since C++ provides more explicit mechanisms for this. As I said before, I want to look at it from the perspective of consistency and simplicity, and disallowing regular function declarations here seems like an artificial restriction rather than one with a legitimate technical reason.

Finally, I have a feeling that allowing captures in regular functions might also be worth pursuing. Combined with a @pure metafunction (or equivalent) that only allows access to explicitly named arguments and/or captures, this should help with refactoring, catch hard-to-track bugs in multithreaded code, and make it easier to reason about what you wrote.

Aug 19 '24 13:08 DyXel

Thanks! Very good, thoughtful notes here.

Very brief ack:

I think there's evidence that local functions are useful, for example to organize common logic used inside the same function that's not applicable (yet) elsewhere, so that logic should be scoped inside the function. See cppfront's iteration statement parsing for example, but there are lots of places in cppfront that use this (grepping for auto\s+[_a-zA-Z][_a-zA-Z0-9]*\s*=\s*\[ should find them?).
"But functions should be short!!" is the next thing people will say, and that's often true but not always. (Just look at cppfront and you'll see that I think long functions can be appropriate, even if you don't agree. YMMV.)
Capture is inherent to local functions exclusively. Only local variables (including this) can be captured, so only local functions can do any capture by definition. (Any function, global or local, can use a global variable, but not by capturing it -- just by using its name. I would need evidence to convince me that capturing a copy of the current state of a global would make sense at any time, never mind the concurrency implications of doing that.)
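The grep mentioned above can be tried out with std::regex (ECMAScript syntax), using the pattern auto\s+[_a-zA-Z][_a-zA-Z0-9]*\s*=\s*\[ so that it matches whitespace and a literal [. This is only a rough heuristic sketch; the helper name count_local_lambdas is hypothetical, and the pattern will miss forms like auto name{[...]}:

```cpp
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// Hypothetical helper: counts `auto name = [` occurrences, i.e. local
// variables initialized with a lambda, in a chunk of Cpp1 source text.
std::size_t count_local_lambdas(const std::string& source) {
    static const std::regex pattern(R"(auto\s+[_a-zA-Z][_a-zA-Z0-9]*\s*=\s*\[)");
    return static_cast<std::size_t>(std::distance(
        std::sregex_iterator(source.begin(), source.end(), pattern),
        std::sregex_iterator()));
}
```

For example, in "auto is_ok = [&](int x) { return x > 0; };" followed by "auto n = 42;" only the first line counts.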

Aug 21 '24 02:08 hsutter

I would need evidence to convince me that capturing the current state of a global would make any sense at any time, never mind the concurrency implications of doing that.

I don't really have any, as unfortunately there isn't much out there 😅. It's just an itch, but it might be worth experimenting with it to see if it leads anywhere.
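The distinction being discussed can be seen in plain C++: a capture copies a local's value when the lambda is created, while a global is simply named and read whenever the function runs. A minimal sketch with hypothetical names (global_price, snapshot_vs_global):

```cpp
#include <utility>

int global_price = 1;  // hypothetical global

// Returns {captured value, global value} as seen by calls made after
// both variables were changed.
std::pair<int, int> snapshot_vs_global() {
    int local_price = 1;
    auto captured = [local_price] { return local_price; };  // copy taken here
    auto uses_global = [] { return global_price; };         // no capture needed

    local_price = 2;
    global_price = 2;
    return {captured(), uses_global()};  // {1, 2}
}
```

The captured lambda still sees 1, while the one reading the global sees 2: capture only makes sense for locals, and a global is always read by name.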

Aug 21 '24 08:08 DyXel

I have been thinking about this issue again and was wondering the following:

Would it be possible and/or desirable to support order-independent declarations of types and functions (not lambdas, since they are objects) within the body of a function? That is, to apply to function bodies the same rules that already apply to "global" types and functions and to members of a type definition.

My reasoning is that this would make the overall syntax more general and less "constrained", and that such constraints should instead be applied orthogonally, for example with a linter that forbids deeply nested definitions. Importantly, it would let more programs define their behavior procedurally, reading from top to bottom.

For example, given this program that already works today:

main: () = std::cout << greet("World");

greet: (subject: std::string_view) -> std::string
   = "Hello, (subject)$!\n";
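For comparison, in Cpp1 this kind of order-independence at namespace scope relies on forward declarations, which is roughly what cppfront generates by emitting declarations ahead of definitions. A hand-written sketch, not actual cppfront output, with the hypothetical greet_world standing in for main so the result can be checked:

```cpp
#include <string>
#include <string_view>

// Forward declaration: lets greet_world() call greet() before its definition.
auto greet(std::string_view subject) -> std::string;

auto greet_world() -> std::string { return greet("World"); }

auto greet(std::string_view subject) -> std::string {
    return "Hello, " + std::string(subject) + "!\n";
}
```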

I want to keep greet encapsulated within main, since it only makes sense in that context:

main: () = {
    std::cout << greet("World");

    greet: (subject) -> std::string = "Hello, (subject)$!\n";
}

This obviously doesn't work today, but an equivalent C++ program written like this would work:

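#include "cpp2util.h"
auto main() -> int {
    // Local class acting as an order-independent scope for main's helpers
    class __anon
    {
    public: static auto entry() { std::cout << greet("World"); }

    // Declared after entry(), but still visible inside entry()'s body,
    // because member function bodies see the complete class
    private: static auto greet(std::string_view subject) -> std::string {
        return "Hello, " + cpp2::to_string(subject) + "!\n";
    }
    };
    __anon::entry();
}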
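A variant of the program above that does not need cpp2util.h, assuming std::string(subject) as a stand-in for cpp2::to_string(subject). It relies on the same rule: member function bodies are compiled as if the whole class were already declared, so entry can call greet before greet appears. Here greeting is a hypothetical wrapper that returns the text so it can be checked, and the class name locals avoids the reserved double-underscore spelling:

```cpp
#include <string>
#include <string_view>

std::string greeting() {
    // Local class used as an order-independent scope for helper functions.
    class locals {
    public:
        static auto entry() -> std::string { return greet("World"); }

    private:
        // Declared after entry(), yet visible inside entry()'s body.
        static auto greet(std::string_view subject) -> std::string {
            return "Hello, " + std::string(subject) + "!\n";
        }
    };
    return locals::entry();
}
```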

Does this make sense?

[!NOTE] While writing this toy example, clang gave me the error "templates cannot be declared inside of a local class", which means that even if this were supported, it would not yet work with deduced parameter types. Most likely the implementation would need to generate some hidden definitions outside the function's scope, similar to what's done for multiple return types.
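One possible direction for the deduced-parameter case, offered only as a sketch: member templates are not allowed in a local class, but a generic lambda is allowed inside a function body, so a local function with deduced parameter types could be lowered to one. The name greet_both is hypothetical, and this does not give order-independence, since the lambda must still be declared before it is used:

```cpp
#include <sstream>
#include <string>

std::string greet_both() {
    // struct bad { template <class T> void f(T); };  // error: member template in a local class

    // A generic lambda is fine in function scope: its parameter type is deduced.
    auto greet = [](const auto& subject) {
        std::ostringstream out;
        out << "Hello, " << subject << "!\n";
        return out.str();
    };
    return greet("World") + greet(42);
}
```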

Sep 20 '24 16:09 DyXel

Quick answer: I think function bodies should stay order-dependent. The main benefit of order-independence is that decoupled entities (possibly with different authors/maintainers) can be written more conveniently. A function body is a tightly coupled unit with one author/maintainer at a time.

Sep 23 '24 02:09 hsutter