
User-defined metafunctions based on dynamic library loading

DyXel opened this issue 1 year ago • 6 comments

This is a follow-up to #797, #907, #909 and all other related issues/discussions regarding metafunctions that I might have missed. You don't need all of the previous context to follow this PR, although it would help a lot.

Special thanks to @JohelEGP for the original idea and implementation (see references above), @MaxSagebaum for #809 which is included in this PR and @edo9300 for helping with the DLL loading implementation.

Introduction

A metafunction is a special kind of function that is invoked on a declaration's reflection and participates in defining its meaning (more in the documentation). This PR implements a mechanism that allows users to define and apply this kind of function themselves, rather than metafunctions only being writable inside cppfront's own code.

The basic idea behind this design is that a metafunction's definition, written against the reflection API, can be compiled separately into a dynamic library (.so on Unix-like systems, .dll on Windows), which cppfront can then load and use when processing further code. The major advantage of this approach is that metafunctions are regular code and are therefore not limited by the usual C++ compile-time evaluation restrictions. This makes it possible to use third-party libraries or execute arbitrary code (e.g., open and write files, call a remote service, etc.).

Of course, there are caveats as well. In particular, you can't define a metafunction and use it in the same compilation step (e.g., in the same file), because the metafunction must first be compiled by the C++ compiler before cppfront can load it.

Basic Usage

Here is an example using GCC as the base C++ compiler (a build system can automate the tedious parts):

metafunctions.cpp2:

greeter: @meta (inout t: cpp2::meta::type_declaration) = {
  t.add_member($R"(say_hi: () = std::cout << "Hello, world!\nFrom (t.name())$\n";)");
}

main.cpp2:

my_class: @::greeter type = { }
main: ()                  = my_class().say_hi();

Steps

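  1. Compile cppfront with -rdynamic, so that its symbols (including the reflection API implementation) are exported to the libraries it loads: g++ cppfront.cpp -std=c++20 -rdynamic -o cppfront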

  2. Transpile metafunctions.cpp2: ./cppfront metafunctions.cpp2

  3. Compile metafunctions.cpp to a dynamic library named libmetafunctions.so:

g++ -std=c++20 -I../include/ -shared -fPIC -o libmetafunctions.so metafunctions.cpp
  4. Transpile main.cpp2: CPPFRONT_METAFUNCTION_LIBRARIES=./libmetafunctions.so ./cppfront main.cpp2

  5. Compile main.cpp: g++ -std=c++20 -I../include/ main.cpp

  6. Run the resulting binary (./a.out), which outputs:

Hello, world!
From my_class

Implementation Details

Features

The main ingredients are a set of orthogonal features that combine to provide the full functionality:

Reflection API to generate non-member declarations

append_declaration_to_translation_unit is added, which allows a metafunction to generate code outside of the declaration it is applied to. This is useful for things like factory functions.
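To make the difference between add_member and append_declaration_to_translation_unit concrete, here is a toy model in plain C++. It is only a sketch: translation_unit, the type_declaration stand-in and the with_factory metafunction are hypothetical, and the real cppfront API (including where append_declaration_to_translation_unit lives and what it accepts) may differ. The point is that add_member edits the type itself, while the appended declaration lands at namespace scope next to it.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the translation unit being built:
// just an ordered list of Cpp2 declaration source strings.
struct translation_unit {
    std::vector<std::string> declarations;
};

// Hypothetical stand-in for a reflected type declaration.
struct type_declaration {
    std::string name;
    std::vector<std::string> members;   // code added inside the type
    translation_unit* tu = nullptr;     // enclosing translation unit

    void add_member(const std::string& source) { members.push_back(source); }

    // Models the idea of append_declaration_to_translation_unit: the new
    // declaration goes to namespace scope, outside the type itself.
    void append_declaration_to_translation_unit(const std::string& source) {
        tu->declarations.push_back(source);
    }
};

// Hypothetical metafunction that generates a non-member factory function.
void with_factory(type_declaration& t) {
    t.append_declaration_to_translation_unit(
        "make_" + t.name + ": () -> " + t.name + " = " + t.name + "();");
}
```

Applied to a type named widget, this would append make_widget: () -> widget = widget(); to the translation unit and leave the type's own members untouched.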

Support applying metafunctions to functions and the @meta metafunction

Metafunctions can now be applied to function declarations, and a @meta metafunction is added to mark would-be metafunctions so that they are registered automatically.

Reflection API to obtain a declaration's fully qualified name

fully_qualified_name() is added to obtain the unique, final identifier of any declaration node, which can be used to refer to that specific entity from any place in the code.
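A rough sketch of how such a name can be derived, assuming each declaration node knows its enclosing declaration. decl_node is a hypothetical stand-in rather than cppfront's node type, and a real implementation also has to deal with things like templates and unnamed declarations. The leading :: matches the @::greeter spelling in the usage example, which names the entity from global scope.

```cpp
#include <cassert>
#include <string>

// Hypothetical declaration node: a name plus a pointer to the enclosing
// declaration (nullptr at global scope).
struct decl_node {
    std::string name;
    const decl_node* parent = nullptr;
};

// Build "::a::b::c" by walking outward through the enclosing scopes.
// The leading "::" anchors the name at global scope, so it refers to the
// same entity wherever the generated code ends up.
std::string fully_qualified_name(const decl_node& d) {
    std::string result = "::" + d.name;
    for (const decl_node* p = d.parent; p != nullptr; p = p->parent) {
        result = "::" + p->name + result;
    }
    return result;
}
```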

Public reflection API header and the internal foreign interface metafunction

The reflection API's declarations are moved entirely to a public header that everybody, including the compiler itself, can use. It can be thought of as a "synopsis", except that it is compilable code.

An internal compiler metafunction, @_internal_foreign_interface_pseudovalue (an obnoxious name on purpose), is added. It automatically wraps the aforementioned declarations of the public header into something that allows the full definition of the API to live outside the library (i.e., inside cppfront).

The compiler's internal reflection code is refactored to implement the interfaces generated from the public header. This fully decouples the two while keeping value semantics for the API, which also means the existing metafunctions don't need to be modified.
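The "vtable" idea behind the foreign interface can be sketched in plain C++ like this. It illustrates the technique only and is not the code @_internal_foreign_interface_pseudovalue actually generates; type_declaration_vtable, host_type_node and host_vtable are hypothetical names. The library only needs the table declaration and the handle class, the host fills in the function pointers at runtime, and the handle still behaves like an ordinary value.

```cpp
#include <cassert>
#include <string>

// Hypothetical table of plain function pointers. The metafunction library
// only sees this declaration; the host fills in the pointers at runtime.
struct type_declaration_vtable {
    std::string (*name)(const void* self);
    void (*add_member)(void* self, const std::string& source);
};

// Value-semantic handle used by metafunctions: an opaque pointer to the
// host's node plus the operations on it. Copying the handle is cheap and
// user code never needs the host's internal types.
class type_declaration {
public:
    type_declaration(void* node, const type_declaration_vtable* vt)
        : node_(node), vt_(vt) {}
    std::string name() const { return vt_->name(node_); }
    void add_member(const std::string& source) { vt_->add_member(node_, source); }
private:
    void* node_;
    const type_declaration_vtable* vt_;
};

// --- host side (would live inside the compiler) ---
struct host_type_node {
    std::string name;
    std::string body;
};

const type_declaration_vtable host_vtable{
    [](const void* self) { return static_cast<const host_type_node*>(self)->name; },
    [](void* self, const std::string& source) {
        static_cast<host_type_node*>(self)->body += source;
    },
};
```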

Dynamic library loading and look-up mechanism

A standalone header (source/dll.h) is added that allows loading dynamic libraries on demand. Additional code ties the loading of these libraries to the registration of metafunctions, and handles looking up names when a metafunction is applied.
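For reference, here is a minimal RAII loader in the spirit of source/dll.h, covering only the POSIX side (dlopen/dlsym/dlclose). The real header also has to handle Windows (LoadLibrary/GetProcAddress/FreeLibrary), and its interface may look quite different; the class name dynamic_library is hypothetical. On glibc older than 2.34 this needs -ldl at link time.

```cpp
#include <cassert>
#include <dlfcn.h>
#include <string>
#include <utility>

// Minimal RAII wrapper over POSIX dlopen/dlsym/dlclose (POSIX only).
class dynamic_library {
public:
    explicit dynamic_library(const std::string& path)
        : handle_(dlopen(path.c_str(), RTLD_NOW | RTLD_LOCAL)) {}
    ~dynamic_library() { if (handle_) dlclose(handle_); }

    dynamic_library(const dynamic_library&) = delete;
    dynamic_library& operator=(const dynamic_library&) = delete;
    dynamic_library(dynamic_library&& other) noexcept
        : handle_(std::exchange(other.handle_, nullptr)) {}

    bool is_loaded() const { return handle_ != nullptr; }

    // Look up a symbol by its unmangled name; nullptr if absent or not loaded.
    void* symbol(const std::string& name) const {
        return handle_ ? dlsym(handle_, name.c_str()) : nullptr;
    }

    // Human-readable reason for the most recent dl* failure, if any.
    static std::string last_error() {
        const char* msg = dlerror();
        return msg ? msg : "";
    }
private:
    void* handle_;
};
```

Note that with the @meta registration scheme, merely loading the library runs its static constructors, so explicit symbol look-up is mostly useful for diagnostics.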

Putting It All Together

When @meta is applied, it does a few things:

  • Adds the support header cpp2reflect_api.h.
  • Generates an anonymous entity that receives the function's fully qualified name (as a string) and its address.

The generated entity is of type register_metafunction, which is declared in cpp2reflect_api.h; its constructor is called when the library is loaded. The implementations of this constructor's overloads are provided by cppfront.

Before loading a library, cppfront sets itself up so that when the register_metafunction implementations are called, they dispatch to an internal object that does the bookkeeping and ensures the metafunctions can be found and called when needed.
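The registration flow can be imitated in a single C++ file, because a global object's constructor runs during static initialization, which for a shared library happens when it is loaded. In this sketch the registry() function, the type_declaration stand-in and the greeter body are hypothetical, and register_metafunction has just one overload. In the PR, its constructor is only declared in cpp2reflect_api.h and implemented inside cppfront, rather than both living in one file.

```cpp
#include <cassert>
#include <map>
#include <string>

struct type_declaration { std::string name; };  // hypothetical stand-in

using metafunction_ptr = void (*)(type_declaration&);

// Host-side bookkeeping: fully qualified name -> function address.
// A function-local static sidesteps the static initialization order problem.
std::map<std::string, metafunction_ptr>& registry() {
    static std::map<std::string, metafunction_ptr> r;
    return r;
}

struct register_metafunction {
    register_metafunction(const char* fully_qualified_name, metafunction_ptr fn) {
        registry()[fully_qualified_name] = fn;
    }
};

// Roughly what @meta could emit for a metafunction ::greeter: an anonymous
// object whose constructor registers the function during static initialization.
void greeter(type_declaration& t) { t.name += "_greeted"; }
namespace { register_metafunction greeter_registration{"::greeter", &greeter}; }
```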

When cppfront tries to apply a metafunction, it creates the internal context (compiler_services_base) and a reflection API object, filling in the interfaces with its internal implementation (provided by the renamed _impl classes in reflect.h2). In essence, the object is made up of "vtables" whose function pointers are all provided by cppfront at runtime. With this in place, name look-up proceeds as before: cppfront first tries the internal metafunctions (the ones with nice names), then looks in the loaded libraries.
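The two-tier look-up order (built-ins first, then loaded libraries) might be modeled as follows. metafunction_lookup and its two maps are hypothetical; cppfront's real look-up is not a plain map and, per the to-do list, does not yet implement full name look-up.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

struct type_declaration { std::string name; };  // hypothetical stand-in
using metafunction_ptr = void (*)(type_declaration&);

// Built-in metafunctions are searched first, so a library cannot
// shadow them; library-registered ones are the fallback.
struct metafunction_lookup {
    std::map<std::string, metafunction_ptr> builtin;
    std::map<std::string, metafunction_ptr> from_libraries;

    std::optional<metafunction_ptr> find(const std::string& name) const {
        if (auto it = builtin.find(name); it != builtin.end()) return it->second;
        if (auto it = from_libraries.find(name); it != from_libraries.end()) return it->second;
        return std::nullopt;
    }
};
```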

DyXel, Aug 25 '24 00:08

To Do

  • [ ] Implement full name look-up for metafunctions (including interpreting using directives?)
  • [ ] Optimize "foreign interface" generated code
  • [ ] Allow using and factoring existing metafunctions (TBD if to move existing metafunctions to a separate header or if somehow load them via DLLs)
  • [ ] Test Windows
  • [ ] Do more user-friendly error reporting
  • [ ] Grep remaining TODO(DyXel)
  • [ ] Fix found bugs. Add tests?
  • [ ] Documentation?

DyXel, Aug 25 '24 01:08

There's still plenty to do, but I figured I would open the PR now to start getting reviews, as the main ideas are implemented and the basic case works. Expect lots of rebases as I go (even though they are extra annoying due to self-hosting), since I like having a clean history.

I apologize for taking so long on this; after all, I wrote https://github.com/hsutter/cppfront/pull/907#issuecomment-2004742987 ages ago by now. Going forward I should be able to contribute code more regularly, not just by participating in discussions.

DyXel, Aug 25 '24 01:08

For this

including interpreting using directives?

I am not too sure how to proceed. I will probably not handle that for this PR.

DyXel, Aug 25 '24 01:08

Regarding

TBD if to move existing metafunctions to a separate header or if somehow load them via DLLs

Both directions have their pros and cons.

First, for "move existing metafunctions to a separate header":

  • pro: Moves lots of code out of reflect.h2, which is already super big.
  • pro: Makes it impossible for metafunctions provided in the box to "cheat" by using internal compiler stuff.
  • pro: Separates the ugly internal metafunctions (now just one) from the rest.
  • pro: Trivial to do. All that needs to be done is move the code either to the bottom of include/cpp2reflect_api.h2 or to include/cpp2reflect_library.h2 (my pick).
  • con: Duplicates the metafunction definitions: cppfront has them by default, and each dynamic library gets an extra copy, which increases compile time.

For "somehow load them via DLLs":

  • pro: No definition duplication, faster compile time.
  • pro: Less delta with current reflect.h2.
  • con: Harder to implement, we'd need at least an additional mechanism to export DLL symbols, as the libraries would only have their declarations, not the implementations.

If it's up to me, I would go with the former; it seems like the more natural choice. The "standard library" is just code and would only ever be included if you are authoring metafunctions, and I don't think the additional compile time would be that much.

DyXel, Aug 25 '24 01:08

Nice to see the update. I would take a closer look, but I am currently on vacation. Maybe I will have time at the beginning of September.

MaxSagebaum, Aug 25 '24 21:08

Let me answer your questions more broadly, Max (we still need feedback from Herb too, of course):

I do not understand why we have to support both in and inout calling. Wouldn't inout calling be enough? Is there a reason for this?

Technically, we don't need it. When you call a metafunction you always have a mutable reference to the declaration available, and in fact you couldn't have both in and inout overloads, since the call would be ambiguous. However, as I mentioned in one of the comments, in is the default parameter passing mode, and there might be metafunctions you author that just read the declaration and generate information from it rather than mutate it (like the aforementioned @print). So to me it makes sense to support in; I consider it more of a "comfort and clarity feature".

At the compiler code level, we can't cast an in metafunction to an inout one, as that would lead to UB AFAIK, so registration and invocation must use the original function signature.

Some places are quite hardcoded for type and function metafunctions. If we want to support metafunctions on other entities like namespaces, variable declarations, using declarations, etc., would it not be better to generalize this now? I suggested an enum and tuple approach.

I wouldn't be against a composition-over-inheritance approach, but this does mean big changes for the API, and subsequently, to the existing metafunctions. If we decide to go ahead with big API changes, I would also suggest completely decoupling compiler_services from the rest of the declaration types, possibly having 2 arguments for the metafunction rather than one.

On that note: what if somebody wants a metafunction for everything? They would need to implement every different interface function. How about a wildcard interface function?

I am not sure a "metafunction for everything" is the right approach, we don't want to directly compete with the upcoming reflection support... To be discussed further I guess.

Regarding "wildcard interface function", the declaration type from which all other declarations inherit provides a base that the users can extend.

DyXel, Sep 06 '24 12:09

I wouldn't be against a composition-over-inheritance approach, but this does mean big changes for the API, and subsequently, to the existing metafunctions. If we decide to go ahead with big API changes, I would also suggest completely decoupling compiler_services from the rest of the declaration types, possibly having 2 arguments for the metafunction rather than one.

Why would this be a big API change? We would only change the lookup logic. The interface of the metafunctions would not be changed.

Regarding "wildcard interface function", the declaration type from which all other declarations inherit provides a base that the users can extend.

This would then require pointer or reference semantics, I think.
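The underlying concern is ordinary object slicing. In a conventional class hierarchy with virtual functions, a wildcard metafunction that takes the base type by value loses the dynamic type, while one taking a reference keeps it. The types below are hypothetical. Whether this actually bites depends on how the reflection handles are implemented: a handle whose base part only wraps a pointer to the compiler's node would carry that pointer along when copied.

```cpp
#include <cassert>
#include <string>

// Hypothetical hierarchy with "declaration" as the common base.
struct declaration {
    virtual ~declaration() = default;
    virtual std::string kind() const { return "declaration"; }
};
struct type_declaration : declaration {
    std::string kind() const override { return "type"; }
};

// Wildcard taking the base BY VALUE: the argument is sliced, only the
// base subobject is copied, and the dynamic type is lost.
std::string wildcard_by_value(declaration d) { return d.kind(); }

// Wildcard taking the base BY REFERENCE: the dynamic type is preserved.
std::string wildcard_by_reference(const declaration& d) { return d.kind(); }
```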

MaxSagebaum, Sep 10 '24 06:09

Thanks! As mentioned in #1287, I'm very interested in this kind of thing, but won't have the bandwidth to look at it for a while. So for now I'll close this, but do feel free to keep trying out the idea in a clone of the repo!

Thanks again for understanding.

hsutter, Sep 24 '24 03:09