
C++20 modules are in: discussing a sane (experimental) design for Meson

Open germandiagogomez opened this issue 5 years ago • 120 comments

Hello everyone.

I am particularly interested in this topic. The CMake folks have already started something: https://www.reddit.com/r/cpp/comments/axnwiz/cmake_gcc_module_proofofconcept/

Though I do not know the proposal well enough to propose a particular design myself, I think it would be a good idea to kick off discussion on strategies and module mapping, at least in the context of gcc, now that Modules have been voted in.

I would propose that the output of this thread be an initial design proposal for a first implementation attempt, covering the main high-level details and strategies:

  • file mapping handling
  • module scanning
  • ninja (and others later) rules to generate...

germandiagogomez avatar Mar 06 '19 05:03 germandiagogomez

I'm not sure I have a good enough grasp on how C++20 modules are supposed to work; does anyone have a link to a good overview of them?

dcbaker avatar Mar 06 '19 17:03 dcbaker

There isn't one. There are three different implementations, and they all differ. The standardisation committee has promised to create a technical specification on how this "should work", but no-one has to actually follow it, though they are strongly recommended to.

jpakkane avatar Mar 06 '19 20:03 jpakkane

Sigh, I love committee hand waving. Does GCC or Clang have any documents on how their implementation is supposed to work?

I'm dreading trying to get this information out of the ICC guys.

dcbaker avatar Mar 07 '19 18:03 dcbaker

Food for thought about how to organize things (this will be a series, I guess): https://vector-of-bool.github.io/2019/03/10/modules-1.html

germandiagogomez avatar Mar 11 '19 10:03 germandiagogomez

First attempt at module mapping: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1484r1.pdf

GNU Make modules support prototype: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1602r0.pdf

@jpakkane this seems to be the start of some reasonable mapping and a proof of concept implementation of scanning dependencies?

germandiagogomez avatar Mar 20 '19 21:03 germandiagogomez

Hi,

some links about modules in GCC and Clang: https://gcc.gnu.org/wiki/cxx-modules https://clang.llvm.org/docs/Modules.html

This will also affect e.g., ninja: https://github.com/ninja-build/ninja/pull/1521

It might be worth keeping in mind that e.g., Fortran has had modules (and submodules) for a while. CMake handles those quite well (ninja still needs some patches, I think).

andreaskem avatar Mar 23 '19 08:03 andreaskem

Unless the ninja patches go upstream I don't think we can rely on ninja for this. We'll also have to figure out what Xcode and VS are going to do about modules; I don't know if it's better to rely on ninja doing it for us if VS or Xcode make us implement this ourselves anyway. (It doesn't seem unlikely that msbuild will rely on VS explicitly declaring the module relationships in the XML.)

Meson already handles Fortran modules and submodules, I think, without using anything from ninja. @scivision has done a lot of work in that area.

dcbaker avatar Mar 27 '19 17:03 dcbaker

Unless the ninja patches go upstream I don't think we can rely on ninja for this.

FYI: It's very very likely that those Ninja patches will go upstream soon (I'm planning to review the PR this month) and be released with 1.10.

jhasse avatar Apr 01 '19 11:04 jhasse

Hi, I'm the author of the Reddit post linked in the description and the CMake C++20 module support, so I can answer any questions you might have. I have a repository with C++ module use cases that I'd be happy to have meson build support added to.

Meson already handles Fortran modules and submodules

Does it support generated Fortran sources though? That's basically what requires the ninja patches we made. I'll be adding a C++ example for this next week to the above-mentioned repository.

mathstuf avatar Apr 05 '19 20:04 mathstuf

There isn't one. There are three different implementations, and they all differ. The standardisation committee has promised to create a technical specification on how this "should work", but no-one has to actually follow it, though they are strongly recommended to.

I've gotten verbal confirmation that the major compiler vendors would be fine with providing information specified in this thread (continued here and here). I have the patch for GCC. Clang is next on my list, and it sounds like MSVC will as well. EDG said they'd have to follow whatever the other three do anyways, so we get them and all their backends by their other-compiler emulation. If you have any input on the format specified there, feel free to drop me a line.

mathstuf avatar Apr 05 '19 20:04 mathstuf

Any progress? Does the Meson team know when we should expect this feature to be released? I would like to try experimental support as soon as it is available.

GunpowderGuy avatar May 11 '19 22:05 GunpowderGuy

Just to add another CMake perspective, with details of how CMake implements Fortran modules and submodules, and thoughts on how it would handle C++: https://mathstuf.fedorapeople.org/fortran-modules/fortran-modules.html

scivision avatar Jun 20 '19 15:06 scivision

I'll also note that the required Ninja features have landed and will be included in the 1.10 release.

mathstuf avatar Jun 21 '19 17:06 mathstuf

Here are my thoughts on supporting C++20 modules in Meson:

The main problem that has to be solved with modules is the mapping of module names/identifiers to source files and the resolution of dependencies. There are multiple proposals on how such resolution mechanisms can be implemented. They range from an on-demand compiler/build-system IPC mechanism and batch compilation to scanning the source files during the build step.
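
To make the mapping problem concrete before looking at the proposals: below is a minimal sketch (hypothetical file and module names, not actual Meson code) of what the build system ultimately has to compute. Given, for each source file, which module it provides and which modules it imports, it has to derive a valid compilation order:

# Hypothetical sketch: derive a compile order from per-source scan results.
# scan_results maps each source file to (provided module, imported modules).
from graphlib import TopologicalSorter  # Python 3.9+

scan_results = {
    "foo.cpp": ("fooMod", []),
    "bar.cpp": ("barMod", ["fooMod"]),
    "main.cpp": (None, ["barMod"]),  # importer only, provides no module
}

# Map each module name to the source file that provides it.
providers = {mod: src for src, (mod, _) in scan_results.items() if mod}

# Each source depends on the providers of the modules it imports.
graph = {
    src: {providers[m] for m in imports if m in providers}
    for src, (_, imports) in scan_results.items()
}

# A valid order, e.g. ['foo.cpp', 'bar.cpp', 'main.cpp'].
print(list(TopologicalSorter(graph).static_order()))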

The main problem I see with these proposals is that they all (some more than others) depend on additional functionality in the compiler. In an ideal world, the required feature would be implemented by every compiler with the same logic, same limitations, same JSON format, etc. However, this might not necessarily be the case. As a result, code that compiles perfectly fine with GCC and Clang might fail to compile with ICC. In the (granted unlikely) worst-case scenario, a module handler for every compiler has to be written and maintained.

Even if there is an interface that is supported by most compilers, a fallback solution is still required for the remaining compilers. This would also leave us with two module mechanisms to maintain. Additionally, there would also be no guarantee that code that works with the "official" mechanism would also work with the "fallback" mechanism.

My solution: Let the build system do the entire dependency management by only providing the "fallback" solution. At least for the initial support until there is a universally accepted solution.

Since depending on the compiler in any way doesn't work, both batch compilation and IPC are out. This leaves us with scanning the source code with a builtin/shared tool. I am not going to repeat what is already listed there. I want to argue why I think that this scanner shouldn't be implemented in the compiler.

Naturally, this scanning step cannot cover all possible macro/module interactions. If an import statement is encountered where the scanner does not know what to do (an unknown macro in an #if, etc.), an error is produced and the build is aborted.

This artificially restricts what you can do with C++20 modules (module lookup is implementation-defined anyway, so this should be fine). While this restricts what developers can do initially, it would greatly increase the portability of the code, since every compiler that supports modules should support explicit module mapping.

That being said, there is already a relatively small preprocessor implementation in Python; writing one for the scanning tool shouldn't be that hard. So something like this could be supported out of the box:

#define FOO fooMod
#define BAR(x) FOO ## : ## x
export module FOO;
import BAR(bar);

And expanding the subset of module declarations that are supported later on is always possible.
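
For illustration only, here is a minimal sketch of such a deliberately restricted scanner (a hypothetical stand-alone script, not the proposed tool, and without the macro expansion shown above): it accepts only plain module/import declarations and aborts on any preprocessor conditional it cannot resolve, as described earlier.

# Hypothetical sketch of a restricted module scanner. It recognises plain
# `export module X;` / `import Y;` declarations and aborts on preprocessor
# conditionals, producing the hard error described above.
import re
import sys

MODULE_RE = re.compile(r'^\s*(?:export\s+)?module\s+([\w.:]+)\s*;')
IMPORT_RE = re.compile(r'^\s*(?:export\s+)?import\s+([\w.:]+)\s*;')
CONDITIONAL_RE = re.compile(r'^\s*#\s*(?:if|ifdef|ifndef|elif)\b')

def scan(path):
    provides, imports = None, []
    with open(path, encoding='utf-8') as f:
        for lineno, line in enumerate(f, 1):
            if CONDITIONAL_RE.match(line):
                sys.exit(f'{path}:{lineno}: cannot resolve preprocessor '
                         'conditional while scanning for modules, aborting')
            if m := MODULE_RE.match(line):
                provides = m.group(1)
            elif m := IMPORT_RE.match(line):
                imports.append(m.group(1))
    return provides, imports

if __name__ == '__main__':
    print(scan(sys.argv[1]))

A small preprocessor layer for the macro case above would then sit in front of this.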

The only real problem is dealing with the automatic #include conversion by the compiler. However, it should be possible to work around this by having a whitelist of compilers where the rules for automatic #include conversion can be retrieved. For all other compilers, this feature is disabled.

Another advantage would be that we have full control over the scanning tool and can remove and add features as we please. With a direct compiler integration, we would have to ask all the compiler vendors to change something.

Of course, such a scanner wouldn't be trivial to build and would ideally be used by multiple build systems (maybe some Meson/CMake cooperation?). I would personally be happy to contribute some code towards this if there is a chance that this approach would be used in Meson.

These are my thoughts on this issue, so please correct me if I got anything wrong or missed something.

mensinda avatar Aug 25 '19 21:08 mensinda

On August 25, 2019 5:32:19 PM EDT, Daniel Mensinger [email protected] wrote:

Here are my thoughts on supporting C++20 modules in Meson:

The main problem that has to be solved with modules is the mapping of module names/identifiers to source files and the resolution of dependencies. There are multiple proposals on how such resolution mechanisms can be implemented. They range from an on-demand compiler/build-system IPC mechanism and batch compilation to scanning the source files during the build step.

I'm the author of the latter.

The main problem I see with these proposals is that they all (some more than others) depend on additional functionality in the compiler. In an ideal world, the required feature would be implemented by every compiler with the same logic, same limitations, same JSON format, etc. However, this might not necessarily be the case. As a result, code that compiles perfectly fine with GCC and Clang might fail to compile with ICC. In the (granted unlikely) worst-case scenario, a module handler for every compiler has to be written and maintained.

I have already gotten verbal confirmation that the JSON format is ok with GCC (I have a patch), Clang developers, and MSVC. When I asked EDG developers, they said they have to implement whatever the other three do anyways. I don't think we'll have issues with compiler support (it's really easy anyways compared to having modules at all). Flag spelling may differ, but GCC and Clang will likely be the same.

We at Kitware also have contacts within the Fortran community and may be able to share the format with those compilers too through its standardization process.

Even if there is an interface that is supported by most compilers, a fallback solution is still required for the remaining compilers. This would also leave us with two module mechanisms to maintain. Additionally, there would also be no guarantee that code that works with the "official" mechanism would also work with the "fallback" mechanism.

My solution: Let the build system do the entire dependency management by only providing the "fallback" solution. At least for the initial support until there is a universally accepted solution.

Since depending on the compiler in any way doesn't work, both batch compilation and IPC are out. This leaves us with scanning the source code with a builtin/shared tool. I am not going to repeat what is already listed there. I want to argue why I think that this scanner shouldn't be implemented in the compiler.

See clang-scan-deps (on the phone, can get a link tomorrow) for such a tool. Emulating the compilers is hard though, so I think using the compiler itself is probably best at first. Such a tool would be useful for faster scanning if that proves to be too slow.

Naturally, this scanning step cannot cover all possible macro/module interactions. If an import statement is encountered where the scanner does not know what to do (an unknown macro in an #if, etc.), an error is produced and the build is aborted.

This artificially restricts what you can do with C++20 modules (module lookup is implementation-defined anyway, so this should be fine). While this restricts what developers can do initially, it would greatly increase the portability of the code, since every compiler that supports modules should support explicit module mapping.

That being said, there is already a relatively small preprocessor implementation in Python; writing one for the scanning tool shouldn't be that hard. So something like this could be supported out of the box:

#define FOO fooMod
#define BAR(x) FOO ## : ## x
export module FOO;
import BAR(bar);

This doesn't even scratch the surface of the corner cases :) . Please join #modules and #sg15_tooling on the cpplang Slack to discuss with other stakeholders.

And expanding the subset of module declarations that are supported later on is always possible.

The only real problem is dealing with the automatic #include conversion by the compiler. However, it should be possible to work around this by having a whitelist of compilers where the rules for automatic #include conversion can be retrieved. For all other compilers, this feature is disabled.

I'll expand on my thoughts here more tomorrow. Short answer: no conversion at all unless told so by the build system (because it needs to know).

Another advantage would be that we have full control over the scanning tool and can remove and add features as we please. With a direct compiler integration, we would have to ask all the compiler vendors to change something.

That's why we're working with ISO where all the implementors are present :) .

Of course, such a scanner wouldn't be trivial to build and would ideally be used by multiple build systems (maybe some Meson/CMake cooperation?). I would personally be happy to contribute some code towards this if there is a chance that this approach would be used in Meson.

While I'm not opposed to collaborations, there is effort on such a tool in the Clang world already, so let's collaborate there.

Thanks,

--Ben

mathstuf avatar Aug 25 '19 21:08 mathstuf

See clang-scan-deps (on the phone, can get a link tomorrow) for such a tool.

I wasn't aware that there is already a project. Less work for us then :)

Emulating the compilers is hard though, so I think using the compiler itself is probably best at first. Such a tool would be useful for faster scanning if that proves to be too slow.

My idea was specifically not to emulate the compiler, but rather to provide a tool that works for 90–99% of use cases and prints an error for the rest. Granted, restricting the user is not very user-friendly, but it would guarantee that the code works with every setup.

This is not even the surface of the corner cases :) . Please join #modules and #sg15_tooling on the cpplang slack to discuss with other stakeholders.

Sure, my point was (primarily) to give an error and abort the build if such an edge case is discovered. Then support for these edge cases might be added later.

I have already gotten verbal confirmation that the JSON format is ok with GCC (I have a patch), Clang developers, and MSVC. When I asked EDG developers, they said they have to implement whatever the other three do anyways. I don't think we'll have issues with compiler support (it's really easy anyways compared to having modules at all). Flag spelling may differ, but GCC and Clang will likely be the same.

That's why we're working with ISO where all the implementors are present :) .

If this format becomes the standard for dependency scanning and every compiler supports it, then this would be the best-case scenario, and my main issue for relying on the compiler is resolved :)

In this case, the only remaining issue would be speed. Scanning for dependencies should be nearly instant (I haven't tested any compiler implementation yet, so I don't know about the current performance). It would also be really useful if the compiler supported scanning multiple files at once. This could reduce the process-spawning overhead, especially on Windows.

mensinda avatar Aug 25 '19 22:08 mensinda

Sorry, got busy and didn't circle back on this. Some links:

  • clang-scan-deps review: https://reviews.llvm.org/D53354
  • cfe-dev thread: https://lists.llvm.org/pipermail/cfe-dev/2019-August/063072.html

My idea was specifically not to emulate the compiler, but rather to provide a tool that works for 90–99% of use cases and prints an error for the rest. Granted, restricting the user is not very user-friendly, but it would guarantee that the code works with every setup.

Well, the fundamental problem seems to be something like this:

#if __has_feature(frobnitz)
import frobnitz;
#else
import fallback.frobnitz;
#endif

The preprocessor definitions are easy to get, but the feature set is not so easy. __has_attribute also is likely in the same bucket.

In this case, the only remaining issue would be speed. Scanning for dependencies should be nearly instant (I haven't tested any compiler implementation yet, so I don't know about the current performance). It would also be really useful if the compiler supported scanning multiple files at once. This could reduce the process-spawning overhead, especially on Windows.

Our prior paper (which missed mailings, but is available here) showed how per-source, per-target, and whole-project scanning is possible and isomorphic in terms of build correctness (the difference is mainly in incremental build work reductions). I think clang-scan-deps is likely to support batch scanning, but maybe not right away.

mathstuf avatar Aug 30 '19 12:08 mathstuf

Well, the fundamental problem seems to be something like this:

#if __has_feature(frobnitz)
import frobnitz;
#else
import fallback.frobnitz;
#endif

Even clang-scan-deps would have problems with this. There is no way any tool could reliably detect this for all compilers and compiler flags. That's why I would propose to specifically disallow such constructs by producing a hard error in the scanning step (with some helpful message on how to work around it, if possible). I am the first one to admit that this isn't user-friendly, but these cases should be the exception and not the norm.

Also, one can work around this in meson (and CMake) by using configure_file() with data from the compiler methods to generate a config.h:

#pragma once
#define HAS_FEATURE_frobnitz 1

Then your original case can be rewritten as:

#include "config.h"
#if HAS_FEATURE_frobnitz
import frobnitz;
#else
import fallback.frobnitz;
#endif

I will repeat that I am fully aware that this adds additional burden on the user and that not all (presumably valid) use cases can be solved like this one. However, in my opinion, the benefits outweigh the costs here. Such a scanner would be limited by design but blazing fast and easy to maintain. Additionally, I would argue that defining all HAS_FEATURE_*, etc. in a separate config.h generated by the build system is preferable anyway because it makes the code easier to understand.

This scanner would ideally be fairly compact and have no external dependencies, because it sits even lower on the software stack than build systems like Meson and CMake (though not lower than ninja). So only the C++ or Python standard library would be allowed, as well as a custom build.py to bootstrap the project.

That's also why I don't particularly like the idea of clang-scan-deps, precisely because it depends on LLVM (LLVM is built with CMake, so there would also be a circular dependency 😃). Installing the LLVM toolchain just to have support for module scanning seems like overkill to me. On Windows with VS, or if you are only using GCC on Linux, you don't need and/or want LLVM, especially if you want to keep a Docker image small and would only need clang-scan-deps for module detection.

PS: How do you actually use clang-scan-deps? I managed to install it from llvm-git, but I can't figure out how to scan stuff.

mensinda avatar Aug 30 '19 19:08 mensinda

I will repeat that I am fully aware that this adds additional burden on the user and that not all (presumably valid) use cases can be solved like this one. However, in my opinion, the benefits outweigh the costs here. Such a scanner would be limited by design but blazing fast and easy to maintain. Additionally, I would argue that defining all HAS_FEATURE_*, etc. in a separate config.h generated by the build system is preferable anyway because it makes the code easier to understand.

The compiler is likely to also be useful as a scanner (GCC is at least). Just probably not as fast as one would like. An option to say "I don't do silly things" to allow using clang-scan-deps seems reasonable though. That also nicely removes the dep cycle you're worried about.

(LLVM is built with CMake, so there would also be a circular dependency)

Well, CMake is unlikely to use modules itself until…oh 2029 probably based on the speed of C++11 adoption of our target compilers, so that part is not a cycle :) .

PS: How do you actually use clang-scan-deps? I managed to install it from llvm-git, but I can't figure out how to scan stuff.

That, I'm not sure. You'll have to ask Michael Spencer (one of its developers).

mathstuf avatar Aug 30 '19 19:08 mathstuf

Our prior paper (which missed mailings, but is available here) showed how per-source, per-target, and whole-project scanning is possible and isomorphic in terms of build correctness (the difference is mainly in incremental build work reductions). I think clang-scan-deps is likely to support batch scanning, but maybe not right away.

clang-scan-deps is designed for batch processing. It doesn't even have an interface for scanning a single file (other than providing it a batch of a single file).

Bigcheese avatar Aug 30 '19 21:08 Bigcheese

PS: How do you actually use clang-scan-deps? I managed to install it from llvm-git, but I can't figure out how to scan stuff.

clang-scan-deps expects a compilation database. It needs a full command line to know how to preprocess each file.

$ clang-scan-deps --compilation-database=db.json
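
For illustration, driving it could look roughly like this (the paths and the compile flags inside the database entry are made up; only the --compilation-database option comes from the command above):

# Hypothetical sketch: write a minimal JSON compilation database and run
# clang-scan-deps against it. Paths and compile flags are placeholders.
import json
import subprocess

db = [
    {
        "directory": "/path/to/build",
        "command": "clang++ -std=c++2a -c /path/to/src/main.cpp -o main.o",
        "file": "/path/to/src/main.cpp",
    }
]

with open("db.json", "w") as f:
    json.dump(db, f, indent=2)

# Prints the discovered dependencies for every entry in the database.
subprocess.run(["clang-scan-deps", "--compilation-database=db.json"], check=True)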

Upstream clang-scan-deps doesn't support reporting module deps yet.

Bigcheese avatar Aug 30 '19 21:08 Bigcheese

@jpakkane Ninja v 1.10 seems to have initial support for modules...? https://github.com/ninja-build/ninja/pull/1521

germandiagogomez avatar Jan 28 '20 09:01 germandiagogomez

So, as of today, where do we stand with meson support for C++20 modules? I am willing to ditch CMake for Meson at this point if that support is there. And it is unclear where or how the Ninja support factors in with Meson.

CMake supports Fortran modules, but as far as I know there is no indication of when they will port that functionality over to C++20 modules.

flajann2 avatar May 03 '20 04:05 flajann2

The module support is not implemented yet. The compiler toolchains do not support it that well yet either (at least last I looked, maybe it has changed). Once Ninja and toolchains have the necessary bits (and they are sufficiently stable) we'll add module support.

jpakkane avatar May 03 '20 10:05 jpakkane

FWIW, ninja does have the necessary bits merged now (and released in 1.10). Compilers still need a spec to write against for dependency info (at least for CMake's approach), but that's mostly on me to work on for SG15. CMake would then need some CMake-side changes for usage requirements related to modules.

mathstuf avatar May 03 '20 11:05 mathstuf

The module mapper implementation (an implementation of http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p1184r1.pdf) has been relicensed and is freely available:

 - http://lists.llvm.org/pipermail/cfe-dev/2020-May/065487.html

germandiagogomez avatar May 26 '20 10:05 germandiagogomez

libcody, a library for build system/compiler communication, has been added to GCC. Might be relevant to Meson?

https://github.com/urnathan/libcody

germandiagogomez avatar Jun 09 '20 10:06 germandiagogomez

If meson wants to run something during the build and monitor that communication, it can help. But AFAIK, ninja has no way of specifying such a tool to communicate with, so you're likely left with the compiler sending off a message to some blocking process which then starts/communicates with the necessary tool (and all the locking fun that sounds like it involves).

While that is a solution to building modules, I don't find it a particularly scalable one (you don't know what is needed until the compiler has started, and now you're consuming memory, process slots, and doing IPC to figure out what work to do next). It's probably fine for small projects that write their makefiles by hand, but once there's a complicated dependency graph, I only foresee contention.

mathstuf avatar Jun 09 '20 11:06 mathstuf

Some news on the MSVC side: https://devblogs.microsoft.com/cppblog/introducing-source-dependency-reporting-with-msvc-in-visual-studio-2019-version-16-7/

lb90 avatar Aug 15 '20 11:08 lb90

Some news on the MSVC side: https://devblogs.microsoft.com/cppblog/introducing-source-dependency-reporting-with-msvc-in-visual-studio-2019-version-16-7/

MSVC

I did a quick test with a file main.cpp which starts with import hello; and the command line cl -std:c++latest -sourceDependencies main.json main.cpp, and it bails out with error main.cpp(1): error C2230: could not find module 'hello'. So this new option does not meet the needs of a build system, which is to obtain the source dependencies. For that, the command should not fail and should output that main.cpp depends on a module named hello. Also, if you add a -P option to only do the preprocessing, it doesn't fail, but the main.json file obtained does not contain anything in the ImportedModules section. I would have expected the name hello there.
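
For reference, a sketch of how a build system might try to consume this today (the command line is the one from the test above with -P added; the JSON schema and the nesting of ImportedModules are deliberately not assumed):

# Hypothetical sketch: run cl with -P -sourceDependencies as described above
# and look for ImportedModules entries anywhere in the resulting JSON.
import json
import subprocess

subprocess.run(
    ["cl", "-std:c++latest", "-P", "-sourceDependencies", "main.json", "main.cpp"],
    check=True,
)

def find_key(node, key):
    # Recursively collect all values stored under `key` in the JSON document.
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v
            yield from find_key(v, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_key(item, key)

with open("main.json") as f:
    data = json.load(f)

# A build system would want to see 'hello' here; as noted above, the
# section is currently empty.
print(list(find_key(data, "ImportedModules")))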

VS/MSBuild might be using something else that they didn't talk about, because the build output window does show that some form of source dependency scanning is happening and it's able to build IFCs in the correct order.

Clang

I also checked the clang-scan-deps tool again today in the llvm-project git master branch, and it doesn't seem to handle modules at all.

GCC

So, amongst the three major compilers, it seems to me the one that currently has the best chance of receiving support from a build system is GCC with the git devel/c++-modules branch. This is the only one that's currently documented and works as advertised.

johan-boule avatar Nov 03 '20 00:11 johan-boule