pants icon indicating copy to clipboard operation
pants copied to clipboard

[internal] Adding first-party sources compilation, linking, packaging, and running to the CC backend

Open sureshjoshi opened this issue 2 years ago • 18 comments

This draft PR is not even remotely close to merge-ready. It's been created to solicit feedback on best practices for dependency inference and internal API usage from the Pants maintainers, so I don't go too far with this backend while being completely in a bubble.

~The desired scope of this PR is strictly compilation (without linking) on first-party code which occurs via the check goal. This code also allows discovering a Clang or GCC compiler already installed on the host system (no compilers are downloaded).~ Turns out I just kept typing past what the expected scope was intended to be.

This PR attempts to cover the main C/C++ first-party sources compilation and "build" workflow.

Specifically out-of-scope:

  • cc_test targets
  • cc_library as a dependency to any cc target
  • Third-party code of any shape

This code is being developed/tested on MacOS (x86_64) only, but will be tested on Linux when it's ready for a PR. Currently it's being tested using ./pants_from_sources fmt lint check examples/cc:: on the files here: https://github.com/sureshjoshi/pants-plugins/tree/76-cpp-compilation/examples/cc

[ci skip-rust] [ci skip-build-wheels]

sureshjoshi avatar Aug 05 '22 18:08 sureshjoshi

@tdyas We've discussed selecting the C or C++ compiler based on file extensions, but we would also need a workflow around compiling .c files using a C++ compiler (which I do in every non-embedded use case).

I see two options for that:

  1. Allow a field or something to specify compiler per (lib/binary/target)
  2. Split out CC into C and CPP targets

Thoughts? Am I missing an easier solution?

sureshjoshi avatar Aug 05 '22 20:08 sureshjoshi

@tdyas We've discussed selecting the C or C++ compiler based on file extensions, but we would also need a workflow around compiling .c files using a C++ compiler (which I do in every non-embedded use case).

I see two options for that:

1. Allow a field or something to specify compiler per (lib/binary/target)

2. Split out CC into C and CPP targets

Thoughts? Am I missing an easier solution?

Just have the user specify C++ tool names for the C-related tool names option?

Re (2), this would also imply creation of Objective-C and other targets per every language that can be handled by the GCC/LLVM toolchains.

tdyas avatar Aug 08 '22 16:08 tdyas

Just have the user specify C++ tool names for the C-related tool names option? Re (2), this would also imply creation of Objective-C and other targets per every language that can be handled by the GCC/LLVM toolchains.

Wouldn't we have that anyways?

swift_sources, obj_c_sources, rust_sources?

My thought was splitting c_sources and cpp_sources instead of just cc_sources. So, the default filename extensions for cpp_sources is a superset of c_sources + additional ones.

EDIT: In hindsight - I'm not sure if splitting c and cpp would help, since they're just generators which create a single file that has to be handled in any case. ... So confused... This is wrinkling my brain...

sureshjoshi avatar Aug 08 '22 16:08 sureshjoshi

Just have the user specify C++ tool names for the C-related tool names option? Re (2), this would also imply creation of Objective-C and other targets per every language that can be handled by the GCC/LLVM toolchains.

Wouldn't we have that anyways?

swift_sources, obj_c_sources, rust_sources?

My thought was splitting c_sources and cpp_sources instead of just cc_sources. So, the default filename extensions for cpp_sources is a superset of c_sources + additional ones.

At least for languages handled by essentially the same toolchain, e.g., C/C++/Obj-C, it seems redundant to me for directories with different source files to end up with multiple targets if they were going to end up in the same library. At least for Rust, that's a whole different toolchain from gccclang so it makes sense to me there to mark it differently. (Also, as an aside, Rust is also different enough to model the targets not as "rust_sources" as per my Rust proposal here.) And I agree, no need to solve now, can leave it for further design discussions.

EDIT: In hindsight - I'm not sure if splitting c and cpp would help, since they're just generators which create a single file that has to be handled in any case. ... So confused... This is wrinkling my brain...

Same here. I'm not sure of what the "right" decision here is. Ideally we'd have no explicit targets but that is not what Pants supports currently. The choice is somewhat an arbitrary judgment call with no clear "right" answer.

tdyas avatar Aug 10 '22 22:08 tdyas

In thinking about the compiler vs extension problem, while thinking about it further and reviewing my repos - I think it's perfectly reasonable to default to the expected compilers, but provide an escape hatch via an optional target flag, or maybe even a subsystem option.

Maybe via a language field in the sources target - and then we could use the alternate compiler for that.

I thought we might run into some linkage problems down the road, but no, so long as headers are correctly externd, then C++ won't mangle names, and C/C++ compiled objects are safe to use in whatever permutations we want.

sureshjoshi avatar Aug 11 '22 02:08 sureshjoshi

Re: cpp_sources vs c_sources, I think this depends on how much the different tools are going to work with both C and C++. For example, test runners, formatters, linters. If some only work on C and choke on C++ syntax, then it may be desirable to have a c_source target so that we can programmatically know via FieldSets whether the tool can work on that target. Alternatively, we have to rely on file extensions -- but maybe that's fine for all the tools other than compilers?

If you stick with cc_sources, then I think a field for the compiler makes sense. Probably default it to be based on file extension, with a magic string value like <auto>. Otherwise, users can override it on a per-target basis. Reminder that we now have __defaults__ which makes it a lot more ergonomic to change the value for many targets at once.

Eric-Arellano avatar Aug 22 '22 22:08 Eric-Arellano

Re: cpp_sources vs c_sources, I think this depends on how much the different tools are going to work with both C and C++. For example, test runners, formatters, linters. If some only work on C and choke on C++ syntax, then it may be desirable to have a c_source target so that we can programmatically know via FieldSets whether the tool can work on that target. Alternatively, we have to rely on file extensions -- but maybe that's fine for all the tools other than compilers?

"Generally" C++ stuff works on C, but not vice-versa. Most of the tools I use work on both, and I use them in that way. At the moment I'm thinking keeping cc_sources until the day we run into a non-theoretical problem (like, while still in development, before release) seems like the best approach. I'm running into potential parallel problems which could inform this discussion.

If you stick with cc_sources, then I think a field for the compiler makes sense. Probably default it to be based on file extension, with a magic string value like . Otherwise, users can override it on a per-target basis. Reminder that we now have defaults which makes it a lot more ergonomic to change the value for many targets at once.

I think I need to look into __defaults__ more, because I don't understand it very well. One issue is that we'd have this compiler field potentially proliferate to more than just sources, which is a "problem" I'm running into today. I'm trying to keep track of what was used to compile each file so that when I put them together, I can use C/C++ appropriately.

sureshjoshi avatar Aug 25 '22 16:08 sureshjoshi

One issue is that we'd have this compiler field potentially proliferate to more than just sources, which is a "problem" I'm running into today.

I don't think I'm following that. What do you mean "more than just sources"?

I'm trying to keep track of what was used to compile each file so that when I put them together, I can use C/C++ appropriately.

Can you safely feed the output of one compiler into another compiler? If not, do you need to then make sure the entire transitive closure is set to the same compiler? Like how JVM and Python "resolves" work.

Eric-Arellano avatar Aug 25 '22 18:08 Eric-Arellano

I don't think I'm following that. What do you mean "more than just sources"?

I'm not sure either. I'm trying to think if the libraries or binaries would ever select C++ in spite of how the sources are compiled... I can't think of a reason to do that explicitly, without also compiling the sources as C++.

Can you safely feed the output of one compiler into another compiler? If not, do you need to then make sure the entire transitive closure is set to the same compiler? Like how JVM and Python "resolves" work.

So, this case was moreso because if we have all C sources, use the C linker (more accurately, the C frontend to the ld linker, but whatever - maybe someday we'll add support for different linkers, so we can get that parallel linking action).

If we have all C++ sources, use the C++ linker frontend. And if we have mixed sources, use C++.

So, it's just a matter of noting whether any of the object files have been compiled via C++ and then forcing that linker frontend.

I mean, in thinking about it, I suppose I could do that by file extensions too, unless someone has only C files, compiles with C++ - rendering file extensions a useless metric for linking.

sureshjoshi avatar Aug 25 '22 19:08 sureshjoshi

If we have all C++ sources, use the C++ linker frontend. And if we have mixed sources, use C++.

So, it's just a matter of noting whether any of the object files have been compiled via C++ and then forcing that linker frontend.

I mean, in thinking about it, I suppose I could do that by file extensions too, unless someone has only C files, compiles with C++ - rendering file extensions a useless metric for linking.

If you add a compiler field, then you would inspect the transitive closure for the compiler field to see if C++ was ever used. If so, use C++ linker.

And a reminder that for the compiler field, I'd recommend something like an auto setting that is based on file extension. Then users can override it if they want.

Eric-Arellano avatar Aug 25 '22 20:08 Eric-Arellano

If you add a compiler field, then you would inspect the transitive closure for the compiler field to see if C++ was ever used. If so, use C++ linker.

And a reminder that for the compiler field, I'd recommend something like an auto setting that is based on file extension. Then users can override it if they want.

Not trying to be dense here, but if this was set to "auto" for all sources, then how would we infer/determine if C++ was used from this field?

sureshjoshi avatar Aug 25 '22 20:08 sureshjoshi

Not trying to be dense here, but if this was set to "auto" for all sources, then how would we infer/determine if C++ was used from this field?

"auto" means "base it off of file extension". But users can override it to the compiler they want. That is all abstracted away from the plugin code, which will do something like CompilerField.normalized_value(CCSourcesField) to get back the compiler value. Do that for the whole transitive closure. If you encounter C++ anywhere, then use that for the linker. Else C. Does that make sense?

Eric-Arellano avatar Aug 25 '22 20:08 Eric-Arellano

I guess I'm having a hard time envisioning where CompilerField.normalized_value(CCSourcesField) would get changed in the compilation pipeline.

sureshjoshi avatar Aug 25 '22 21:08 sureshjoshi

I guess I'm having a hard time envisioning where CompilerField.normalized_value(CCSourcesField) would get changed in the compilation pipeline.

When compiling the file, you will consult the field to know which compiler to use. And then after compilation, you consult the transitive closure to determine which linker to use.

This is all meant to be a solution to your question:

We've discussed selecting the C or C++ compiler based on file extensions, but we would also need a workflow around compiling .c files using a C++ compiler (which I do in every non-embedded use case).

The easier thing would be to always use the same compiler for C vs. C++ and don't allow customizing it.

Eric-Arellano avatar Aug 25 '22 21:08 Eric-Arellano

Talked to Eric in Slack to clarify - makes sense now, will reflect in codebase. Idea is that cc_source's compiler field is where the compilation language determination happens, either added in by the user, or filename-based ("").

Then, this field is used to determine what toolchain to pick up and actually perform compilation/linking with later on.

sureshjoshi avatar Aug 25 '22 21:08 sureshjoshi

@tdyas @Eric-Arellano @stuhood I'm keeping this in draft while I go through and clear out TODOs, add more test coverage, and just apply some general cleanup and tweaks - which is when the PR will be ready for proper review.

However, with @tdyas's cgo PR in the mix, it seems like a good time to bikeshed on target and subsystem naming, determining how consistent we want to keep names with other tools, and within Pants itself.

References:

  • Bazel's CC rules: https://bazel.build/reference/be/c-cpp#cc_library
  • Please's CC rules: https://github.com/please-build/cc-rules
  • Makefile naming: https://www.gnu.org/software/make/manual/html_node/Implicit-Variables.html and https://www.gnu.org/software/make/manual/html_node/Catalogue-of-Rules.html#Catalogue-of-Rules
  • CMake: https://cmake.org/cmake/help/latest/variable/CMAKE_LANG_FLAGS.html#variable:CMAKE_%3CLANG%3E_FLAGS

Note: See https://github.com/pantsbuild/pants/discussions/16777 for a bikeshed discussion.

sureshjoshi avatar Sep 05 '22 02:09 sureshjoshi

However, with @tdyas's cgo PR in the mix, it seems like a good time to bikeshed on target and subsystem naming, determining how consistent we want to keep names with other tools, and within Pants itself.

We should probably open a separate issue for the bikeshedding on naming. Easier to keep that conversation distinct from the two PRs.

tdyas avatar Sep 05 '22 02:09 tdyas

However, with @tdyas's cgo PR in the mix, it seems like a good time to bikeshed on target and subsystem naming, determining how consistent we want to keep names with other tools, and within Pants itself.

We should probably open a separate issue for the bikeshedding on naming. Easier to keep that conversation distinct from the two PRs.

orrr..... A discussion!!! https://github.com/pantsbuild/pants/discussions/16777

sureshjoshi avatar Sep 05 '22 16:09 sureshjoshi

Note to reviewers: I'm going to close this one and open up a few smaller PRs, as this is getting unfairly large - especially with some modifications I haven't yet pushed.

sureshjoshi avatar Dec 06 '22 05:12 sureshjoshi