go proposal: cmd/go: compile-time instrumentation and `-toolexec`

Proposal Details

Following up on https://github.com/golang/go/issues/41145#issuecomment-2408949985, this issue aims to explain what we're doing at https://github.com/DataDog/orchestrion, why we're doing it this way, the challenges we face, and some of the toolchain evolution directions we'd like to explore.

Context

Users are embracing observability more and more, as the growing success of OpenTelemetry demonstrates; but onboarding Go applications to observability requires manually instrumenting the whole codebase, making it a tedious/expensive endeavor especially when onboarding existing large applications.

In go, everything is explicit. If some behavior happens, there is code describing it. No hidden magic, no hidden cost. This is a great property to have, but this means instrumentation can be distracting as it may take up a lot of space in the codebase (often multiple lines to start a span with several tags, and then to finish the span, possibly capturing error information & occasionally additional tags), which can result in obscuring the business logic. It also makes it easy for a developer to forget using instrumentation; resulting in observability blind spots (or in the case of security instrumentation, in leaving vulnerable code available to exploit).

Customers want to be able to onboard observability with as little friction as possible, and many prefer observability to stay out of sight when building application logic. The interest in approaches such as eBPF is a testament to the desire of having "no-code-change" observability; but these approaches are inherently limited to certain platforms, incur a performance toll that is not acceptable in certain situations, and limited (for good reasons!) in capabilities.

In addition to this, many enterprises haven't fully embraced the DevOps (and even fewer the DevSecOps) paradigm, and have different organizations take charge of developing, operating, and securing software products. There is often tension between the goals/KPIs of these different organizations (schematizing a little bit):

Developers are measured on how fast they deliver business logic
Operators are measured on how efficient and reliable the products operate
Security teams are measured on how secure the application is

In effect this means developers don't want to be encumbered by observability (or security) if they can avoid it; operators and security teams want to be able to deliver their goals without getting in the way of developers. These give an advantage to instrumentation techniques that do not require code modification; as they minimize impact on developer productivity while allowing to maximize coverage.

Other use-cases for compile-time source code re-writing have recently gotten some visibility:

https://github.com/pijng/prep (compile-time evaluated expression)
https://y0yang.com/posts/golang-opentelemetry-auto-instrumentation (OTel focused, but otherwise very similar to Orchestrion)

Orchestrion

We have built orchestrion as a tool targeted to Ops and SecOps personas, but that can also fit within a Dev/DevOps/DevSecOps workflow. It uses -toolexec to intercept all invocations to go tool compile, re-writing all the .go files to add instrumentation everywhere possible.

Orchestrion can in some ways be seen as a compile-time-woven Aspect-oriented Programming (AoP) framework; except it is currently fairly specialized to our instrumentation needs.

Challenges we had to address

Adding dependency edges

One of the consequences of re-writing source code to inject instrumentation calls is that we often need to introduce new dependencies to the package being built. A package that is built against github.com/gorilla/mux once instrumented as a new dependency edge to gopkg.in/DataDog/dd-trace-go/contrib/gorilla/mux to support the instrumentation. In most cases these new dependency edges support additional import clauses, which is ideal as it allows the compiler to type-check all parts of the re-written file. To support the compiler, Orchestrion modifies the importcfg file to register mappings between import paths and the corresponding export file, as resolved by go list (via packages.Load).

In some situations, Orchestrion needs to use //go:linkname (in the "handshake") form in order to avoid creating circular import dependencies. In those cases, the produced archive has an implied dependency on some other package; which must be made available at link-time. We embed a special text file within the produced archive to track those "link-time" dependencies, so that the requirement is persisted in GOCACHE. We then intercept invocations of go tool link and update the importcfg.link file, adding all link-time dependencies to it (again, resolved by go list via packages.Load).

Resolving additional dependencies

In order to correctly resolve the new dependency edges, we have to use packages.Load (go list), and forward the appropriate BuildFlags, such as -cover, -covermode and -coverpkg. Today, the toolchain does not provide any visibility on the full build arguments and Orchestrion has to crawl the process tree in search for a go build (or go test, go run, etc...) invocation, gather it's arguments list, and do its best at parsing it, so it can forward the right values to child compilations if needed.

This poses challenges because go's standard flags are fairly permissive in terms of style and syntax (e.g, it transparently allows adding an extra - ahead of a flag, which allows using the -flag or --flag styles interchangeably), and there is no way to know for sure what flags have a value or are simple boolean toggles.

We also need to re-implement some default argument behavior; specifically there is an implied -coverpkg argument value if one is not provided explicitly, that we need to make explicit when triggering child builds, as failure to do so typically leads to a fingerprint mismatch at link time.

In addition to this, our need to resolve additional packages during the build means we are going to trigger more compilation processes than desired (by the user); because we have no way to co-operate with the toolchain to honor the -p n flag provided by the user.

Since multiple packages may introduce the same synthetic dependency, we often end up needing to resolve the same package multiple times in parallel, which is a waste of resources. To address this, we had to implement a job server, which is a process that exists for the duration of the build, and which is responsible for resolving those dependencies, ensuring a given dependency is resolved exactly once. That process can also check that we are not introducing cyclic dependencies; and "cleanly" aborts the build if it detects a cycle, instead of endlessly recursing over itself.

Parsing tool arguments

Orchestrion also needs to parse arguments passed to compile and link commands, but there is no programmatic way to get these in a structured way. We have to resort to parsing the help output of these commands to infer what arguments have value (or don't) and hope for the best.

In the case of link there isn't much we care about except for accessing the -importcfg flag. For compile, we actually need to establish the correct list of positional arguments (they'll be the .go files) without risking a false positive (we had a bug where we simply took all arguments that ended in .go but ran into a fun issue with the github.com/nats-io/nats.go package).

Interactions with `GOCACHE`

Orchestrion tries to co-operate with the go toolchain as naturally as possible. One aspect of this is integrating with GOCACHE to allow our users to enjoy the performance benefits of incremental builds.

In order to do this, Orchestrion needs to influence the build ID of objects that are built with instrumentation. The only "hook point" available is intercepting the -V=full invocation of all tools, and modifying the string it returns to include an invalidation fragment.

Today, we intercept both the compile -V=full and link -V=full invocations, and append to it a hash that summarizes:

The version of orchestrion being used, as changing it may change how aspects are applied, which may produce different instrumentation
The complete list of aspects in use, as changing these will produce different instrumentation
The complete list of packages aspects might inject, together with their dependencies (another use of packages.Load), as a lot of these are typically not accounted for in the "natural" build ID of any object

The drawback of being able to do this only at the "complete build level" is that it'll result in excessively frequent cache invalidation... Of course, changing orchestrion itself should be treated equivalent to changing compilers/toolchain, and result in complete invalidation... But the aspects & injected dependencies will only affect packages where they are effectively used, which isn't all of them.

For example, we don't do anything in the majority of the standard library (some aspects today target os and runtime), so the "regular" export files would be fine to use for all the rest.

Line information preserving

We want the instrumented code to retain the original code's line information, so that stack traces visible by the user still match the code they wrote. To do this, we sprinkle //line directives around every "synthetic" code.

The go/ast package does not provide much in the way of a facility to manipulate comments or directives, and we use github.com/dave/dst to simplify editing commentary within the source files. This however comes at a significant cost (as it performs several other costly operations we don't necessarily need done).

The injected code can also easily interfere with the debugging experience (via dlv), as the "real" source file including that instrumented code is often no longer present on disk when the program is being run... This becomes more problematic in cases when the original source file is not in "canonical" go format, as the line numbering we infer from the original file may be skewed (because github.com/dave/dst always produces canonical go format output) -- a "source map" (like is done in Node.JS, for example) would probably be somewhat easier to confidently manage.

Toolchain (`-toolexec`) evolution directions

Some of these may be moonshots, or things that don't fit well with the go design principles; but I'm including them anyway as they might hint at problems in search of a solution, and maybe someone comes up with a more acceptable solution:

Dedicated support for source-rewriting tools
- Also, a simplified onboarding experience (pluggable transformers automatically discovered from go.mod dependencies?)
Ability to influence the build id on a per-package basis
Ability to add new edges to the build graph
Improved go/ast API, in particular w/r/t comment alterations
Improved source mapping (that'd make it easier to "stack" transforms, e.g cover + instrumentation)
Allow -toolexec to (try to) respect the -p n flag of the overarching build
Share overarching build flags with -toolexec, or a way for a -toolexec tool to request extra build artifacts from the overarching build (see also, adding new edges to the build graph)

Other related issues

https://github.com/golang/go/issues/41145
https://github.com/golang/go/issues/27628
https://github.com/golang/go/issues/29430
https://github.com/golang/go/issues/35204

Oct 15 '24 10:10 RomainMuller

CC @matloob @samthanawalla

Oct 15 '24 13:10 ianlancetaylor

related to https://github.com/golang/go/issues/68728

Nov 06 '24 08:11 y1yang0

Adding dedicated support for source rewriting tools would result in a significant increase in complexity in the go command, and also in the interface surface area of the go command that we wouldn't be able to easily change. I don't believe the costs are worth it for a feature like this.

I think a use case like this is exactly the kind of thing that would warrant using a more sophisticated build system like Bazel that has features like aspects that would make these sorts of changes to the build graph much easier.

Dec 17 '24 21:12 matloob

Thanks! This isn't so much a "proposal" as it is a "problem report", which is still very helpful!

Adding dependency edges

Out of curiosity, since Orchestrion wraps the go command (it's not purely a toolexec), did you consider first finding all source files using go list, then transforming them, then invoking go build with -overlay? You've certainly thought much more about this than I have, so I'd be curious what the drawbacks or limitations of that type of approach are.

I think (if this approach works), it would also interact with GOCACHE much better., and would avoid the problem with -p. However, the wrapper tool would probably want to do its own caching of the rewritten overlays; I'm not sure if it would be able to leverage the Go cache for that or not.

Given the sequencing of how cmd/go builds the action graph and resolves build IDs, it seems architecturally difficult to make changes to the graph or build IDs as it runs. That's what leads me to wonder if there's a different approach that does the source transformation before cmd/go starts building the action graph. Maybe that's not possible today, but it might be something that's easier to support.

Parsing tool arguments

This seems to come up for just about every non-trivial toolexec.

Just thinking out loud about ways to solve this without introducing a large amount of complexity... We could pass just a little "type information" along with the tool invocation command lines. For example, in a compile invocation, cmd/go could indicate which of the arguments are flags, and what components of the command line are input files or output files. This little bit of extra structure would be much easier for a tool to process, while still being trivial to flatting into an actual command line. I'm not sure if that solves enough of the problem to be worth it.

Line information preserving

This is definitely a known weakness. The standard go packages are a surprisingly poor fit for source rewriting, and are especially weak when dealing with comments. This is a problem the gopls team runs into, too. I think the "original" issue tracking this is #20744 (the https://github.com/dave/dst package was also brought up there).

Dec 24 '24 01:12 aclements

id you consider first finding all source files using go list, then transforming them, then invoking go build with -overlay

Insanely, I just found that I actually implemented this strategy with my project xgo, though previously it uses -toolexec instead of -overlay.

Last year I made a public share on building a mocking library on top of -toolexec: https://blog.xhd2015.xyz/posts/xgo-monkey-patching-in-go-using-toolexec/

And today I'm switching that mechanism to -overlay, which increases the ability and maintainability of the framework.

You can check it out: https://github.com/xhd2015/xgo/releases/tag/v1.1.0

Apr 15 '25 13:04 xhd2015

Hey @aclements let me answer some of your questions here...

Out of curiosity, since Orchestrion wraps the go command (it's not purely a toolexec), did you consider first finding all source files using go list, then transforming them, then invoking go build with -overlay? You've certainly thought much more about this than I have, so I'd be curious what the drawbacks or limitations of that type of approach are.

I contemplated this idea, but essentially it has some drawbacks (we can argue whether they're worse than what we do today or not... but that's a topic for another day):

Doing the go list pass upfront effectively is duplicated work, since the go builder will actually do something very similar; it can also be quite slow and it's not necessarily easy to ensure it does not also build those packages it lists (which would be even more wasted effort);
We don't always wrap go - we also support getting invoked as go build -toolexec='orchestrion toolexec' (or the same via GOFLAGS) and in this case we arrive too late to inject an -overlay - this does make inserting the tool in late stages of CI/CD pipelines a bit easier;
In order to drive the re-writing, we need to parse the code into an AST, then type-check that... Which means we'd effectively be re-implementing go build there, which isn't the goal.

However, the wrapper tool would probably want to do its own caching of the rewritten overlays; I'm not sure if it would be able to leverage the Go cache for that or not.

Yeah, I've toyed with other ideas that involved caching some artifacts separately from the go cache (one of these ideas involved GOCACHEPROG). I don't think it's safe (or necessary) to use to go cache for this, although if we could somehow use it safely, it'd make CI/CD set-ups that want to preserve cache between runs a lot easier (as folks would not need to do anything special if it already works for "vanilla" go).

Given the sequencing of how cmd/go builds the action graph and resolves build IDs, it seems architecturally difficult to make changes to the graph or build IDs as it runs

Yeah I would not expect this to be a trivial change here... Obviously, we're open to consider other ways to achieve our end goal that don't involve doing late-stage surgery on the build graph. That may involve the go toolchain providing other features/mechanisms to support removing duplicated effort from those alternate approaches (such as for example the -overlay that we discussed just before).

We could pass just a little "type information" along with the tool invocation command lines. For example, in a compile invocation, cmd/go could indicate which of the arguments are flags, and what components of the command line are input files or output files.

Also thinking out loud... The toolchain could dump a JSON object containing the "parsed" flags and expose an environment variable with the path to this object. That object would ideally contain only those flags that impact build (i.e - those that'll influence build IDs), and they'd be fully resolved (i.e -- if -cover is passed, -coverpkg would always be present with the actual value that would be used; file paths would be absolute, etc...). That would help with having to second guess stuff there...

I'd have said the standard library could expose types to help work with this but it would likely commingle the toolchain & standard library versions in undesirable ways.

This is definitely a known weakness. The standard go packages are a surprisingly poor fit for source rewriting, and are especially weak when dealing with comments. This is a problem the gopls team runs into, too. I think the "original" issue tracking this is https://github.com/golang/go/issues/20744 (the dave/dst package was also brought up there).

Yes I am aware of these. Beyond the comment management issue, there is also the fact that it's not necessarily trivial to re-write an AST into Go source code without incurring an undesirable formatting pass (i.e, a.go -> parse -> (no change) -> print -> a.go (identical to original)), which can also cause line changes.

Jul 25 '25 07:07 RomainMuller

On the side, I shall note that we are working on making an OpenTelemetry standard form of Orchestrion, as a dedicated SIG has been bootstrapped. The new tool is being (re)built from the ground up together with the other SIG members (notably - folks from Alibaba Cloud have been up to very similar things to Orchestrion, and we are combining).

This work is happening @ https://github.com/open-telemetry/opentelemetry-go-compile-instrumentation/ and I guess any support, insight, recommendation, feedback, etc... from Go maintainers is more than welcome there.

Jul 25 '25 07:07 RomainMuller

Some update regarding the use of -overlay... Compile-time instrumentation tooling is amenable to perform modifications in the entire dependency closure of the application, and if users don't run go mod vendor, that includes a lot of files under GOMODCACHE, and the documentation states (emphasis mine):

-overlay file

read a JSON config file that provides an overlay for build operations. The file is a JSON object with a single field, named Replace, that maps each disk file path (a string) to its backing file path, so that a build will run as if the disk file path exists with the contents given by the backing file paths, or as if the disk file path does not exist if its backing file path is empty. Support for the -overlay flag has some limitations: importantly, cgo files included from outside the include path must be in the same directory as the Go package they are included from, overlays will not appear when binaries and tests are run through go run and go test respectively, and files beneath GOMODCACHE may not be replaced.

This unfortunately makes -overlay unsuitable for this particular use-case, unless this artificial limitation was to be removed.

Aug 18 '25 13:08 RomainMuller

proposal: cmd/go: compile-time instrumentation and `-toolexec`

Proposal Details

Context

Orchestrion

Challenges we had to address

Adding dependency edges

Resolving additional dependencies

Parsing tool arguments

Interactions with GOCACHE

Line information preserving

Toolchain (-toolexec) evolution directions

Other related issues

Interactions with `GOCACHE`

Toolchain (`-toolexec`) evolution directions