opentelemetry-go-instrumentation
Custom Probes
There are many packages that this project does not instrument, some open-source and others closed-source. There should be a way for users to provide a custom Probe definition that supplies instrumentation for these unsupported packages.
That's a great idea. I think one of the steps in getting there is to refine our Probe interface, maybe worth opening another issue for that? I have some ideas for improving the existing interface.
Yeah, I think a refined, exposed Probe interface inherently provides the mechanism for custom probes. But it's good to call out that that is intentional.
> That's a great idea. I think one of the steps in getting there is to refine our Probe interface, maybe worth opening another issue for that? I have some ideas for improving the existing interface.
I'm happy to have this issue be the place we discuss the needed probe changes. :+1:
I'll try to list the main points I was thinking about:
Changes to the current functions
`Load(*link.Executable, *process.TargetDetails, config.Sampler) error`
Currently, this function has 2 main purposes: 1) loading the eBPF programs and maps into the kernel. 2) attaching the programs to the relevant symbols via uprobes.
- Splitting this function into 2 might be better: `Load` and `Attach`. This separation might make the code clearer, improve error handling, and enable some added functionality (loading a program once and attaching it to different places at runtime); see the sketch below.
- In the future this function will also be responsible for setting the initial configuration (for example sampling, attributes collection, size of the perf buffer, etc.). If we decide to split the function, this part probably fits better in the `Attach` function. In addition, the general configuration struct will need to be decided on instead of passing the sampling config alone.
- Passing `context` here.
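A rough sketch of what the split with `context` could look like (the signatures and placeholder types here are illustrative assumptions, not settled API):

```go
package probe // hypothetical package name for illustration

import (
	"context"

	"github.com/cilium/ebpf/link"
)

// TargetDetails stands in for the project's process.TargetDetails.
type TargetDetails struct{ /* ... */ }

// Config is a placeholder for the general configuration struct
// discussed above (sampling, attribute collection, buffer sizes, ...).
type Config struct{ /* ... */ }

// Probe sketches the proposed Load/Attach split, with context passed in.
type Probe interface {
	// Load loads the eBPF programs and maps into the kernel.
	Load(ctx context.Context, exe *link.Executable, td *TargetDetails) error
	// Attach attaches the loaded programs to the target symbols via
	// uprobes and applies the initial configuration.
	Attach(ctx context.Context, cfg Config) error
}
```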
`Run(eventsChan chan<- *Event)`
- The event type will probably change to something from `pdata`, as was suggested.
- This function might not be relevant for all probes. I'll write more on that in the next section.
- Do we want to pass `context` here? Currently the cancellation of this function is done by closing the perf buffer, which causes the reading loop to exit (see the sketch below).
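For illustration, a context-aware `Run` could tie cancellation to the perf reader like this sketch (assuming the cilium/ebpf `perf.Reader`; `probe`, `Event`, and `decodeEvent` are placeholders invented here):

```go
import (
	"context"

	"github.com/cilium/ebpf/perf"
)

// Event is a placeholder for the probe event type.
type Event struct{ /* ... */ }

type probe struct {
	reader *perf.Reader
}

// decodeEvent is a hypothetical helper turning a raw perf record
// into the probe's event type.
func decodeEvent(rec perf.Record) *Event { return &Event{} }

// Run loops reading the perf buffer. Cancelling ctx closes the
// reader, which unblocks the blocking Read call and ends the loop.
func (p *probe) Run(ctx context.Context, events chan<- *Event) {
	go func() {
		<-ctx.Done()
		p.reader.Close() // unblocks p.reader.Read() below
	}()
	for {
		record, err := p.reader.Read()
		if err != nil {
			return // the reader was closed (or a fatal read error)
		}
		events <- decodeEvent(record)
	}
}
```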
New function suggestions
`Status() Status`
This will return the status of the probe (running, closed, error). It will be useful for constructing a higher-level API which reports a general status of all the probes.
`SupportedConfiguration() Config`
Each probe can have a different set of configurations it supports (for example: collecting HTTP headers, including `url.path` in the span name, collecting `db.query.text`, sampling by Kafka topic, ...). This function will allow each probe to declare which configuration it supports.
`ApplyConfig(Config)`
Changes the configuration of the probe by applying the supplied one.
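Taken together, a rough sketch of these suggested additions (all names here are tentative, not settled API):

```go
// Status reports the lifecycle state of a probe.
type Status int

const (
	StatusRunning Status = iota
	StatusClosed
	StatusError
)

// Config is a placeholder for whatever configuration type is settled on.
type Config struct{ /* ... */ }

// Probe additions sketched from the suggestions above.
type Probe interface {
	// Status returns the probe's current state so a higher-level API
	// can report an aggregate status across all probes.
	Status() Status
	// SupportedConfiguration declares which configuration options this
	// probe understands (HTTP headers, db.query.text, sampling, ...).
	SupportedConfiguration() Config
	// ApplyConfig changes the probe's configuration by applying the
	// supplied one.
	ApplyConfig(Config)
}
```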
Different types of probes
Our current interface assumes each probe reports events. This is not necessarily the case. For example, in #954 we want a probe which writes a value to a boolean variable. This kind of probe acts as a job: it has a dedicated task and should not be running in the background. Maybe we want a base Probe interface, and 2 additional interfaces which embed it: ReportingProbe and JobProbe?
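A sketch of that embedding idea, with `Do` as a hypothetical method name for the job-style task:

```go
// Event is a placeholder for the probe event type.
type Event struct{ /* ... */ }

// Probe is the shared base behavior (arguments elided for brevity).
type Probe interface {
	Load() error
	Close() error
}

// ReportingProbe runs in the background and streams events.
type ReportingProbe interface {
	Probe
	Run(eventsChan chan<- *Event)
}

// JobProbe performs a one-off task, such as writing a value to a
// target variable (#954), and does not keep running in the background.
type JobProbe interface {
	Probe
	Do() error
}
```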
@RonFed this is a great overview, thanks for writing that up
- +1 to the idea of splitting `Load` and `Attach`. I think having atomic operations is much clearer and allows for more extension points. We can always have a `LoadAndAttach` later for syntactic sugar if it's that convenient for users.
- +1 to passing context in `Run()` rather than implicitly exiting when the perf buffer is closed. But, could this lead to the perf buffer never being cleaned up?
- What are you thinking for some use cases for the Status() function?
- What's the purpose of SupportedConfiguration()? Would I call that before instantiating a Probe, then cast it to the type of config for that Probe? Couldn't each Probe just export a Config struct that you can use directly?
Different types of probes make sense too. Could you generalize it even further to something like SyncProbe (does an action) and AsyncProbe (listens for events)?
From the SIG meeting, my thinking is to base Probe config off the k8s scheduler framework (specifically the Plugin API).
If we have 2 types of probe, ProbeA and ProbeB, we could (roughly) have the following API:
So our root API would provide (something like)
```go
// go.otel.io/auto/api
// Base types for probe and config

type Config interface {
	Package() string
}

type Probe interface {
	ApplyConfig(Config)
}
```
And we also require a standard New() function that's used by our auto-inst SDK to create the generic probe interface and call functions like Load() and Attach()
Then a user (or we) could import it and implement a custom probe:
```go
// github.com/me/foo-auto
// ProbeA implementation

type ProbeA interface {
	Probe
	DoProbeAThing()
}

type ProbeAConfig struct {
	CustomA       string
	CustomSetting int
}

func (a *ProbeAConfig) Package() string {
	return a.CustomA
}

// New constructs the probe from the generic config type.
func New(probeConfig otelauto.Config) otelauto.Probe {
	probeAConfig := probeConfig.(*ProbeAConfig)
	doThing(probeAConfig.CustomSetting)
	// return some concrete type implementing ProbeA
	return &probeAImpl{ /* ... */ }
}
```
There are some examples of this in place in this scheduler repo. I've just always thought it was a cool approach and think it fits here too.
@damemi I'm not sure I understood your suggestion. Assuming our current probe for simplicity, can you give an example of how configuration of HTTP or DB will look?
@RonFed I'll try to put together a more concrete prototype, I was looking into it and got kind of sidetracked just thinking about the overall structure.
On that note, I have some ideas of what the final layout could look like for the Probe api that might help as an end goal:
- Move `/internal/pkg/instrumentation/probe` to a root `/pkg/probe`
  - i.e. `go.opentelemetry.io/auto/pkg/probe`
  - It makes sense to keep the `Probe` API decoupled from the entire `Instrumentation` root package so that custom probes can be defined in isolation and imported without depending on the full auto-instrumentation SDK. In other words, users (and us) should be able to write a Probe that is entirely its own module (similar to collector-contrib)
- As proof, move our default probes from `/internal/pkg/instrumentation/bpf` to a root `/probes`
  - for example `go.opentelemetry.io/auto/probes/net/http`
  - If we define a modular Probe API that's done right then our default probes should also be built with it. This detangles our codebase and serves as documentation for writing custom probes
  - This also provides them externally as a "probe library". We don't even need to worry about an ownership model for custom probes if users can define them as their own modules in their own repos (unless we want to offer a "contrib build", which I don't think is on the table right now)
- Finally, wire probes through `NewInstrumentation()`
Something like:
```go
import (
	"context"

	"go.opentelemetry.io/auto"
	"go.opentelemetry.io/auto/probes/net/http"

	"github.com/damemi/customprobe"
)

func main() {
	ctx := context.Background()
	inst, err := auto.NewInstrumentation(ctx,
		auto.WithProbes(
			http.New(),
			customprobe.New(),
		),
	)
	// ...
}
```
Is this kind of what people were thinking for a public Probe api?
@damemi Yes, that is what I was thinking about as well in terms of our end goal in this issue.
@RonFed sounds good, so to show what I'm saying I did an example with db/http here: https://github.com/open-telemetry/opentelemetry-go-instrumentation/pull/1123
Could you show what you mean for a SupportedConfig() method like you proposed? I might just be missing the use case for it, or maybe we're talking about the same thing
As part of this (and shifting Probes to `NewInstrumentation()`) we would also define a method signature for a probe's `New()` func that enforces passing `probe.Config` as an argument.
@damemi The configuration example looks great to me.
Is the `Package()` function in the config interface just an example? I agree with the need for a config interface, just not sure about its functions.
My initial idea was to have a struct common to all the probes (including some generic maps); in that scenario, the `SupportedConfig()` function would return the keys of the map, letting the user know which configurations are available.
However, the approach in your PR is better, so that function is not necessary if all the configuration structs are exported.
@RonFed yeah I just used Package() as an example of something that all Probes might need to expose to the Instrumentation machinery (in this case I used it just for logging).
Overall this is really pretty much exactly how the Collector does config for different components. It's kind of similar to the k8s scheduler like I mentioned.
But yes, I think it's better to have the configuration exposed in a concrete type that's owned by the Probe itself, so that's the main idea with this approach.
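To make that concrete, a sketch of a probe package exporting its own config type (the package path and field names are invented for illustration):

```go
// Hypothetical net/http probe package owning its own concrete config.
package http

// Config holds the settings this probe supports. It satisfies the
// root config interface through Package().
type Config struct {
	CaptureHeaders bool // collect HTTP headers
	PathInSpanName bool // include url.path in the span name
}

// Package reports the instrumented package, used by the
// Instrumentation machinery (e.g. for logging).
func (Config) Package() string { return "net/http" }
```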
SIG meeting notes:
- We need to decouple the `structfield.ID` in the Manifest.
  - This needs to look at offset generation
  - Possibly move offsets into a per-probe generation, which would require a big overhaul of the offset generator.
- The `Run` method needs to be updated to transport `ptrace.Traces`
An interesting thought: with custom probes, this opens up the possibility of instrumenting binaries other than Go (e.g. Rust, C++). Especially if the offsets are self-contained in the probe.
@damemi I think we can try to add a configuration layer similar to your example as a first step.
This will need to be integrated with the definitions in `provider.go`.
WDYT?
The current `Run` and `Close` methods are defined as follows:
```go
type Probe interface {
	// ...
	Run(tracesChan chan<- ptrace.ScopeSpans)
	Close() error
}
```
Issues
The channel passed to `Run` is closed by another process after it calls `Close`
When a process wants to close probes, it first calls `Close`, and when that call has returned it assumes `Run` is complete so it can close the channel. The telemetryChan is not guaranteed to be unused, though. There may be incorrect implementations, or just concurrency patterns, that mean something is still trying to send on the telemetryChan.
Sending on a closed channel will cause a panic. Therefore, this design should be changed.
Instead of having `Run` receive a write-only channel, have it return a read-only channel that it writes to from a spawned goroutine. When `Close` is called, and the probe is sure it is done writing to the returned channel, it can close that channel and ensure no panic is raised.
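A minimal sketch of that ownership inversion, assuming the channel still carries `ptrace.ScopeSpans` at this point and using placeholder internal fields:

```go
import "go.opentelemetry.io/collector/pdata/ptrace"

type probe struct {
	out  chan ptrace.ScopeSpans
	done chan struct{}
}

// Run returns a receive-only channel owned by the probe. Only the
// probe's goroutine sends on it, so only the probe closes it.
func (p *probe) Run() <-chan ptrace.ScopeSpans {
	go func() {
		// Safe: the sole writer closes the channel once it can no
		// longer send, so receivers never panic.
		defer close(p.out)
		for {
			select {
			case <-p.done:
				return
			default:
				// Read from the perf buffer and send, e.g.:
				// p.out <- spans
			}
		}
	}()
	return p.out
}

// Close signals the writer goroutine to stop. After the writer exits
// it closes the returned channel, ending the receiver's range loop.
func (p *probe) Close() error {
	close(p.done)
	return nil
}
```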
The channel currently only supports trace telemetry
If the Probe produces log, metric, or other telemetry in the future, it will need a forward-compatible way to send this data.
Instead of having `Run` use a channel defined on `ptrace.ScopeSpans`, introduce a new `probe.Telemetry` type:

```go
// Telemetry is the data read by a probe.
type Telemetry struct {
	ScopeSpans ptrace.ScopeSpans
}
```
This new type can have fields added for additional signals when they are supported.
Proposal
The Run method would become:
```go
type Probe interface {
	// ...
	Run() <-chan Telemetry
	Close() error
}
```
Having this include an error may also help communicate errors from the probe. For example:
```go
// Telemetry is the data read by a probe.
type Telemetry struct {
	ScopeSpans ptrace.ScopeSpans
	// Error is the error encountered when reading the telemetry data.
	Error error
}
```
Based on https://github.com/open-telemetry/opentelemetry-go-instrumentation/pull/1248, this proposal should be revised to:
```go
type Probe interface {
	// ...
	Run(func(Telemetry))
	Close() error
}
```
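For illustration, a probe's read loop under this callback-based signature might look like the following sketch (`p.reader` and `p.decode` are placeholders for the probe's perf reader and decoding logic):

```go
// Run blocks, invoking handle for each batch of telemetry read from
// the perf buffer, until Close stops the underlying reader.
func (p *probe) Run(handle func(Telemetry)) {
	for {
		record, err := p.reader.Read()
		if err != nil {
			return // reader closed by Close
		}
		// The probe owns no output channel here; delivery and any
		// buffering are the caller's concern inside handle.
		handle(Telemetry{ScopeSpans: p.decode(record)})
	}
}
```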
@MrAlias and I were talking about this last week, and I think the points we came up with for the design were (let me know if I missed anything):
- It should be rich enough to cover different Probes and bpf program types
- It should be extensible and generic enough to cover use cases we haven't thought of yet
- Probes are the first API we should work out in our broader library refactor, because they unlock design in other areas (like the Manager and OTel Pipeline)
Tyler suggested a factory pattern for this, and I agree that would probably check all the boxes. I can work on a POC for that this week. I'd like to implement this in a way that first wraps the current Probes so that we can transition one by one in smaller PRs.
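For reference, a rough shape such a factory could take (hypothetical, loosely modeled on the Collector's component factories; the method set is not settled):

```go
// Factory creates probes on demand, so the manager can decide per
// target process which probes to construct and load.
type Factory interface {
	// Manifest describes what the probe instruments without loading
	// anything into the kernel.
	Manifest() Manifest
	// NewProbe builds a probe instance for the given configuration.
	NewProbe(cfg Config) (Probe, error)
}
```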
The factory pattern will be good to start internally. To fully support this for external users we will need to support multi-target running (https://github.com/open-telemetry/opentelemetry-go-instrumentation/issues/197).
I think the Probe interface should have a way of stating which versions it supports - This will help with cases like https://github.com/open-telemetry/opentelemetry-go-instrumentation/pull/1302
> I think the Probe interface should have a way of stating which versions it supports - This will help with cases like #1302
Is this something you see the Manifest returned from a probe handling?
> Is this something you see the Manifest returned from a probe handling?
Yea, that makes sense. We can add library version support and Go version support as part of the Manifest. That way the manager can figure out what can be loaded.
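For example, the Manifest could carry version constraints like the following (field names and the constraint format are assumptions, not settled API):

```go
// Hypothetical version-constraint fields on the Manifest.
type Manifest struct {
	// ...

	// GoVersionConstraint is the range of Go versions the probe's
	// offsets and uprobes are known to work with, e.g. ">=1.20 <1.23".
	GoVersionConstraint string

	// LibraryVersionConstraints maps instrumented package import paths
	// to supported version ranges, so the manager can skip loading a
	// probe that does not match the target binary.
	LibraryVersionConstraints map[string]string
}
```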
Meeting notes:
- Deep dive into how Odigos is using the project
  - Event loop
    - Process detection module provides events
      - Uses eBPF for process detection
    - Configuration events
    - Results
  - Call the Go auto-instrumentation
    - Start
    - Ended
    - Configured
  - Also call other Odigos modules
- The fact that daemons need global state means that we may not want to build them into the probe API. This feature/coordination will be in the CLI/Beyla donation/Odigos agent. The auto-instrumentation Go project will be scoped to running “probes” that instrument Go applications for a single process.
- Process detection should be moved out (only accept a PID and have process management outside of `Instrumentation`)
- Importing this into Beyla or Odigos has a binary-in-git issue. #1233 needs resolution.
+1 to these points, this benefits go-auto by scoping the work of this library to a reasonable amount, and supports users (vendors) implementing the coordination/detection/management in their own ways, which IMO is kind of the point of this project as a library. This gives us a clearer definition of what this repo is meant to do too.