opentelemetry-collector-contrib
[receiver/prometheus] Add Target Info API
Description: Adds the ability to provide confighttp.HTTPServerSettings to the Prometheus receiver, which it uses to expose a subset of the Prometheus API. At present this includes only the /targets resource, which returns information about active and discovered scrape targets, including debugging information that is typically unavailable without verbose debug logging.
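For readers unfamiliar with the /targets resource, here is a minimal Go sketch of a client reading it. It assumes the receiver serves the resource at the upstream Prometheus path /api/v1/targets and uses the upstream JSON field names; the path under this PR may differ. The helper names (fetchTargets, newFakeReceiverAPI) and the canned response are hypothetical, and a fake in-process server stands in for the receiver so the sketch runs on its own.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// targetsResponse mirrors a small subset of the upstream Prometheus
// /api/v1/targets payload (field names follow the upstream JSON).
type targetsResponse struct {
	Status string `json:"status"`
	Data   struct {
		ActiveTargets []struct {
			ScrapePool string `json:"scrapePool"`
			ScrapeURL  string `json:"scrapeUrl"`
			Health     string `json:"health"`
			LastError  string `json:"lastError"`
		} `json:"activeTargets"`
	} `json:"data"`
}

// fetchTargets (hypothetical helper) queries baseURL + "/api/v1/targets"
// and decodes the JSON body.
func fetchTargets(baseURL string) (*targetsResponse, error) {
	resp, err := http.Get(baseURL + "/api/v1/targets")
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status %s", resp.Status)
	}
	var out targetsResponse
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return nil, err
	}
	return &out, nil
}

// newFakeReceiverAPI (hypothetical) serves a canned response in place of
// a real receiver; the data is invented for illustration only.
func newFakeReceiverAPI() *httptest.Server {
	const body = `{"status":"success","data":{"activeTargets":[{"scrapePool":"demo","scrapeUrl":"http://127.0.0.1:8888/metrics","health":"down","lastError":"connection refused"}],"droppedTargets":[]}}`
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path != "/api/v1/targets" {
			http.NotFound(w, r)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		fmt.Fprint(w, body)
	}))
}

func main() {
	srv := newFakeReceiverAPI()
	defer srv.Close()

	targets, err := fetchTargets(srv.URL)
	if err != nil {
		panic(err)
	}
	for _, t := range targets.Data.ActiveTargets {
		fmt.Printf("pool=%s url=%s health=%s lastError=%q\n",
			t.ScrapePool, t.ScrapeURL, t.Health, t.LastError)
	}
}
```

The lastError field is the kind of per-target debugging detail that otherwise requires verbose logging to see.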
This PR was marked stale due to lack of activity. It will be closed in 14 days.
Closed as inactive. Feel free to reopen if this PR is still being worked on.
This PR was marked stale due to lack of activity. It will be closed in 14 days.
Closed as inactive. Feel free to reopen if this PR is still being worked on.
@dashpole I would appreciate another look at this following our discussion on https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/29622.
This PR was marked stale due to lack of activity. It will be closed in 14 days.
Thanks @Aneurysm9 for re-opening this!
My suggestion would be to reuse the Prometheus API struct with agent mode set to true. That way we avoid duplicating much of the API internals from the Prometheus repo, and this code is less likely to drift from the main Prometheus branch.
I agree we don't want to add any API paths that don't apply to the Prometheus Receiver. With the API from the Prometheus repo, all paths registered with wrap() return data, whereas all paths registered with wrapAgent() return "unavailable with Prometheus agent": https://github.com/prometheus/prometheus/blob/main/web/api/v1/api.go#L362-L407. This way only the paths /targets, /scrape_pools, and /status/* actually return data and do any calculations or lookups.
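To illustrate the wrap()/wrapAgent() distinction, here is a simplified Go sketch of the pattern. It is not the Prometheus implementation: the api type, the route paths, the handler bodies, and the 422 status code are assumptions chosen for illustration. Only the error message is taken from the comment above.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// apiFunc is a handler that returns data or an error (simplified).
type apiFunc func(r *http.Request) (any, error)

// api is a hypothetical stand-in for the Prometheus API struct.
type api struct{ isAgent bool }

// wrap turns an apiFunc into an http.HandlerFunc that writes a JSON envelope.
func (a *api) wrap(f apiFunc) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		data, err := f(r)
		w.Header().Set("Content-Type", "application/json")
		if err != nil {
			// Status code is an assumption made for this sketch.
			w.WriteHeader(http.StatusUnprocessableEntity)
			json.NewEncoder(w).Encode(map[string]string{"status": "error", "error": err.Error()})
			return
		}
		json.NewEncoder(w).Encode(map[string]any{"status": "success", "data": data})
	}
}

// wrapAgent short-circuits with an error in agent mode, so the wrapped
// function (and any nil dependency it would touch) is never called.
func (a *api) wrapAgent(f apiFunc) http.HandlerFunc {
	return a.wrap(func(r *http.Request) (any, error) {
		if a.isAgent {
			return nil, errors.New("unavailable with Prometheus agent")
		}
		return f(r)
	})
}

// newMux registers one route of each kind (paths are illustrative).
func newMux(isAgent bool) *http.ServeMux {
	a := &api{isAgent: isAgent}
	mux := http.NewServeMux()
	mux.Handle("/targets", a.wrap(func(*http.Request) (any, error) {
		return []string{"target-a"}, nil
	}))
	mux.Handle("/query", a.wrapAgent(func(*http.Request) (any, error) {
		return "query result", nil
	}))
	return mux
}

// get performs an in-memory GET and returns the status code and body.
func get(h http.Handler, path string) (int, string) {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code, rec.Body.String()
}

func main() {
	mux := newMux(true) // agent mode, as proposed for the receiver
	for _, p := range []string{"/targets", "/query"} {
		code, body := get(mux, p)
		fmt.Printf("%s -> %d %s", p, code, body)
	}
}
```

Because the agent-mode check runs before the wrapped function, dependencies such as a query engine can stay nil without being dereferenced.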
I tried this below as a rough POC and verified it works. This sets up the API in the same way as the Prometheus web package, which sets up the API and hosts it in addition to hosting the UI. We can do the same, but without adding in any of the UI-related code for serving the react app paths:
func (r *pReceiver) initPrometheusComponents(ctx context.Context, host component.Host, logger log.Logger) error {
	// All existing code
	...
	...
	// Create Options just for easy readability for creating the API object.
	// These settings are more applicable for what we want to expose for configuration for the Prometheus Receiver.
	o := &web.Options{
		ScrapeManager: r.scrapeManager,
		Context:       ctx,
		ListenAddress: ":9090",
		ExternalURL: &url.URL{
			Scheme: "http",
			Host:   "localhost:9090",
			Path:   "",
		},
		RoutePrefix: "/",
		ReadTimeout: time.Minute * readTimeoutMinutes,
		PageTitle:   "Prometheus Receiver",
		Version: &web.PrometheusVersion{
			Version:   version.Version,
			Revision:  version.Revision,
			Branch:    version.Branch,
			BuildUser: version.BuildUser,
			BuildDate: version.BuildDate,
			GoVersion: version.GoVersion,
		},
		Flags:          make(map[string]string),
		MaxConnections: maxConnections,
		IsAgent:        true,
		Gatherer:       prometheus.DefaultGatherer,
	}
	// Creates the API object in the same way as the Prometheus web package: https://github.com/prometheus/prometheus/blob/6150e1ca0ede508e56414363cc9062ef522db518/web/web.go#L314-L354
	// Anything not defined by the options above will be nil, such as o.QueryEngine, o.Storage, etc. IsAgent=true, so these being nil is expected by Prometheus.
	factorySPr := func(_ context.Context) api_v1.ScrapePoolsRetriever { return r.scrapeManager }
	factoryTr := func(_ context.Context) api_v1.TargetRetriever { return r.scrapeManager }
	factoryAr := func(_ context.Context) api_v1.AlertmanagerRetriever { return nil }
	FactoryRr := func(_ context.Context) api_v1.RulesRetriever { return nil }
	var app storage.Appendable
	logger = log.NewNopLogger()
	apiV1 := api_v1.NewAPI(o.QueryEngine, o.Storage, app, o.ExemplarStorage, factorySPr, factoryTr, factoryAr,
		func() config.Config {
			return *r.cfg.PrometheusConfig
		},
		o.Flags,
		api_v1.GlobalURLOptions{
			ListenAddress: o.ListenAddress,
			Host:          o.ExternalURL.Host,
			Scheme:        o.ExternalURL.Scheme,
		},
		func(f http.HandlerFunc) http.HandlerFunc {
			return func(w http.ResponseWriter, r *http.Request) {
				f(w, r)
			}
		},
		o.LocalStorage,
		o.TSDBDir,
		o.EnableAdminAPI,
		logger,
		FactoryRr,
		o.RemoteReadSampleLimit,
		o.RemoteReadConcurrencyLimit,
		o.RemoteReadBytesInFrame,
		o.IsAgent,
		o.CORSOrigin,
		func() (api_v1.RuntimeInfo, error) {
			status := api_v1.RuntimeInfo{
				GoroutineCount: runtime.NumGoroutine(),
				GOMAXPROCS:     runtime.GOMAXPROCS(0),
				GOMEMLIMIT:     debug.SetMemoryLimit(-1),
				GOGC:           os.Getenv("GOGC"),
				GODEBUG:        os.Getenv("GODEBUG"),
			}
			return status, nil
		},
		nil,
		o.Gatherer,
		o.Registerer,
		nil,
		o.EnableRemoteWriteReceiver,
		o.EnableOTLPWriteReceiver,
	)
	// Create listener and monitor with conntrack in the same way as the Prometheus web package: https://github.com/prometheus/prometheus/blob/6150e1ca0ede508e56414363cc9062ef522db518/web/web.go#L564-L579
	level.Info(logger).Log("msg", "Start listening for connections", "address", o.ListenAddress)
	listener, err := net.Listen("tcp", o.ListenAddress)
	if err != nil {
		return err
	}
	listener = netutil.LimitListener(listener, o.MaxConnections)
	listener = conntrack.NewListener(listener,
		conntrack.TrackWithName("http"),
		conntrack.TrackWithTracing())
	// Run the API server in the same way as the Prometheus web package: https://github.com/prometheus/prometheus/blob/6150e1ca0ede508e56414363cc9062ef522db518/web/web.go#L582-L630
	mux := http.NewServeMux()
	router := route.New().WithInstrumentation(setPathWithPrefix(""))
	mux.Handle("/", router)
	// This is the path the web package uses, but the router above with no prefix can also be Registered by apiV1 instead.
	apiPath := "/api"
	if o.RoutePrefix != "/" {
		apiPath = o.RoutePrefix + apiPath
		level.Info(logger).Log("msg", "Router prefix", "prefix", o.RoutePrefix)
	}
	av1 := route.New().
		WithInstrumentation(setPathWithPrefix(apiPath + "/v1"))
	apiV1.Register(av1)
	mux.Handle(apiPath+"/v1/", http.StripPrefix(apiPath+"/v1", av1))
	errlog := stdlog.New(log.NewStdlibAdapter(level.Error(logger)), "", 0)
	spanNameFormatter := otelhttp.WithSpanNameFormatter(func(_ string, r *http.Request) string {
		return fmt.Sprintf("%s %s", r.Method, r.URL.Path)
	})
	httpSrv := &http.Server{
		Handler:     otelhttp.NewHandler(mux, "", spanNameFormatter),
		ErrorLog:    errlog,
		ReadTimeout: o.ReadTimeout,
	}
	webconfig := ""
	// An error channel will be needed for graceful shutdown in the Shutdown() method for the receiver
	go func() {
		toolkit_web.Serve(listener, httpSrv, &toolkit_web.FlagConfig{WebConfigFile: &webconfig}, logger)
	}()
	return nil
}
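The POC's last comment notes that an error channel will be needed for graceful shutdown. One possible shape for that is sketched below, assuming a plain net/http server rather than the toolkit_web.Serve call used in the POC. The apiServer type and the startAPIServer, shutdown, and demo names are hypothetical and not part of the receiver.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net"
	"net/http"
	"time"
)

// apiServer (hypothetical) pairs an http.Server with a channel that
// carries the result of the serving goroutine.
type apiServer struct {
	srv   *http.Server
	errCh chan error
}

// startAPIServer listens on addr and serves h in the background.
func startAPIServer(addr string, h http.Handler) (*apiServer, net.Addr, error) {
	ln, err := net.Listen("tcp", addr)
	if err != nil {
		return nil, nil, err
	}
	s := &apiServer{
		srv:   &http.Server{Handler: h, ReadTimeout: time.Minute},
		errCh: make(chan error, 1), // buffered so the goroutine never blocks
	}
	go func() {
		// Serve always returns a non-nil error; http.ErrServerClosed
		// means Shutdown was called, which is not a failure.
		err := s.srv.Serve(ln)
		if errors.Is(err, http.ErrServerClosed) {
			err = nil
		}
		s.errCh <- err
	}()
	return s, ln.Addr(), nil
}

// shutdown stops accepting connections, waits for in-flight requests,
// and then reports any error from the serving goroutine.
func (s *apiServer) shutdown(ctx context.Context) error {
	if err := s.srv.Shutdown(ctx); err != nil {
		return err
	}
	select {
	case err := <-s.errCh:
		return err
	case <-ctx.Done():
		return ctx.Err()
	}
}

// demo starts a server on an ephemeral port, makes one request, and
// shuts the server down, returning the response status code.
func demo() (int, error) {
	h := http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusNoContent)
	})
	s, addr, err := startAPIServer("127.0.0.1:0", h)
	if err != nil {
		return 0, err
	}
	resp, err := http.Get("http://" + addr.String() + "/")
	if err != nil {
		return 0, err
	}
	resp.Body.Close()
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	return resp.StatusCode, s.shutdown(ctx)
}

func main() {
	code, err := demo()
	fmt.Println(code, err)
}
```

In the receiver, the shutdown call would presumably live in the receiver's Shutdown() method, with the channel stored on the receiver struct.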
Hi @Aneurysm9, any update on this PR? I have confirmed it's possible to host the Prometheus UI separately on a different port in Go and re-route its API calls to the Prometheus receiver's API port, so this PR will still work well with the out-of-the-box Prometheus React app.
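A rough sketch of that re-routing, using the standard library's reverse proxy. It assumes the UI issues its API calls under /api/ and that everything else is static UI content; the handler names and the fake backends are hypothetical stand-ins, not the actual setup that was tested.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/http/httputil"
	"net/url"
)

// newUIHandler (hypothetical) serves UI content itself and forwards
// every request under /api/ to the receiver's API address.
func newUIHandler(apiBase *url.URL, ui http.Handler) http.Handler {
	mux := http.NewServeMux()
	mux.Handle("/api/", httputil.NewSingleHostReverseProxy(apiBase))
	mux.Handle("/", ui)
	return mux
}

// fetch returns the response body for a GET request.
func fetch(rawURL string) (string, error) {
	resp, err := http.Get(rawURL)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	b, err := io.ReadAll(resp.Body)
	return string(b), err
}

// demo wires a fake receiver API and a fake UI together and returns
// what a browser would see on the UI port for an API path and for "/".
func demo() (apiBody, uiBody string, err error) {
	// Stand-in for the receiver's API port.
	apiSrv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintf(w, "api saw %s", r.URL.Path)
	}))
	defer apiSrv.Close()

	apiBase, err := url.Parse(apiSrv.URL)
	if err != nil {
		return "", "", err
	}

	// Stand-in for the static UI assets.
	ui := http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		fmt.Fprint(w, "ui placeholder")
	})
	uiSrv := httptest.NewServer(newUIHandler(apiBase, ui))
	defer uiSrv.Close()

	if apiBody, err = fetch(uiSrv.URL + "/api/v1/targets"); err != nil {
		return "", "", err
	}
	if uiBody, err = fetch(uiSrv.URL + "/"); err != nil {
		return "", "", err
	}
	return apiBody, uiBody, nil
}

func main() {
	apiBody, uiBody, err := demo()
	fmt.Println(apiBody, "|", uiBody, "|", err)
}
```

Keeping the proxy on the UI side means the receiver only needs to expose the API routes, which matches the goal of leaving UI code out of the receiver.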
I would also prefer not to copy as much of the prometheus codebase if we can avoid it.
This PR was marked stale due to lack of activity. It will be closed in 14 days.
Hi @Aneurysm9, friendly ping on this PR. I am happy to help with any changes needed for it to go in.
This PR was marked stale due to lack of activity. It will be closed in 14 days.
@gracewehner I think we're going to run into issues with Prometheus having duplicated some Collector code. I get the following error, even after removing any explicit reference to the Prometheus storage package:
=== FAIL: internal/api (0.00s)
panic: failed to register "pkg.translator.prometheus.PermissiveLabelSanitization": gate is already registered
goroutine 1 [running]:
go.opentelemetry.io/collector/featuregate.(*Registry).MustRegister(...)
/home/ec2-user/go/pkg/mod/go.opentelemetry.io/collector/featuregate@<version>/registry.go:114
github.com/prometheus/prometheus/storage/remote/otlptranslator/prometheus.init()
/home/ec2-user/go/pkg/mod/github.com/prometheus/prometheus@<version>/storage/remote/otlptranslator/prometheus/normalize_label.go:15 +0x390
FAIL github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver/internal/api 0.060s
I suspect we're caught in a loop: Prometheus duplicates code from the Collector because our importing their code makes it hard for them to update it; that duplication prevents us from importing further Prometheus code that references the duplicated code; and so we end up duplicating code from them. Since there is just a single module on the Prometheus side, we don't have the option of replacing their implementation with our own.
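The failure mode can be reproduced in miniature. The sketch below uses a toy registry rather than the real featuregate package; the registry type and the registerTwice helper are hypothetical. It shows how two init-time registrations of the same gate ID against one shared registry cause a panic as soon as both packages are linked into the same binary.

```go
package main

import (
	"fmt"
	"sync"
)

// registry is a toy stand-in for a process-wide feature gate registry.
type registry struct {
	mu    sync.Mutex
	gates map[string]bool
}

func newRegistry() *registry {
	return &registry{gates: make(map[string]bool)}
}

// Register records a gate ID and rejects duplicates.
func (r *registry) Register(id string) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, ok := r.gates[id]; ok {
		return fmt.Errorf("failed to register %q: gate is already registered", id)
	}
	r.gates[id] = false
	return nil
}

// MustRegister panics on failure, as is typical for init-time registration.
func (r *registry) MustRegister(id string) {
	if err := r.Register(id); err != nil {
		panic(err)
	}
}

// registerTwice simulates two packages registering the same gate ID and
// returns the recovered panic message (empty if nothing panicked).
func registerTwice(id string) (msg string) {
	reg := newRegistry()
	defer func() {
		if p := recover(); p != nil {
			msg = fmt.Sprint(p)
		}
	}()
	reg.MustRegister(id) // first copy, e.g. the original package
	reg.MustRegister(id) // second copy, e.g. the duplicated package
	return ""
}

func main() {
	fmt.Println(registerTwice("pkg.translator.prometheus.PermissiveLabelSanitization"))
}
```

Since the registration happens at package initialization, the panic fires before any test or receiver code runs, which is consistent with the 0.00s failure in the log above.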
Based on discussion at the WG last week I have created https://github.com/prometheus/prometheus/pull/13932 to remove the conflicting feature gate registration from the copied translation packages in prometheus/prometheus.
Thanks @Aneurysm9 for investigating; I was also seeing that issue. I have also been working on a full PR for the alternative API approach I mentioned above, implemented as an extension, and just opened it here: https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/32646. We can discuss the approaches in the meeting tomorrow.
This PR was marked stale due to lack of activity. It will be closed in 14 days.
Closed as inactive. Feel free to reopen if this PR is still being worked on.