[receiver/prometheus] Add Target Info API

Open Aneurysm9 opened this issue 2 years ago • 20 comments

Description: Adds the ability to provide confighttp.HTTPServerSettings to the Prometheus receiver, which uses them to expose a subset of the Prometheus API. At present this includes only the /targets resource, which returns information about active and discovered scrape targets, including debugging information typically unavailable without verbose debug logging.

Aneurysm9 avatar Jun 09 '23 00:06 Aneurysm9

This PR was marked stale due to lack of activity. It will be closed in 14 days.

github-actions[bot] avatar Jul 06 '23 05:07 github-actions[bot]

This PR was marked stale due to lack of activity. It will be closed in 14 days.

github-actions[bot] avatar Jul 22 '23 05:07 github-actions[bot]

This PR was marked stale due to lack of activity. It will be closed in 14 days.

github-actions[bot] avatar Aug 29 '23 05:08 github-actions[bot]

Closed as inactive. Feel free to reopen if this PR is still being worked on.

github-actions[bot] avatar Sep 13 '23 05:09 github-actions[bot]

This PR was marked stale due to lack of activity. It will be closed in 14 days.

github-actions[bot] avatar Sep 29 '23 05:09 github-actions[bot]

Closed as inactive. Feel free to reopen if this PR is still being worked on.

github-actions[bot] avatar Oct 13 '23 05:10 github-actions[bot]

@dashpole I would appreciate another look at this following our discussion on https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/29622.

Aneurysm9 avatar Dec 20 '23 00:12 Aneurysm9

This PR was marked stale due to lack of activity. It will be closed in 14 days.

github-actions[bot] avatar Jan 03 '24 05:01 github-actions[bot]

Thanks @Aneurysm9 for re-opening this!

My suggestion would be to re-use the Prometheus API struct with agent mode set to true. That way we avoid duplicating as much of the API internals from the Prometheus repo, and we keep this code from drifting away from the main Prometheus branch.

I agree we don't want to add any API paths that don't apply to the Prometheus Receiver. With the API from the Prometheus repo, all paths registered with wrap() will return data, whereas all paths registered with wrapAgent() will return "unavailable with Prometheus agent": https://github.com/prometheus/prometheus/blob/main/web/api/v1/api.go#L362-L407. This way, only the /targets, /scrape_pools, and /status/* paths actually return data and do any calculations/lookups.

I tried this below as a rough POC and verified it works. This sets up the API in the same way as the Prometheus web package, which sets up the API and hosts it in addition to hosting the UI. We can do the same, but without adding in any of the UI-related code for serving the react app paths:

func (r *pReceiver) initPrometheusComponents(ctx context.Context, host component.Host, logger log.Logger) error {

	// All existing code
	...
	...

	// Create Options purely for readability when creating the API object.
	// These settings are closer to what we want to expose as configuration for the Prometheus Receiver.
	o := &web.Options{
		ScrapeManager: r.scrapeManager,
		Context:       ctx,
		ListenAddress: ":9090",
		ExternalURL: &url.URL{
			Scheme: "http",
			Host:   "localhost:9090",
			Path:   "",
		},
		RoutePrefix: "/",
		ReadTimeout: time.Minute * readTimeoutMinutes,
		PageTitle:   "Prometheus Receiver",
		Version: &web.PrometheusVersion{
			Version:   version.Version,
			Revision:  version.Revision,
			Branch:    version.Branch,
			BuildUser: version.BuildUser,
			BuildDate: version.BuildDate,
			GoVersion: version.GoVersion,
		},
		Flags:          make(map[string]string),
		MaxConnections: maxConnections,
		IsAgent:        true,
		Gatherer:       prometheus.DefaultGatherer,
	}

	// Creates the API object in the same way as the Prometheus web package: https://github.com/prometheus/prometheus/blob/6150e1ca0ede508e56414363cc9062ef522db518/web/web.go#L314-L354
	// Anything not defined by the options above will be nil, such as o.QueryEngine, o.Storage, etc. IsAgent=true, so these being nil is expected by Prometheus.
	factorySPr := func(_ context.Context) api_v1.ScrapePoolsRetriever { return r.scrapeManager }
	factoryTr := func(_ context.Context) api_v1.TargetRetriever { return r.scrapeManager }
	factoryAr := func(_ context.Context) api_v1.AlertmanagerRetriever { return nil }
	FactoryRr := func(_ context.Context) api_v1.RulesRetriever { return nil }
	var app storage.Appendable
	// POC only: this discards the logger passed in, so the level.Info calls below emit nothing.
	logger = log.NewNopLogger()

	apiV1 := api_v1.NewAPI(o.QueryEngine, o.Storage, app, o.ExemplarStorage, factorySPr, factoryTr, factoryAr,
		func() config.Config {
			return *r.cfg.PrometheusConfig
		},
		o.Flags,
		api_v1.GlobalURLOptions{
			ListenAddress: o.ListenAddress,
			Host:          o.ExternalURL.Host,
			Scheme:        o.ExternalURL.Scheme,
		},
		func(f http.HandlerFunc) http.HandlerFunc {
			return func(w http.ResponseWriter, r *http.Request) {
				f(w, r)
			}
		},
		o.LocalStorage,
		o.TSDBDir,
		o.EnableAdminAPI,
		logger,
		FactoryRr,
		o.RemoteReadSampleLimit,
		o.RemoteReadConcurrencyLimit,
		o.RemoteReadBytesInFrame,
		o.IsAgent,
		o.CORSOrigin,
		func() (api_v1.RuntimeInfo, error) {
			status := api_v1.RuntimeInfo{
				GoroutineCount: runtime.NumGoroutine(),
				GOMAXPROCS:     runtime.GOMAXPROCS(0),
				GOMEMLIMIT:     debug.SetMemoryLimit(-1),
				GOGC:           os.Getenv("GOGC"),
				GODEBUG:        os.Getenv("GODEBUG"),
			}
		
			return status, nil
		},
		nil,
		o.Gatherer,
		o.Registerer,
		nil,
		o.EnableRemoteWriteReceiver,
		o.EnableOTLPWriteReceiver,
	)

	// Create listener and monitor with conntrack in the same way as the Prometheus web package: https://github.com/prometheus/prometheus/blob/6150e1ca0ede508e56414363cc9062ef522db518/web/web.go#L564-L579
	level.Info(logger).Log("msg", "Start listening for connections", "address", o.ListenAddress)
	listener, err := net.Listen("tcp", o.ListenAddress)
	if err != nil {
		return err
	}
	listener = netutil.LimitListener(listener, o.MaxConnections)
	listener = conntrack.NewListener(listener,
		conntrack.TrackWithName("http"),
		conntrack.TrackWithTracing())

	// Run the API server in the same way as the Prometheus web package: https://github.com/prometheus/prometheus/blob/6150e1ca0ede508e56414363cc9062ef522db518/web/web.go#L582-L630
	mux := http.NewServeMux()
	router := route.New().WithInstrumentation(setPathWithPrefix(""))
	mux.Handle("/", router)

	// This is the path the web package uses, but the router above with no prefix could be registered with apiV1 instead.
	apiPath := "/api"
	if o.RoutePrefix != "/" {
		apiPath = o.RoutePrefix + apiPath
		level.Info(logger).Log("msg", "Router prefix", "prefix", o.RoutePrefix)
	}
	av1 := route.New().
		WithInstrumentation(setPathWithPrefix(apiPath + "/v1"))
	apiV1.Register(av1)
	mux.Handle(apiPath+"/v1/", http.StripPrefix(apiPath+"/v1", av1))

	errlog := stdlog.New(log.NewStdlibAdapter(level.Error(logger)), "", 0)
	spanNameFormatter := otelhttp.WithSpanNameFormatter(func(_ string, r *http.Request) string {
		return fmt.Sprintf("%s %s", r.Method, r.URL.Path)
	})
	httpSrv := &http.Server{
		Handler:     otelhttp.NewHandler(mux, "", spanNameFormatter),
		ErrorLog:    errlog,
		ReadTimeout: o.ReadTimeout,
	}
	webconfig := ""

	// An error channel will be needed for graceful shutdown in the receiver's Shutdown() method
	go func() {
		toolkit_web.Serve(listener, httpSrv, &toolkit_web.FlagConfig{WebConfigFile: &webconfig}, logger)
	}()
  
	return nil
}

gracewehner avatar Jan 11 '24 22:01 gracewehner

Hi @Aneurysm9, any update on this PR? I have confirmed it's possible to host the Prom UI separately on a different port from Go and re-route the API calls to the Prometheus receiver's API port, so this PR will still work well with the out-of-the-box Prometheus React app.

gracewehner avatar Jan 22 '24 18:01 gracewehner

I would also prefer not to copy as much of the prometheus codebase if we can avoid it.

dashpole avatar Jan 23 '24 17:01 dashpole

This PR was marked stale due to lack of activity. It will be closed in 14 days.

github-actions[bot] avatar Feb 07 '24 05:02 github-actions[bot]

Hi @Aneurysm9 friendly ping for this PR. I am happy to help with any changes needed for this PR to go in

gracewehner avatar Feb 16 '24 18:02 gracewehner

This PR was marked stale due to lack of activity. It will be closed in 14 days.

github-actions[bot] avatar Mar 02 '24 05:03 github-actions[bot]

This PR was marked stale due to lack of activity. It will be closed in 14 days.

github-actions[bot] avatar Mar 19 '24 05:03 github-actions[bot]

This PR was marked stale due to lack of activity. It will be closed in 14 days.

github-actions[bot] avatar Apr 03 '24 05:04 github-actions[bot]

@gracewehner I think we're going to run into issues with Prometheus having duplicated some Collector code. I get the following error, even after removing any explicit reference to the Prometheus storage package:

=== FAIL: internal/api  (0.00s)
panic: failed to register "pkg.translator.prometheus.PermissiveLabelSanitization": gate is already registered

goroutine 1 [running]:
go.opentelemetry.io/collector/featuregate.(*Registry).MustRegister(...)
        /home/ec2-user/go/pkg/mod/go.opentelemetry.io/collector/[email protected]/registry.go:114
github.com/prometheus/prometheus/storage/remote/otlptranslator/prometheus.init()
        /home/ec2-user/go/pkg/mod/github.com/prometheus/[email protected]/storage/remote/otlptranslator/prometheus/normalize_label.go:15 +0x390
FAIL    github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver/internal/api      0.060s

I suspect we're caught in a loop: Prometheus duplicates code from the Collector because our importing their code makes it hard for them to update it. That duplication then prevents us from importing any more of their code that references the duplicated packages, which pushes us to duplicate code from them. Since there's just a single module on the Prometheus side, we don't have the option of replacing their implementation with our own.
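
The failure mode in the trace can be reproduced in miniature. The sketch below is a simplified stand-in, not the real go.opentelemetry.io/collector/featuregate package: `registry`, `register`, `mustRegister`, and `initAll` are hypothetical, and the two closures play the role of the two package init() functions that both register the same gate ID against one process-wide registry.

```go
package main

import "fmt"

// gateID is the gate named in the panic message above.
const gateID = "pkg.translator.prometheus.PermissiveLabelSanitization"

// registry is a simplified stand-in for a process-wide feature gate registry.
type registry struct {
	gates map[string]struct{}
}

func newRegistry() *registry {
	return &registry{gates: map[string]struct{}{}}
}

// register records id and fails if it has been registered before.
func (r *registry) register(id string) error {
	if _, ok := r.gates[id]; ok {
		return fmt.Errorf("failed to register %q: gate is already registered", id)
	}
	r.gates[id] = struct{}{}
	return nil
}

// mustRegister panics on failure, as a MustRegister-style helper would.
func (r *registry) mustRegister(id string) {
	if err := r.register(id); err != nil {
		panic(err)
	}
}

// initAll runs init-like functions in order and converts a panic into an error.
func initAll(inits ...func()) (err error) {
	defer func() {
		if p := recover(); p != nil {
			err = fmt.Errorf("panic: %v", p)
		}
	}()
	for _, f := range inits {
		f()
	}
	return nil
}

func main() {
	global := newRegistry()
	collectorInit := func() { global.mustRegister(gateID) } // original package
	copiedInit := func() { global.mustRegister(gateID) }    // duplicated copy

	fmt.Println(initAll(collectorInit))
	fmt.Println(initAll(copiedInit))
}
```

Whichever copy initializes second hits the panic, so simply linking both packages into one binary is enough to trigger it, even with no explicit reference to the duplicated code.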

Aneurysm9 avatar Apr 08 '24 17:04 Aneurysm9

Based on discussion at the WG last week I have created https://github.com/prometheus/prometheus/pull/13932 to remove the conflicting feature gate registration from the copied translation packages in prometheus/prometheus.

Aneurysm9 avatar Apr 15 '24 15:04 Aneurysm9

Thanks @Aneurysm9 for investigating, I was also seeing that issue. I had also been working on a full PR for the alternative API approach I mentioned above, implemented as an extension. I just opened it here: https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/32646. We can discuss the approaches in the meeting tomorrow.

gracewehner avatar Apr 23 '24 17:04 gracewehner

This PR was marked stale due to lack of activity. It will be closed in 14 days.

github-actions[bot] avatar May 08 '24 05:05 github-actions[bot]

Closed as inactive. Feel free to reopen if this PR is still being worked on.

github-actions[bot] avatar May 22 '24 05:05 github-actions[bot]