                        feat: produce OpenTelemetry traces with `hs-opentelemetry`
This PR introduces producing OpenTelemetry traces, which contain, among other things, the same metrics previously exposed in the `Server-Timing` header.
TODO:
- [x] document oTel configuration
- [x] make code location attributes show call sites instead of helpers
- [x] ~~build with Nix as well (for now Stack only)~~ DONE for free
- [x] ~~make an example of exporting log messages~~ `hs-opentelemetry` doesn't support logging, per https://github.com/iand675/hs-opentelemetry/issues/100
- [x] ~~make `getTracer` available globally: we're interested in using as many different spans as makes sense, so `getTracer` should be available everywhere, as described in `hs-opentelemetry-sdk`'s README~~ seems impossible without major refactoring
- [x] make use of the `hs-opentelemetry-wai` middleware
- [x] ~~look into failing Windows builds~~ `hs-opentelemetry-sdk` depends on `unix`; tracking in https://github.com/iand675/hs-opentelemetry/issues/109
- [x] fix "memory test" failures (https://github.com/PostgREST/postgrest/actions/runs/7888534966/job/21526302784?pr=3140)
- [x] fix dependency issues on 9.8 (https://github.com/PostgREST/postgrest/actions/runs/7888534966/job/21526300453?pr=3140)
Running:
I sort of gave up on deploying and configuring all the moving bits locally, so you'd need to create a honeycomb.io account for this one (or ask me for an invite). After that, it's quite straightforward:
- Build the PostgREST executable with `stack build`, and get its path with `stack exec -- which postgrest`
- Get a PostgreSQL server running (e.g. run `nix-shell`, then `postgrest-with-postgresql-15 --fixture ./test/load/fixture.sql -- cat`). Note the server URL; you'll need it when running the PostgREST server
- Get a JWT token signed with the default secret by running `postgrest-jwt --exp 36000 postgrest_test_anonymous`
- Run PostgREST server with
```shell
OTEL_EXPORTER_OTLP_ENDPOINT='https://api.honeycomb.io/' \
OTEL_EXPORTER_OTLP_HEADERS="x-honeycomb-team=<honeycomb_api_key>" \
OTEL_SERVICE_NAME='PostgREST' OTEL_LOG_LEVEL='debug' OTEL_TRACES_SAMPLER='always_on' \
PGRST_DB_URI='<postgresql_server_url>' \
postgrest-run
```
- Request some data using the JWT token from above and check the Honeycomb dashboard for the traces:
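For example, assuming the server listens on PostgREST's default port 3000 (the token value and the `projects` endpoint below are placeholders; substitute whatever your fixture exposes):

```shell
# Hypothetical request: use the token printed by postgrest-jwt
# and a table that actually exists in your fixture.
curl -H "Authorization: Bearer <jwt_token>" http://localhost:3000/projects
```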
Tests
The hspec tests are also instrumented; for them to produce traces you only need to set the `OTEL_*` vars:
```shell
OTEL_EXPORTER_OTLP_ENDPOINT='https://api.honeycomb.io/' \
OTEL_EXPORTER_OTLP_HEADERS="x-honeycomb-team=<honeycomb_api_key>" \
OTEL_SERVICE_NAME='PostgREST' OTEL_LOG_LEVEL='debug' OTEL_TRACES_SAMPLER='always_on' \
postgrest-test-spec
```
Awesome work! :fire: :fire:
> I sort of gave up deploying and configuring all the moving bits locally, so you'd need to create the honeycomb.io account for this one
Found this Nix flake that contains an OTel GUI: https://flakestry.dev/flake/github/FriendsOfOpenTelemetry/opentelemetry-nix/1.0.1
I'll try to integrate that once the PR is ready for review.
The recent problem I'm seemingly stuck on is that hs-opentelemetry uses UnliftIO, which doesn't compose well with our (implicit, correct?) monad stack. The deeper in the call stack the instrumented code sits (the code I'm trying to wrap with `inSpan`), the more ridiculously complex the changes needed to instrument it become, i.e. https://github.com/PostgREST/postgrest/pull/3140/files#diff-5de3ff2b2d013b33dccece6ead9aeb61feffeb0fbd6e38779750511394cf9701R156-R157, up to the point where I have no idea how to proceed further (e.g. wrapping App.handleRequests cases with their own spans, which would be semantically correct).
There's a more straightforward, MonadIO-based opentelemetry library, with less activity and a quite different approach to exporting the telemetry data (GHC eventlog → file/pipe, handled by the GHC runtime). It looks like a less invasive approach refactoring-wise, but requires jumping through more hoops to actually deliver traces to Honeycomb/Lightstep/whatnot (pull the eventlog → convert it to zipkin/jaeger/b3 → upload it somewhere for analysis).
Delivery-wise, it also seems to boil down to a conceptual choice between online and offline traces, i.e. a push versus a pull model.
@steve-chavez @wolfgangwalther @laurenceisla what do you think guys?
@develop7 Would vault help? It was introduced in https://github.com/PostgREST/postgrest/pull/1988; I recall it helped with IORef handling.
It's still used on https://github.com/PostgREST/postgrest/blob/d2fb67f596a8ded54911662aab445b4d7aaae8bd/src/PostgREST/Auth.hs#L160-L165
I'm still not that familiar with OTel but the basic idea I had was to store these traces on AppState and export them async.
@develop7 Recently merged https://github.com/PostgREST/postgrest/pull/3213, which logs schema cache stats to stderr. Perhaps that can be used for introductory OTel integration instead? It might be easier since the scache stats are already in IO space.
> Would vault help?
`hs-opentelemetry` is already using it.
> basic idea I had was to store these traces on AppState and export them async
Not only that; you want traces in tests too, for one.
The good news is `hs-opentelemetry-utils-exceptions` seems to be just what we need; let me try it.
> Perhaps that can be used for introductory OTel integration instead?
Good call @steve-chavez, thank you for the suggestion. Will try too.
it works!
Since we now have an `observer` function and an `Observation` module:
https://github.com/PostgREST/postgrest/blob/229bc778864911e2fbcb37079323bf9cf8a3c3ca/src/PostgREST/App.hs#L170-L172
https://github.com/PostgREST/postgrest/blob/229bc778864911e2fbcb37079323bf9cf8a3c3ca/src/PostgREST/Observation.hs#L15-L18
Perhaps we can add some observations for the timings?
Also the Logger is now used like:
https://github.com/PostgREST/postgrest/blob/7c6c056e92077a666509195406b474c8a613605d/src/PostgREST/Logger.hs#L53-L54
https://github.com/PostgREST/postgrest/blob/7c6c056e92077a666509195406b474c8a613605d/src/PostgREST/CLI.hs#L50
For OTel, maybe the following would make sense:
```haskell
otelState <- OTel.init
App.run appState (\obs -> Logger.logObservation loggerState obs >> OTel.tracer otelState obs)
```
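The idea of fanning one observation out to several sinks can be sketched with plain functions (a minimal sketch; all names here are hypothetical, not PostgREST's actual API):

```haskell
import Data.IORef (modifyIORef', newIORef, readIORef)

-- A handler consumes one observation; String stands in for the real Observation type.
type ObservationHandler = String -> IO ()

-- Run both handlers on every observation, logger first.
combineHandlers :: ObservationHandler -> ObservationHandler -> ObservationHandler
combineHandlers logger tracer obs = logger obs >> tracer obs
```

`App.run` would then receive the combined handler as its single observer, so the Logger and the OTel exporter stay independent of each other.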
> Perhaps we can add some observations for the timings?
Agreed, server timings definitely belong there.
Okay, the PR has been cooking for long enough; let's pull the plug and start small. Let's have it reviewed while I fix the remaining CI failures.
hs-opentelemetry is, according to the repo, in an alpha state. According to the TODO list above, the issue tracker, and the repo description, it does not support:
- GHC 9.8.x
- Windows
- Metrics or Logging
I don't think we should depend on this in its current state. And we should certainly not depend on an even-less-maintained fork of the same.
So to go forward here, there needs to be some effort put into the upstream package first, to make it usable for us.
A status update:
- GHC 9.8: `hs-opentelemetry-sdk` doesn't build against 9.8 because of the `hs-opentelemetry-exporter-otlp` → `proto-lens` chain. Given that the upstream of the latter has been a bit unresponsive to suggestions to bump the upper bounds, I've managed to make it build for 9.8 in https://github.com/develop7/proto-lens/commit/985290f0f86cfcf775994b9a9d40156c08669083, but haven't figured out how to pick that up in the project, since it depends on Google's protobuf compiler being installed and the protobuf source being checked out. Another approach is to not use `hs-o-sdk` and `hs-o-e-otlp` altogether, which I probably should've tried way before.
> GHC 9.8: `hs-opentelemetry-sdk` doesn't build against 9.8 because of the `hs-opentelemetry-exporter-otlp` → `proto-lens` chain. Given the upstream of the latter being a bit unresponsive to the suggestions to bump upper bounds, I've managed to make the latter build for 9.8 in develop7/proto-lens@985290f,
Hm. I looked at your fork. It depends on support for GHC 9.8 in ghc-source-gen. That repo has a PR, which was just updated 3 days ago. I wouldn't call that "unresponsive", yet. Once ghc-source-gen is GHC 9.8 compatible, you could open a PR to update the bounds in proto-lens itself. And since the last release, for GHC 9.6 support, was in December... I would not expect this to take too long to get responded to. It certainly doesn't look like it's unmaintained.
I guess for GHC 9.8 support it's just a matter of time.
What about the other issues mentioned above? Were you able to make progress on those?
> The recent problem I'm seemingly stuck with is hs-opentelemetry is using UnliftIO, which seems not quite composable with our (implicit, correct?) monad stack. So the deeper into the call stack the instrumented code is (the one I'm trying to wrap with `inSpan`), the more ridiculously complex it should be changed to be instrumented, i.e. https://github.com/PostgREST/postgrest/pull/3140/files#diff-5de3ff2b2d013b33dccece6ead9aeb61feffeb0fbd6e38779750511394cf9701R156-R157, up to the point I have no idea how to proceed further (e.g. wrapping App.handleRequests cases with their own spans, which is semantically correct)
>
> There's a more straightforward MonadIO-involving opentelemetry library, with less activity and quite different approach to the telemetry data export (GHC eventlog → file/pipe by the GHC runtime). It looks less invasive approach, refactoring-wise, but requires more hoops to jump to actually deliver traces to Honeycomb/Lightstep/whatnot (pull eventlog → convert it to zipkin/jaeger/b3 → upload somewhere for analysis). It also seems to boil down to the conceptual choice between online and offline traces' delivery-wise, or push and pull model.
>
> @steve-chavez @wolfgangwalther @laurenceisla what do you think guys?
In my prototype I actually played with replacing the Hasql `Session` with a monad based on https://github.com/haskell-effectful/effectful to make it extensible:
https://github.com/mkleczek/hasql-api/blob/master/src/Hasql/Api/Eff/Session.hs#L37
Using it in PostgREST required some mixins usage in Cabal:
https://github.com/PostgREST/postgrest/commit/29b946ed864cf3ceffce2f692cdd76ddd169a771#diff-eb6a76805a0bd3204e7abf68dcceb024912d0200dee7e4e9b9bce3040153f1e1R140
Some work was required in the PostgREST startup/configuration code to set up the appropriate effect handlers and middlewares, but the changes were quite well isolated.
At the end of the day I think basing your monad stack on an effect library (effectful, cleff, etc.) is the way forward, as it makes the solution highly extensible and configurable.
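As a rough illustration of the handler idea (this is not effectful's actual API — just a hand-rolled sketch with hypothetical names), the point is that business code depends only on an abstract operation, so tracing can wrap the handler without touching any call sites:

```haskell
-- A record of operations stands in for an effect; real effect libraries
-- (effectful, cleff) generalize this with extensible effect rows.
newtype SqlHandler m = SqlHandler { runSql :: String -> m [String] }

-- Business code depends only on the handler, not on how queries run.
fetchNames :: SqlHandler m -> m [String]
fetchNames h = runSql h "SELECT name FROM users"

-- Tracing wraps an existing handler; callers of fetchNames are unchanged.
traced :: SqlHandler IO -> SqlHandler IO
traced h = SqlHandler $ \q -> do
  putStrLn ("span start: " ++ q)  -- stand-in for OTel's inSpan
  r <- runSql h q
  putStrLn "span end"
  pure r
```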