sqlx icon indicating copy to clipboard operation
sqlx copied to clipboard

Observability improvements

Open ThomWright opened this issue 2 years ago • 3 comments

Context

Running an sqlx application in production, there are several outputs I would find really useful to make my application more observable.

These could be three separate issues, but I'm going to file them all together for now for convenience.

I haven't put any thought into implementation or APIs. I'd first like to find out whether this is something y'all would consider including in sqlx. If so, I'm happy to do some thinking about APIs and internals.

I also haven't looked at what similar libraries do. I've previously wrapped pg in Node.js to achieve these same results, but it might be worth taking a wider look at the ecosystem for inspiration.

RED metrics for queries and transactions

RED (Rate, Errors, Duration) metrics would tell us how many queries/transactions we're executing per second, how long they take, and how many of them error.

Being able to create histograms could be useful, but maybe unnecessarily complex. Instead, the following might suffice:

  • Rate – a count of queries/transactions executed
  • Errors – a count of errors returned from queries/transactions
  • Duration – a count of [milliseconds|microseconds] spent executing queries

It could be useful to break these down by query somehow, but I haven't put any thought into this.

Traces for queries and transactions

This has been mentioned in #942, specifically here.

For those using tracing, support for tracing both transactions and queries would be great.

An example illustrating what I have in mind:

Tracing example

EDIT: I've seen it suggested that e.g. connecting and acquiring connections from pools could be spans too.

USE metrics for connection pools

Database connection pools can be a common point of saturation under load, affecting latency and sometimes availability. This is especially true with PostgreSQL (as opposed to e.g. MySQL), where the number of connections is fairly constrained. This makes it fairly important to be able to get some observability around pool resources.

Specifically, it would be really useful to expose USE (Utilisation, Saturation, Errors) metrics for sqlx::Pool.

I'm thinking something along the lines of:

  • Utilisation – a count of [milliseconds|microseconds] connections have spent active and in use, but not necessarily executing queries
  • Saturation – a count of [milliseconds|microseconds] queries have spent waiting for an available connection
  • Errors – a count of connection errors returned (e.g. PoolTimedOut)

The utilisation metric could be sampled currently by using max_connections, Pool::size() and Pool::num_idle(). This would tell us utilisation for a given instant. But this is less powerful than being able to compute average utilisation over time.

As an example using PromQL, we could calculate average per-minute utilisation using something like:

rate(sqlx_connection_active_used_seconds_total[1m]) / sqlx_max_connections

Further reading

ThomWright avatar Jun 06 '22 17:06 ThomWright

I've just found #1860 and #1900, which is great!

For anyone interested, there's some good discussion in there about implementing a connection acquisition wait metric (for pool saturation).

ThomWright avatar Jun 12 '22 19:06 ThomWright

Are they any alternatives? I search for a way to add some query metrics (similiar to the described RED metrics). Is there a "workaround" until this is implemented in SQLx? I also use Sea-ORM but I haven't found anything their either.

DenuxPlays avatar Mar 20 '24 10:03 DenuxPlays