dd-trace-js icon indicating copy to clipboard operation
dd-trace-js copied to clipboard

feat(plugin-pg): instrument Client.connect and Pool.connect

Open mastermatt opened this issue 3 years ago • 15 comments

What does this PR do?

Extends the pg instrumentation to include Client.connect, native.Client.connect, and Pool.connect.

Motivation

Fixes https://github.com/DataDog/dd-trace-js/issues/1923 I had similar pain with diagnosing a db bottleneck with full db connection pools. Having these spans would have made identifying the issue straight forward.

Plugin Checklist

  • [x] Unit tests.
  • [x] Plugin is exported.

Additional Notes

Wrapping the connect method of both the Client and Pool classes allows for diagnosing two separate delays for a query: the pool being full vs establishing a new network connection.

mastermatt avatar Apr 02 '22 05:04 mastermatt

The failing benchmark action is caused by Bad credentials from CircleCI to GH, so I'm going to ignore it and call this ready for review.

mastermatt avatar Apr 04 '22 13:04 mastermatt

fyi @alexstrat

mastermatt avatar Apr 04 '22 13:04 mastermatt

Thanks for the PR! We don't usually trace things happening outside of a transaction. For example, if the connection happens because a query was made, then this is something we'd want to trace, but not for something like a connection that always happens when the process starts outside of any request. This is to avoid noise that happens at the process level and is not impacting any user of the service.

Could the PR be limited to this type of connection? Would that still be useful for your use case?

rochdev avatar Apr 25 '22 16:04 rochdev

@rochdev pg uses lazy connections, so these functions wouldn't be called unless they were associated with a query (or user-code called connect directly, which isn't the norm).

Client.connect is called when the lib needs to establish an actual network connection with the psql server in order to transmit a sql statement. Pool.connect is aways called from Pool.query and handles the logic of acquiring a live Client. Ether by grabbing an idle client, creating a new one, or waiting for the pool to not be full.

mastermatt avatar Apr 25 '22 20:04 mastermatt

Fwiw @rochdev we'd love to see this land, b/c we're currently looking at GraphQL traces that have "gaps" of no activity, and one of our current theories is that the request is waiting for a connection from the connection pool.

Disclaimer I'm not 100% sure what the output of this PR looks like, but if we could see spans for "waiting for an available connection from the connection pool" in our traces, that would be amazing.

Thanks @mastermatt !

stephenh avatar Feb 06 '23 21:02 stephenh

@stephenh the way this is written, instead of a single span for a db query like there is today, you'd have two spans on top of each other. If the two spans start at the same time then there was no time waiting to get a new connection. But if there is a delay between the first span and the second span starting then that is time where the pool is waiting for a client to free up. You have to think of these spans as nested function calls. With the way pg is written it would be hard/impossible to have a span just for the time the pool is waiting for a free client.

mastermatt avatar Feb 06 '23 22:02 mastermatt

@mastermatt cool, that makes sense! We'll probably try this out in production, hopefully in the next ~few weeks (it's bothering us but not super high priority), and will report back how it goes. Thanks!

stephenh avatar Feb 06 '23 22:02 stephenh

@rochdev Given the apm:$module:$action:$event pattern our channels are following, should the pool related ones be like apm:pg:pool-connect:... rather than apm:pg-pool:connection:...? Similar thought on the span names.

Qard avatar Feb 10 '23 20:02 Qard

Note that pg-pool is its own module. https://www.npmjs.com/package/pg-pool

mastermatt avatar Feb 10 '23 21:02 mastermatt

Ah, missed that. Nevermind then. :)

Qard avatar Feb 10 '23 23:02 Qard

@rochdev I'm not sure if you're the right person to ping anymore, but we've come across another scenario where having these metrics would've be invaluable.

If I go through and update this PR, is any desire from the maintainers to support this feature?

mastermatt avatar Oct 12 '23 21:10 mastermatt

@rochdev I'm not sure if you're the right person to ping anymore, but we've come across another scenario where having these metrics would've be invaluable.

If I go through and update this PR, is any desire from the maintainers to support this feature?

Hitting a major issue related to this right now. Would love to get this feature!

chasdevs avatar Dec 19 '23 18:12 chasdevs

Yes, feel free to update the PR and we can have another look at it. 😃

Qard avatar Jan 02 '24 12:01 Qard