dd-trace-js
dd-trace-js copied to clipboard
feat(plugin-pg): instrument Client.connect and Pool.connect
What does this PR do?
Extends the pg instrumentation to include Client.connect, native.Client.connect, and Pool.connect.
Motivation
Fixes https://github.com/DataDog/dd-trace-js/issues/1923 I had similar pain with diagnosing a db bottleneck with full db connection pools. Having these spans would have made identifying the issue straight forward.
Plugin Checklist
- [x] Unit tests.
- [x] Plugin is exported.
Additional Notes
Wrapping the connect method of both the Client and Pool classes allows for diagnosing two separate delays for a query: the pool being full vs establishing a new network connection.
The failing benchmark action is caused by Bad credentials from CircleCI to GH, so I'm going to ignore it and call this ready for review.
fyi @alexstrat
Thanks for the PR! We don't usually trace things happening outside of a transaction. For example, if the connection happens because a query was made, then this is something we'd want to trace, but not for something like a connection that always happens when the process starts outside of any request. This is to avoid noise that happens at the process level and is not impacting any user of the service.
Could the PR be limited to this type of connection? Would that still be useful for your use case?
@rochdev pg uses lazy connections, so these functions wouldn't be called unless they were associated with a query (or user-code called connect directly, which isn't the norm).
Client.connect is called when the lib needs to establish an actual network connection with the psql server in order to transmit a sql statement.
Pool.connect is aways called from Pool.query and handles the logic of acquiring a live Client. Ether by grabbing an idle client, creating a new one, or waiting for the pool to not be full.
Fwiw @rochdev we'd love to see this land, b/c we're currently looking at GraphQL traces that have "gaps" of no activity, and one of our current theories is that the request is waiting for a connection from the connection pool.
Disclaimer I'm not 100% sure what the output of this PR looks like, but if we could see spans for "waiting for an available connection from the connection pool" in our traces, that would be amazing.
Thanks @mastermatt !
@stephenh the way this is written, instead of a single span for a db query like there is today, you'd have two spans on top of each other. If the two spans start at the same time then there was no time waiting to get a new connection. But if there is a delay between the first span and the second span starting then that is time where the pool is waiting for a client to free up.
You have to think of these spans as nested function calls. With the way pg is written it would be hard/impossible to have a span just for the time the pool is waiting for a free client.
@mastermatt cool, that makes sense! We'll probably try this out in production, hopefully in the next ~few weeks (it's bothering us but not super high priority), and will report back how it goes. Thanks!
@rochdev Given the apm:$module:$action:$event pattern our channels are following, should the pool related ones be like apm:pg:pool-connect:... rather than apm:pg-pool:connection:...? Similar thought on the span names.
Note that pg-pool is its own module. https://www.npmjs.com/package/pg-pool
Ah, missed that. Nevermind then. :)
@rochdev I'm not sure if you're the right person to ping anymore, but we've come across another scenario where having these metrics would've be invaluable.
If I go through and update this PR, is any desire from the maintainers to support this feature?
@rochdev I'm not sure if you're the right person to ping anymore, but we've come across another scenario where having these metrics would've be invaluable.
If I go through and update this PR, is any desire from the maintainers to support this feature?
Hitting a major issue related to this right now. Would love to get this feature!
Yes, feel free to update the PR and we can have another look at it. 😃