apm-agent-java
apm-agent-java copied to clipboard
instrument JDBC fetches
Is your feature request related to a problem? Please describe. We are currently running SQLs from germany against a database in another country. We had performance problems and didn't know where they came from so we looked at the APM UI in Kibana. Unfortunately, this was not helpful at all as the SQL finished in less than a second but the whole operation took 4 minutes. After digging into the code and narrowing the problem down it seems the 4 minutes are taken by the fetches to the database which is currently not shown by APM. After changing the fetchSize we were down to 7 seconds. Is it possible to show the JDBC fetches in the APM module?
Describe the solution you'd like I would like that the fetches would be represented in the SQL Span but I guess this opens problems for others who want to see only the SQL execution. Maybe we can get a new Span for fetches where all fetches for an SQL are aggregated?
Currently, it's not possible but it's a great idea to add it. The problem is that AFAIK, there's no defined API we can instrument for "get next batch of items". ResultSet#next
can either be answered from the already requested batch or it may result in a network call. One option would be to only create a span if the call to ResultSet#next
took more than 5ms, for example.
After some thought, I think that if implemented, this should be optional and disabled by default.
ResultSet is an abstraction for a cursor. From tracing perspective, it's extremely difficult to handle it properly. Assuming that cursor moves only forward and is used only for reading is oversimplification.
Assumption that performance could be analyzed for reads using ResultSet abstraction for majority of use-cases is wrong. Considering that fetch size controls both amount of transferred data and number of cursor operations, decreasing fetch size makes single invocation of ResultSet#next operation appear faster. The trickery is that total processing time is likely to increase when fetch size is decreased (due to latency, which can be sub-milliseconds). Considering this, it would be extremely hard diagnose query performance from just ResultSet#next, especially if you filter out some invocations.
What if you would trace the time it took from aquiring a ResultSet to closing it? This would be duration that cursor was open. If you added cursor type, number of cursor operations to span as meta data, one could probably be able to get insight of traced query. Also, you should not forget to add isolation level 🙂
@skorhone This is what we already did as a workaround and this is what pointed us to the fetch at last. We added the CaptureSpan annotation to the DAO function to get the time the complete db communcation required and added labels to see additional information. I understand that this is not a common usecase but I still believe that in the time of global companies there are a lot of users who would benefit from this feature as data must be transferred between countries somehow. Having said that I would be okay with closing this issue if you think that this is not worth investing time as we have a working workaround which we can use.
What if you would trace the time it took from aquiring a ResultSet to closing it? Sounds like a good idea in general but there are some complications: There could be spans in between and the result set may be closed in a different thread.
Another option would be to create a metric for the duration of all ResultSet#next
calls. The disadvantage is that you can't attribute it to a particular transaction or span.
Yet another idea would be to collect the total duration of the ResultSet#next
calls at the transaction level which makes it possible to attribute that time into the db
part of the breakdown metrics. Tracking that on the actual DB span is a bit tricky because the span might already be sent to the APM server at the time ResultSet#next
gets called.
There is actually no guarantee that cursor is closed on commit either. I guess the correct way would be to create a span for each cursor operation, but I can't see that it would work in real world.
What if cursor was given its' own transaction?
With the release of span compression I would like to get the discussion here started again: When span compression is enabled all similar spans that are adjacient to each other are compressed into a single span. So
- Select
- Fetch
- Fetch
- ...
would become:
- Select
- Fetch (100x)
- ...
So I guess it would be possible to instrument fetch
without cluttering the UI and still having the total fetch time per SQL.
What do you think? Would it be feasible to only enable this instrumentation when span compression is enabled?