atlasdb
Add statistics to spans
Goals (and why):
Teams using AtlasDB can already diagnose issues from service-level statistics, but trace-level usage information has to be reconstructed indirectly. This PR is a first step towards easy-to-digest, per-trace statistics that can drive optimization.
==COMMIT_MSG== Add statistics to spans emitted from within atlasdb. The following tags will be added to spans:
- `atlasdb.emptyReads`: the number of AtlasDB tombstones read during this span.
- `atlasdb.bytesRead`: an estimate of the number of bytes read from the underlying database during this span. This is currently the number of column name and value bytes read, and does not include various overheads; it should be seen as a value correlated with the real figure.

Note that both statistics are best-effort: there is no guarantee of completeness, and they require correct usage of AtlasDB to work (e.g. closing `ClosableIterators`).
==COMMIT_MSG==
Implementation Description (bullets):
- Generally a relatively simple system using thread locals (see the sketch after these bullets).
- The bulk of the PR pushes `ClosableIterators` through the codebase and tracks them more cleanly; this helps us handle async iterators.
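For illustration, here is a minimal sketch of the thread-local approach, assuming the counters live in a static holder similar to `TraceStatistics`; the class and method names below are illustrative, not the actual API in this PR:

```java
// Illustrative sketch only: the real TraceStatistics class in this PR may look different.
public final class TraceStatisticsSketch {
    // One pair of counters per thread; the KVS layer bumps these as it reads,
    // and they are flushed onto the active span's tags when the span completes.
    private static final ThreadLocal<long[]> STATS =
            ThreadLocal.withInitial(() -> new long[2]); // [emptyReads, bytesRead]

    private TraceStatisticsSketch() {}

    public static void incrementEmptyReads(long count) {
        STATS.get()[0] += count;
    }

    public static void incrementBytesRead(long bytes) {
        STATS.get()[1] += bytes;
    }

    /** Returns and resets the counters, e.g. when a span finishes. */
    public static long[] getAndClear() {
        long[] current = STATS.get();
        long[] snapshot = current.clone();
        current[0] = 0;
        current[1] = 0;
        return snapshot;
    }
}
```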
Limitations:
- The statistics can only be kept accurate within reasonable bounds: async iterators need to be consumed on the same thread that created them, or callers need to use `PTExecutors.wrap()` around the callable/... that gets executed on a different thread (see the example below). This limitation is relatively fundamental unless we're willing to e.g. pass a context object around within the KVS code.
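To make that limitation concrete, here is a sketch of what a caller consuming an iterator on a worker thread would need to do. The exact `PTExecutors.wrap` overload and package are assumed here, and `readTask` is a hypothetical callable that drains the iterator:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

import com.palantir.common.concurrent.PTExecutors;

final class WrappedConsumptionSketch {
    private WrappedConsumptionSketch() {}

    static <T> Future<T> consumeOnWorkerThread(ExecutorService executor, Callable<T> readTask) {
        // Wrapping the callable carries the thread-local statistics context over
        // to the worker thread; submitting readTask directly would leave those
        // reads unattributed to the originating trace.
        return executor.submit(PTExecutors.wrap(readTask));
    }

    public static void main(String[] args) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<Long> bytesRead = consumeOnWorkerThread(executor, () -> {
            // ... drain the ClosableIterator returned by the KVS here ...
            return 0L;
        });
        System.out.println(bytesRead.get());
        executor.shutdown();
    }
}
```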
Testing (What was existing testing like? What have you done to improve it?):
- TODO
Concerns (what feedback would you like?):
- Are there hard-to-see risks associated with this code?
- How misleading do these become if e.g. the limitations are hit? Given that we have the full statistics available for bytesRead, we might be able to see gaps, but they won't always be easy to spot.
- Do we want to keep the `@MustBeClosed` annotations? They can be rolled back, but that increases the risk of bad wrapping (where we drop the inner `ClosableIterator`); we already have very few guarantees that close() will be called as-is. (Illustrated in the sketch after this list.)
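To illustrate the "bad wrapping" risk above, a hypothetical sketch (assuming `ClosableIterator` only requires `hasNext`/`next`/`close` and that `close()` throws no checked exception; the helper names are illustrative):

```java
import com.palantir.common.base.ClosableIterator;

// Hypothetical sketch of the wrapping hazard: a new iterator whose close() does
// not reach the delegate effectively drops the inner ClosableIterator, so any
// statistics flushed on close are lost and resources may leak.
final class WrappingSketch {
    private WrappingSketch() {}

    // Risky: close() forgets to delegate, so the inner iterator is never closed.
    static <T> ClosableIterator<T> badWrap(ClosableIterator<T> delegate) {
        return new ClosableIterator<T>() {
            @Override
            public boolean hasNext() {
                return delegate.hasNext();
            }

            @Override
            public T next() {
                return delegate.next();
            }

            @Override
            public void close() {
                // forgot delegate.close() -- statistics/resources tied to the
                // inner iterator are silently dropped
            }
        };
    }

    // Better: keep the ClosableIterator type and delegate close() explicitly.
    static <T> ClosableIterator<T> goodWrap(ClosableIterator<T> delegate) {
        return new ClosableIterator<T>() {
            @Override
            public boolean hasNext() {
                return delegate.hasNext();
            }

            @Override
            public T next() {
                return delegate.next();
            }

            @Override
            public void close() {
                delegate.close();
            }
        };
    }
}
```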
Where should we start reviewing?:
- `TraceStatistics`
- `KeyValueService` -> the reason for the `@MustBeClosed` progression within the code
Priority (whenever / two weeks / yesterday):
Generate changelog in changelog/@unreleased
Type
- [ ] Feature
- [ ] Improvement
- [x] Fix
- [ ] Break
- [ ] Deprecation
- [ ] Manual task
- [ ] Migration
Description
- `atlasdb.emptyReads`: the number of AtlasDB tombstones read during this span.
- `atlasdb.bytesRead`: an estimate of the number of bytes read from the underlying database during this span. This is currently the number of column name and value bytes read, and does not include various overheads; it should be seen as a value correlated with the real figure.

Note that both statistics are best-effort: there is no guarantee of completeness, and they require correct usage of AtlasDB to work (e.g. closing `ClosableIterators`).
Check the box to generate changelog(s)
- [x] Generate changelog entry
Looking for a review of the general approach before I go back to clean up the PR itself. The main points I would love to have covered are:
- Is this the best/most reliable point to inject at? The (much more reliable) alternative would be to thread some statistics system throughout the whole codebase, but I don't fully understand how much havoc that would cause on roll-out; do people rely on the internal public APIs?
- This has some fun/funky edge cases in terms of threading, meaning there will be services/usage patterns where the results are wrong. Do we have a sense of how much reading of async iterators happens on a thread different from the original invoker? What do we want to happen when that does happen?
- What do we want to happen on parallel async reads? The current code can't attribute reads/bytes correctly there. The trace-level metrics will still be correct, though, so I feel this is a case of "don't let perfect be the enemy of the good".
@jeremyk-91 Fixed the overflow issue; I think it's ready for CR when you get a chance.
If I'm reading the C* code right, `getRange` does end up using `ResultsExtractor`, via:
- `CassandraKeyValueServiceImpl:1346` (getRange)
- `RangeLoader:53` (getRange)
- `RangeLoader:74` (getRangeWithPageCreator)
- `RangeLoader:95` (getRangeWithPageCreator)
- through `AbstractPagingIterable.iterator`
- `CassandraRangePagingIterable:69/75` (getFirstPage/getNextPage)
- `CassandraRangePagingIterable:81` (getSinglePage)
- `CassandraRangePagingIterable:103` (getPage)
- `ResultsExtractor:84`, which is where the current counters are.

That said, I'm not entirely sure how exactly the column/row/value bytes map to network bandwidth (as in, are things deduped?).
Oh, never mind: `getRowsColumnRange` doesn't; it's a bit surprising to me that we have multiple paths for turning bytes into values/columns/rows.
@jeremyk-91 I added support for `getRowsColumnRange` for Cassandra and `getRange` for DBKVS. Yes, I think we should be good to merge (assuming I didn't do anything supremely silly in my current PR); we can always improve accuracy later.
Note: I've refrained from putting things at the lowest level (the DB layer) to ensure we don't accidentally bake in hidden assumptions (e.g. that all C* byte buffers are array-backed and therefore cheap to re-read; I don't think we should assume that). We could also fairly trivially improve correctness by creating a wrapper around `TBinaryProtocol` (a rough sketch follows).
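For reference, a rough sketch of what such a `TBinaryProtocol` wrapper could look like; this is not part of this PR, and the `TraceStatistics.incrementBytesRead` hook is hypothetical:

```java
import java.nio.ByteBuffer;

import org.apache.thrift.TException;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TTransport;

// Rough sketch: count bytes as they are deserialized from Thrift, instead of
// estimating from column names/values after the fact. Not part of this PR.
final class ByteCountingBinaryProtocol extends TBinaryProtocol {
    ByteCountingBinaryProtocol(TTransport transport) {
        super(transport);
    }

    @Override
    public ByteBuffer readBinary() throws TException {
        ByteBuffer buffer = super.readBinary();
        // Hypothetical statistics hook; the real entry point may differ.
        TraceStatistics.incrementBytesRead(buffer.remaining());
        return buffer;
    }

    @Override
    public String readString() throws TException {
        String value = super.readString();
        // Approximation: counts chars rather than encoded bytes.
        TraceStatistics.incrementBytesRead(value.length());
        return value;
    }
}
```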
I've been busy with planning and some team commitments, but will pick this up today or tomorrow. Apologies :(
Released 0.689.0