
Add Couch Stats Resource Tracker (CSRT)

Open chewbranca opened this issue 9 months ago • 0 comments

Couch Stats Resource Tracker (CSRT)

Couch Stats Resource Tracker (CSRT) is a framework for tracking the metrics induced in couch_stats at the process level, to understand which requests and operations are utilizing the underlying system resources. The metrics in couch_stats are currently tracked at the node level, allowing you to see the quantitative aspects of node resource consumption but lacking the qualitative information to understand what induced those workloads. CSRT rectifies this with process-local real-time metrics collection exposed through a queryable interface, and with process-lifetime reports documenting the total quantity and duration of work induced by a given request.

This PR takes a different approach from mango_stats and other facilities that embed additional metrics not present in couch_stats; instead, the approach here inverts that notion with the core idea that if it's worth tracking the consumption of a specific resource, then that resource is worth tracking by way of dedicated couch_stats metrics. So when the need for additional data collection arises, we should add those as node-level stats in couch_stats and then track them at the process level by way of CSRT.

This is a rework of PR: https://github.com/apache/couchdb/pull/4812

This is a singular PR comprised of a few core components:

  1. the underlying real time stats collection framework
  2. the mechanisms for tracking the resource operations
  3. the querying and logging facilities

The distinction is drawn between these components to highlight that the core system here is the underlying process-level real-time stats collection framework, which allows tracking, querying, and logging of resource usage induced by requests to and operations within CouchDB. The requests and operations currently tracked, along with the mechanisms for querying and logging, are a solid start but by no means complete; the idea is to build upon the core framework of 1) to introspect resource usage such that a user of CouchDB can see which requests and operations constitute their system usage.

Understanding which requests utilize the given resources is incredibly important for aggregate operations that consume far more resources than they manifest back in the client response, as we currently lack visibility into these workload discrepancies. For instance, if you have a _find query that falls back to the all docs index and is a query for which no rows will be found, that singular http request will induce a full database scan of all documents, painful enough on its own but magnified greatly when induced in parallel. These types of requests usually manifest at the client level as timeouts, and within the database as heavy IO reader processes that can thundering-herd database shards. Filtered changes feeds are similar, but even worse in that they funnel the full database scan through the Javascript engine pipeline. At least with views the workload induced on the system is roughly similar to what's returned to the client, which provides at least some mechanism for the user to understand what's going on; e.g. if you query 70 million rows from a view in a singular http request, that's something you'll be able to see and realize is obviously problematic. CSRT resolves these issues by logging heavy duty requests as reports, allowing for post facto analysis of what the database was doing over a given time period, and by providing a real time querying system for finding out what the hot requests are right now.

1) the underlying real time stats collection framework

The previous PR details a few different approaches that failed to scale effectively during my testing. Instead, the approach in this PR builds upon the significant improvements made to ETS in Erlang around atomic updates and decentralized counters, combined with an approach that performs no concurrent writes to the same key: all tracked processes directly update the central stats table by way of ets:update_counter, which performs atomic and isolated updates. Combined with read_concurrency and write_concurrency, this allows for a highly concurrent data collection mechanism that still supports real time querying. It's easy enough to track the process local stats and generate a report at the end of the process lifetime, but given that workloads induced from a request can potentially last for hours, it's critical to have a queryable real time mechanism that allows for introspection of these long running tasks, especially if we want to incorporate further information from long running background jobs like indexing, replication, and compaction.
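To make the shape of this concrete, here is a minimal self-contained sketch of the central-table idea; the module, table, and key layout are my assumptions for illustration, not the PR's actual code:

```erlang
-module(csrt_sketch).
-export([init/0, bump/2]).

-define(TABLE, csrt_stats).

%% One central ETS table; read/write_concurrency (and, on newer OTP
%% releases, decentralized counters) let many processes update it in
%% parallel without a coordinating server.
init() ->
    ets:new(?TABLE, [named_table, public, set,
                     {read_concurrency, true},
                     {write_concurrency, true}]).

%% Each tracked process updates only its own {Pid, Stat} key, so
%% ets:update_counter/4 gives atomic, isolated increments and no two
%% processes ever write to the same key concurrently. The fourth
%% argument seeds a zeroed row on first use.
bump(Stat, N) ->
    Key = {self(), Stat},
    ets:update_counter(?TABLE, Key, {2, N}, {Key, 0}).
```

Because every row stays live in ETS for the duration of the process, another process can `ets:select` the table at any time, which is what makes the real time querying possible.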

2) the mechanisms for tracking the resource operations

CSRT hooks into couch_stats:increment_counter from within the caller process and in the event the metric counter being incremented is one tracked by CSRT we then update the ETS entry for the caller process for the given stat. This greatly simplifies the approach of stats collections, avoids having a centralized set of processes gathering and publishing stats, and avoids concurrent writes to the same key given each process tracks itself.
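Roughly, the hook has this shape; the function and module names below (csrt:is_tracked/1, csrt:bump/3, couch_stats_impl) are hypothetical stand-ins, not the PR's actual identifiers:

```erlang
%% Sketch only: when the caller process increments a couch_stats
%% counter, CSRT checks whether that metric is one it tracks and, if
%% so, bumps the caller's own entry in the central ETS table.
increment_counter(Name) ->
    %% existing node-level metric update
    ok = couch_stats_impl:increment_counter(Name),
    %% process-level tracking, only for metrics CSRT cares about
    case csrt:is_tracked(Name) of
        true -> csrt:bump(self(), Name, 1);
        false -> ok
    end.
```

The key property is that the bump happens in the caller's own process context, so no extra message passing or central collector is involved.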

As mentioned above, the key of this PR is the core framework for stats collection and funneling of data between nodes with a limited subset of operations being tracked by way of CSRT. Currently we track coordinator processes as the http request flows into chttpd, then we track RPC workers by way of rexi_server:init_p. As the RPC worker processes send messages back by way of rexi they also embed a snapshot of the current workload delta since the last rexi message sent, allowing for the coordinator process to accumulate and reflect the full workload induced, which is then queryable. This mechanism allows for rexi:ping to keep deltas flowing, so even when RPC workers are not sending data back to the client, they will funnel along their CSRT stats so we can easily find the heavy processes. This PR intentionally keeps the focus to the coordinator processes and rexi_server based RPC workers, but the expectation is that this tracking system can be utilized for background job workers and the other RPC subsystems like in dreyfus and mem3.
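The delta-funneling idea can be sketched as follows; csrt:make_delta/0, csrt:accumulate_delta/1, and the message shape are hypothetical names for illustration:

```erlang
%% Worker side: snapshot the stats delta accumulated since the last
%% rexi message and piggyback it on the reply.
worker_reply(Msg) ->
    Delta = csrt:make_delta(),
    rexi:reply({delta, Delta, Msg}).

%% Coordinator side: fold the worker's delta into the coordinator's
%% own tracked totals before handling the actual payload, so the
%% coordinator's ETS row reflects the full induced workload.
coordinator_handle({delta, Delta, Msg}, Worker, Acc) ->
    ok = csrt:accumulate_delta(Delta),
    coordinator_handle(Msg, Worker, Acc);
coordinator_handle(Msg, _Worker, Acc) ->
    {ok, [Msg | Acc]}.
```

Since rexi:ping messages travel the same path, long-running workers that have nothing to send the client still flush their deltas upstream.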

We specifically track RPC workers spawned by way of rexi_server:init_p, but we're less specific about the underlying RPC operation called; rather, we track whether the RPC process induces metric increments for one of the metrics CSRT is interested in tracking. This goes back to the philosophy of CSRT, which is to track the important things we already track in couch_stats: rather than specifically counting the number of documents opened in an _all_docs request, we instead track the stat [couchdb, database_reads] for all RPC workers induced by rexi_server:init_p, and then we can find out which request actually induced the workload by way of the generated log report.

3) the querying and logging facilities

There's a rudimentary http based querying system that allows for counting, grouping, and sorting on different fields of the real time values of the running processes. There's also a process lifetime logging facility that allows for customizable filtering of what process lifetimes to log. Both of these are capable but rough and could use help.

The CSRT logger utilizes ets:fun2ms to create powerful and performant filtering functions that determine whether to log the process lifetime report of a process once it has concluded its operations. These filter match specs are compiled and saved as persistent terms to be readily available to all processes in the system, which allows tracking processes to monitor the lifetime of coordinator or worker processes and, upon their exit, load the precompiled match spec and locally determine whether or not to generate a log. This distributed approach avoids centralized process trackers, a major overload failure mode in previous experiments, while utilizing the significant performance gains of persistent_term. It even avoids heavy copying of values into the local caller process, as the compiled match specs returned from persistent_term are actually just refs to an internal representation!
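A minimal sketch of that compile-once, run-anywhere flow, assuming hypothetical module and key names (the real registration API in the PR will differ):

```erlang
-module(csrt_matcher_sketch).
-export([register_matcher/2, matches/2]).

%% Compile the match spec once and stash the compiled form in
%% persistent_term, keyed by matcher name.
register_matcher(Name, MSpec) ->
    CompMSpec = ets:match_spec_compile(MSpec),
    %% persistent_term hands the same compiled-spec reference to every
    %% caller, so no per-process copying of the spec occurs.
    persistent_term:put({csrt_matcher, Name}, CompMSpec).

%% Any process can now cheaply test a record against the matcher,
%% e.g. a tracker deciding whether to log a lifetime report on exit.
matches(Name, Rctx) ->
    CompMSpec = persistent_term:get({csrt_matcher, Name}),
    case ets:match_spec_run([Rctx], CompMSpec) of
        [] -> false;
        [_] -> true
    end.
```

Note that persistent_term:put is the expensive side of this trade (it triggers a global scan on update), which fits the pattern here: matchers are registered rarely and read constantly.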

Currently there are a handful of builtin matchers that are configurable by way of the ini files: for example, filters on requests that read more than X docs, make more than X IOQ calls, or write more than X docs. There's also a direct matcher on dbname, and a more specific dbname IO matcher that matches when a request to the provided dbname performs more than X ops against any of the core IO metrics.
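For a sense of the shape this could take, here is a purely illustrative ini fragment; the section and key names are my guesses at the configuration surface, not the actual keys in this PR:

```ini
; Hypothetical example: enable threshold-based builtin matchers.
; A process lifetime report is logged when a request exceeds a value.
[csrt_logger.matchers]
docs_read = 1000
ioq_calls = 10000
docs_written = 500

; Hypothetical dbname matcher: log everything touching this database.
[csrt_logger.dbnames]
foo/bar = true
```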

These are some baseline matchers that I thought would be useful, but it would be great to get ideas from more folks on what they would find useful. These matchers are easy to add as functions or dynamically by way of remsh, but I haven't come up with a great way to declare the more complicated config chains in ini files. I'm hoping other folks might have some ideas on this front, for example, the following matcher is easy to write and dynamically register to perform filters, but I couldn't come up with a great approach to allow for specifying these types of complex queries in an ini file:

```erlang
ets:fun2ms(fun(#rctx{dbname = <<"foo/bar">>, ioq_calls = IOQ, docs_read = Docs,
                     rows_read = Rows, type = #worker{}} = R)
                   when (IOQ > 1234) or (Docs > 203) or (Rows > 500) ->
    R
end).
```

The csrt_logger server will allow you to dynamically register match specs like that, which then generate process lifetime reports for processes that exit with workloads matching those filter thresholds. The same matcher can also be run directly against the ETS table, allowing for sophisticated real time filtering by way of match specs. You can even run the matcher against a list of #rctx{} values, which is basically what csrt_logger:tracker does.
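Those three uses of one matcher can be sketched shell-style as follows; the csrt_stats table name and the simplified #rctx{} field set are assumptions here, echoing the PR's examples rather than quoting its code:

```erlang
%% One matcher, defined once with ets:fun2ms (requires the ms_transform
%% parse transform, which the shell applies automatically).
MSpec = ets:fun2ms(fun(#rctx{docs_read = Docs} = R) when Docs > 1000 -> R end),

%% 1. Real time: select matching rows straight out of the live stats table.
Heavy = ets:select(csrt_stats, MSpec),

%% 2. List mode: run the compiled spec over a list of #rctx{} snapshots,
%%    roughly what csrt_logger:tracker does at process exit.
Matched = ets:match_spec_run(Rctxs, ets:match_spec_compile(MSpec)),

%% 3. Logging: the same MSpec, once registered with csrt_logger, gates
%%    whether a process lifetime report is emitted.
```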

I'm quite excited about the dynamic logging and introspection capabilities afforded by the match spec filtering mechanisms, but I am scratching my head a bit coming up with a good approach for the combinatorics of different filter pairings of the various fields. I've added a handful of builtin matchers in this PR, and I think we can provide a useful set of "slow query" type basic filters from the get go. Hopefully someone can come up with an expressive way of chaining the matcher specs, though that might be challenging given it's doing parse transforms and whatnot. That said, it's entirely possible to write some fairly verbose match spec funs for the more complex matchers we want; I don't think that's a problem for useful filters we expect to stick around.

TODO

Add some concluding thoughts, a few examples of output, and some requested review discussion points.

chewbranca · Mar 25 '25 16:03