
Couch stats resource tracker v3 rebase main

chewbranca opened this issue 5 months ago • 10 comments

This PR supersedes https://github.com/apache/couchdb/pull/5491 and includes @iilyak's excellent HTTP updates on top of it, as well as some final cleanup and documentation from me. I've copied the contents of CSRT.md into the PR description here:

Couch Stats Resource Tracker (CSRT)

CSRT (Couch Stats Resource Tracker) is a real time stats tracking system that tracks the quantity of resources induced at the process level, in a live queryable manner. It also generates process lifetime reports containing statistics on the total resource load of a request, as a function of things like dbs/docs opened, view and changes rows read, changes returned vs processed, Javascript filter usage, duration, and more. This system is a paradigm shift in CouchDB visibility and introspection: it allows for expressive real time querying to introspect, understand, and aggregate CouchDB internal resource usage, along with powerful filtering facilities for conditionally generating reports on "heavy usage" requests or "long/slow" requests. CSRT also extends recon:proc_window with csrt:proc_window, allowing for the same style of battle hardened introspection as Recon's excellent proc_window, but with the sample window over any of the CSRT tracked CouchDB stats!

CSRT does this by piggy-backing off of the existing metrics tracked by way of couch_stats:increment_counter. At the time the local process induces those metric increment calls, CSRT updates an ets entry containing the context information for the local process, such that global aggregate queries can be performed against the ets table, and process resource usage reports can be generated at the conclusion of the process's lifecycle. The ability to do aggregate querying in real time, in addition to the process lifecycle reports for post facto analysis over time, is a cornerstone of CSRT and the result of a series of iterations until a robust and scalable approach was built.

The real time querying is achieved by way of a global ets table with read_concurrency, write_concurrency, and decentralized_counters enabled. Great care was taken to ensure that zero concurrent writes to the same key occur in this model, and this entire system is predicated on the fact that incremental updates via ets:update_counter are really fast and efficient, and atomic and isolated, when coupled with decentralized counters and write concurrency. Each process that calls couch_stats:increment_counter tracks its local context in CSRT as well, with zero concurrent writes from any other processes. Outside of the context setup and teardown logic, only ets:update_counter operations are performed: one per process invocation of couch_stats:increment_counter, and one for coordinators to update worker deltas in a single batch, resulting in a 1:1 ratio of ets calls to real time stats updates for the primary workloads.
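As a rough sketch of the batched delta accumulation described above (the function name and the exact set of fields are illustrative, not the actual implementation), a coordinator can fold a worker delta into the ets table with a single atomic call:

```erlang
%% Illustrative sketch: accumulate a worker's stats delta into the CSRT
%% ets table in one batched ets:update_counter/3 call, keyed on PidRef.
%% Each {Pos, Incr} pair is applied atomically and in isolation by ets,
%% so no other process can observe a partial update.
accumulate_delta_sketch(PidRef, #{docs_read := Docs, ioq_calls := IOQ}) ->
    ets:update_counter(?CSRT_ETS, PidRef, [
        {#rctx.docs_read, Docs},
        {#rctx.ioq_calls, IOQ}
    ]).
```

The important property is that the read-modify-write happens inside ets, so CSRT never needs to fetch the row, mutate it, and write it back.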

The primary achievement of CSRT is the core framework itself for concurrent process local stats tracking and real time RPC delta accumulation, in a scalable manner that allows for real time aggregate querying and process lifecycle reports. It took several versions to find a scalable and robust approach that induced minimal impact on maximum system throughput. Now that the framework is in place, it can be extended to track any further desired process local uses of couch_stats:increment_counter. That said, the currently selected set of stats to track was heavily influenced by the challenges in retroactively understanding the quantity of resources induced by a query like /db/_changes?since=$SEQ, or similarly, /db/_find.

CSRT started as an extension of the Mango execution stats logic to _changes feeds to get proper visibility into quantity of docs read and filtered per changes request, but then the focus inverted with the realization that we should instead use the existing stats tracking mechanisms that have already been deemed critical information to track, which then also allows for the real time tracking and aggregate query capabilities. The Mango execution stats can be ported into CSRT itself and just become one subset of the stats tracked as a whole, and similarly, any additional desired stats tracking can be easily added and will be picked up in the RPC deltas and process lifetime reports.

CSRT Config Keys

-define(CSRT, "csrt").

config:get("csrt").

Primary CSRT config namespace: contains core settings for enabling different layers of functionality in CSRT, along with global config settings for limiting data volume generation.

-define(CSRT_MATCHERS_ENABLED, "csrt_logger.matchers_enabled").

config:get("csrt_logger.matchers_enabled").

Config toggles for enabling specific builtin logger matchers; see the dedicated CSRT Default Matchers section below.

-define(CSRT_MATCHERS_THRESHOLD, "csrt_logger.matchers_threshold").

config:get("csrt_logger.matchers_threshold").

Config settings for defining the primary Threshold value of the builtin logger matchers; see the dedicated CSRT Default Matchers section below.

-define(CSRT_MATCHERS_DBNAMES, "csrt_logger.dbnames_io").

config:get("csrt_logger.dbnames_io").

Config section for setting $db_name = $threshold entries, each instantiating a "dbname_io" logger matcher for that $db_name that will generate a CSRT lifecycle report for any context that induced more than $threshold operations on any one of the fields ioq_calls|get_kv_node|get_kp_node|docs_read|rows_read while on database $db_name.

This is basically a simple matcher for finding heavy IO requests on a particular database, in a manner amenable to key/value pair specifications in this .ini file, until a more sophisticated declarative model exists. In particular, it's not easy to programmatically generate matchspecs by way of ets:fun2ms/1, and so an alternative mechanism for either dynamically assembling an #rctx{} to match against, or generating the raw matchspecs themselves, is warranted.
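For instance, a hypothetical entry (the database name and threshold here are made up) would look like:

```ini
; Report any context on database "accounts" that induces more than
; 10000 of any one of: ioq_calls, get_kv_node, get_kp_node,
; docs_read, rows_read
[csrt_logger.dbnames_io]
accounts = 10000
```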

-define(CSRT_INIT_P, "csrt.init_p").

config:get("csrt.init_p").

Config toggles for tracking counters on spawning of RPC fabric_rpc workers by way of rexi_server:init_p. This allows us to conditionally enable new metrics for the desired RPC operations in an expandable manner, without having to add new stats for every single potential RPC operation. These keys are for the individual metrics to track; the feature itself is enabled by way of the config toggle config:get(?CSRT, "enable_init_p"), and these configs can be left alone for the most part until new operations are tracked.

CSRT Code Markers

-define(CSRT_ETS, csrt_server).

This is the reference to the CSRT ets table; it's managed by csrt_server, which is where the name originates.

-define(MATCHERS_KEY, {csrt_logger, all_csrt_matchers}).

This marker is where the active matchers are written in persistent_term, allowing concurrent, parallel access to the logger matchers from the CSRT tracker processes for lifecycle reporting.

CSRT Process Dictionary Markers

-define(PID_REF, {csrt, pid_ref}).

This marker stores the core PidRef identifier. The key idea here is that a CSRT context lifecycle is contained within the given PidRef, meaning that a Pid can instantiate different CSRT lifecycles and pass those to different workers.

This is specifically necessary for long running processes that need to handle many CSRT context lifecycles over the course of that individual process's lifetime. In practice, this is immediately needed for the actual coordinator lifecycle tracking, as chttpd uses a worker pool of http request handlers that can be re-used, so we need a way to create a CSRT lifecycle corresponding to the given request currently being serviced. This is also intended to be used in other long running processes, like IOQ or couch_js pids, such that we can track the specific context inducing the operations on the couch_file pid or indexer or replicator or whatever.

Worker processes have a more clear cut lifecycle, but either style of process can be exit'ed in a manner that skips the ability to do cleanup operations. So, additionally, there's a dedicated tracker process spawned to monitor the process that induced the CSRT context, such that we can do the dynamic logger matching directly in these tracker processes, and also properly clean up the ets entries even if the Pid crashes.

-define(TRACKER_PID, {csrt, tracker}).

A handle to the spawned tracker process that does cleanup and logger matching reports at the end of the process lifecycle. We store a reference to the tracker pid so that for explicit context destruction, like in chttpd workers after a request has been serviced, we can stop the tracker and perform the expected cleanup directly.

-define(DELTA_TA, {csrt, delta_ta}).

This stores our last delta snapshot, to track progress since the last incremental streaming of stats back to the coordinator process. It is updated with the latest value after each new delta is made. Eg this stores T0 so we can do T1 = get_resource(), Delta = make_delta(T0, T1), and then save T1 as the new T0 for use in our next delta.
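The T0/T1 cycle described above can be sketched roughly as follows (an illustrative sketch, not the actual implementation; it assumes csrt:get_resource/0 and csrt_util:rctx_delta/2 as described elsewhere in this document):

```erlang
%% Sketch of the delta snapshot cycle stored under ?DELTA_TA in the pdict.
make_delta_sketch() ->
    T0 = erlang:get(?DELTA_TA),           %% last snapshot (the old T0)
    T1 = csrt:get_resource(),             %% current rctx state
    Delta = csrt_util:rctx_delta(T0, T1), %% stats accrued since T0
    erlang:put(?DELTA_TA, T1),            %% T1 becomes the next T0
    Delta.
```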

-define(LAST_UPDATED, {csrt, last_updated}).

This stores the integer corresponding to the erlang:monotonic_time() value of the most recent updated_at value. Basically this lets us utilize a pdict value to turn updated_at tracking into an incremental operation that can be chained into the existing atomic ets:update_counter and ets:update_element calls.

The issue being that our updates are of the form "+2 to ioq_calls for $pid_ref", which ets performs in a guaranteed atomic and isolated manner. The strict use of these atomic operations for tracking values is why this system works efficiently at scale: we can increment counters on all of the stats counter fields in a batch, very quickly. But for tracking updated_at timestamps we'd need either an extra ets call to get the last updated_at value, or an extra ets:update_element call to set the updated_at value to csrt_util:tnow(). The core problem is that the batch inc operation is essentially the only write operation performed after the initial context setting of dbname/handler/etc; this means that we'd literally double the number of ets calls induced by CSRT updates, just for tracking updated_at. So instead, we rely on the fact that the local process corresponding to $pid_ref is the only process doing updates, so we know the last updated_at value will be the last time this process updated the data. We track that value in the pdict, take a delta between tnow() and updated_at, and then updated_at becomes a value we can sneak into the other integer counter updates we're already performing!
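Concretely, the trick works because adding (Now - Last) to the stored updated_at counter yields Now without ever reading the old value back out of ets. A minimal sketch (illustrative names, not the actual implementation):

```erlang
%% Sketch: turn updated_at into an incremental counter update that can
%% ride along in the existing batched ets:update_counter call.
updated_at_increment_sketch() ->
    Now = csrt_util:tnow(),
    Last = erlang:get(?LAST_UPDATED),
    erlang:put(?LAST_UPDATED, Now),
    %% stored_updated_at + (Now - Last) =:= Now, as this process is the
    %% only writer, so Last is exactly what ets currently holds.
    {#rctx.updated_at, Now - Last}.
```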

Primary Config Toggles

CSRT (?CSRT="csrt") Config Settings

config:get(?CSRT, "enable", false).

Core enablement toggle for CSRT, defaults to false. Enabling this setting initiates local CSRT stats collection as well as shipping deltas in RPC responses to accumulate in the coordinator.

This does not trigger the new RPC spawn metrics, and it does not enable reporting for any of the rctx types.

NOTE: you MUST have all nodes in the cluster running a CSRT aware CouchDB before you enable it on any node, otherwise the old version nodes won't know how to handle the new RPC formats including an embedded Delta payload.

config:get(?CSRT, "enable_init_p", false).

Enables tracking of new metric counters for the different fabric_rpc operation types, to track spawn rates of RPC work induced across the cluster. There are corresponding config lookups in the ?CSRT_INIT_P namespace for keys of the form atom_to_list(Mod) ++ "__" ++ atom_to_list(Fun), eg "fabric_rpc__open_doc", for enabling the specific RPC endpoints.

However, those individual settings can be ignored and this top level config toggle is what should be used in general, as the function specific config toggles predominantly exist to enable tracking a subset of total RPC operations in the cluster, and new endpoints can be added here.

config:get(?CSRT, "enable_reporting", false).

This is the primary toggle for enabling CSRT process lifetime reports containing detailed information about the quantity of work induced by the given request/worker/etc. This is the top level toggle for enabling any reporting; there also exists config:get(?CSRT, "enable_rpc_reporting", false) to disable the reporting of any individual RPC workers, leaving the coordinator responsible for generating a report with the accumulated deltas.

config:get(?CSRT, "enable_rpc_reporting", false).

This enables the possibility of RPC workers generating reports. They still need to hit the configured thresholds to induce a report, but this will generate CSRT process lifetime reports for individual RPC workers that trigger the configured logger thresholds. This allows for quantifying per node resource usage when desired, as otherwise the reports are at the http request level and don't provide per node stats.

The key idea here is that having RPC level CSRT process lifetime reporting is incredibly useful, but can also generate large quantities of data. For example, a view query on a Q=64 database will stream results from 64 shard replicas, resulting in at least 64 RPC reports, plus any that might have been generated from RPC workers that "lost" the race for a shard replica. This is very useful, but a lot of data given the verbose nature of funneling it through the rsyslog reports; the ability to write directly to something like ClickHouse or another columnar store would be great.

Until there's an efficient storage mechanism to stream the results to, the rsyslog entries work great and are very practical, but care must be taken to not generate too much data for aggregate queries, as they generate at least Q times more reports than the single report per http request from the coordinator. This setting exists as a way to either a) utilize the configured logger matcher thresholds to allow any rctx, Coordinator or RPC worker, to be recorded when it induces heavy operations; or b) only log workloads at the coordinator level.

NOTE: this setting exists because we lack an expressive enough config declaration to easily chain the matchspec constructions, as ets:fun2ms/1 is a special compile time parse transform macro that requires the full definition to be specified directly; it cannot be iteratively constructed. That said, you can register matchers through remsh with more specific and fine grained pattern matching, and a more expressive system for defining matchers is being explored.

config:get_boolean(?CSRT, "should_truncate_reports", true)

Enables truncation of the CSRT process lifetime reports to not include any fields that are zero at the end of process lifetime, eg don't include js_filter=0 in the report if the request did not induce Javascript filtering.

This can be disabled if you really care about consistent fields in the report logs, but it is a log space saving mechanism, similar to disabling RPC reporting by default, as it's a simple way to reduce overall volume.

config:get(?CSRT, "randomize_testing", true).

This is a make eunit-only feature toggle that will induce randomness into the cluster's csrt:is_enabled() state, specifically to utilize the test suite to exercise edge case scenarios and failures when CSRT is only conditionally enabled, ensuring that it gracefully and robustly handles errors without fallout to the underlying http clients.

The idea here is to introduce randomness into whether CSRT is enabled across all the nodes, to simulate clusters with heterogeneous CSRT enablement, and also to ensure that CSRT works properly when toggled on/off without causing any unexpected fallout to the client requests.

This is a config toggle specifically so that the actual CSRT tests can disable it for making accurate assertions about resource usage tracking, and is not intended to be used directly.

config:get_integer(?CSRT, "query_limit", ?QUERY_LIMIT)

Limit the quantity of rows that can be loaded in an http query.

CSRT_INIT_P (?CSRT_INIT_P="csrt.init_p") Config Settings

config:get(?CSRT_INIT_P, ModFunName, false).

These config toggles conditionally enable additional tracking of RPC endpoints of interest; rather than adding new stats for every potential RPC operation, this selectively enables tracking for a subset of RPC operations, in a way we can extend later to add more. The ModFunName is of the form atom_to_list(Mod) ++ "__" ++ atom_to_list(Fun), eg "fabric_rpc__open_doc", and right now only exists for fabric_rpc modules.

NOTE: this is a bit awkward and isn't meant to be used directly; instead, utilize config:set(?CSRT, "enable_init_p", "true") to enable or disable these as a whole.

The current set of operations, as copied from default.ini:

[csrt.init_p]
fabric_rpc__all_docs = true
fabric_rpc__changes = true
fabric_rpc__get_all_security = true
fabric_rpc__map_view = true
fabric_rpc__open_doc = true
fabric_rpc__open_shard = true
fabric_rpc__reduce_view = true
fabric_rpc__update_docs = true

CSRT Logger Matcher Enablement and Thresholds

There are currently six builtin default loggers designed to make it easy to do filtering on heavy resource usage inducing and long running requests. These are designed as a simple baseline of useful matchers, declared in a manner amenable to default.ini based constructs. More expressive matcher declarations are being explored, and matchers of arbitrary complexity can be registered directly through remsh. The default matchers are all designed around an integer config Threshold that triggers on a specific field, eg docs read, or on a delta of fields for long requests and changes requests that process many rows but return few.

The current default matchers are:

  • docs_read: match all requests reading more than N docs
  • rows_read: match all requests reading more than N rows
  • docs_written: match all requests writing more than N docs
  • long_reqs: match all requests lasting more than N milliseconds
  • changes_processed: match all changes requests that returned at least N fewer rows than were loaded to complete the request (eg find heavily filtered changes requests reading many rows but returning few).
  • ioq_calls: match all requests inducing more than N ioq_calls

Each of the default matchers has an enablement setting in config:get(?CSRT_MATCHERS_ENABLED, Name) for toggling enablement of it, and a corresponding threshold value setting in config:get(?CSRT_MATCHERS_THRESHOLD, Name) that is an integer value corresponding to the specific nature of that matcher.
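Putting the two namespaces together, enabling a matcher and tuning its threshold would look like the following hypothetical default.ini fragment (the chosen threshold here is made up for illustration):

```ini
; Turn on the ioq_calls builtin matcher...
[csrt_logger.matchers_enabled]
ioq_calls = true

; ...and lower its threshold from the default 10000 to 5000 IOQ calls
[csrt_logger.matchers_threshold]
ioq_calls = 5000
```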

CSRT Logger Matcher Enablement (?CSRT_MATCHERS_ENABLED)

-define(CSRT_MATCHERS_ENABLED, "csrt_logger.matchers_enabled").

config:get_boolean(?CSRT_MATCHERS_ENABLED, "docs_read", false)

Enable the docs_read builtin matcher, with a default Threshold=1000, such that any request that reads more than Threshold docs will generate a CSRT process lifetime report with a summary of its resource consumption.

This is different from the rows_read filter in that a view with ?limit=1000 will read 1000 rows, but the same request with ?include_docs=true will also induce an additional 1000 docs read.

config:get_boolean(?CSRT_MATCHERS_ENABLED, "rows_read", false)

Enable the rows_read builtin matcher, with a default Threshold=1000, such that any request that reads more than Threshold rows will generate a CSRT process lifetime report with a summary of its resource consumption.

This is different from the docs_read filter so that we can distinguish between heavy view requests with lots of rows or heavy requests with lots of docs.

config:get_boolean(?CSRT_MATCHERS_ENABLED, "docs_written", false)

Enable the docs_written builtin matcher, with a default Threshold=500, such that any request that writes more than Threshold docs will generate a CSRT process lifetime report with a summary of its resource consumption.

config:get_boolean(?CSRT_MATCHERS_ENABLED, "ioq_calls", false)

Enable the ioq_calls builtin matcher, with a default Threshold=10000, such that any request that induces more than Threshold IOQ calls will generate a CSRT process lifetime report with a summary of its resource consumption.

config:get_boolean(?CSRT_MATCHERS_ENABLED, "long_reqs", false)

Enable the long_reqs builtin matcher, with a default Threshold=60000, such that any request where the last CSRT rctx updated_at timestamp is at least Threshold milliseconds greater than the started_at timestamp will generate a CSRT process lifetime report with a summary of its resource consumption.

CSRT Logger Matcher Threshold (?CSRT_MATCHERS_THRESHOLD)

-define(CSRT_MATCHERS_THRESHOLD, "csrt_logger.matchers_threshold").

config:get_integer(?CSRT_MATCHERS_THRESHOLD, "docs_read", 1000)

Threshold for docs_read logger matcher, defaults to 1000 docs read.

config:get_integer(?CSRT_MATCHERS_THRESHOLD, "rows_read", 1000)

Threshold for rows_read logger matcher, defaults to 1000 rows read.

config:get_integer(?CSRT_MATCHERS_THRESHOLD, "docs_written", 500)

Threshold for docs_written logger matcher, defaults to 500 docs written.

config:get_integer(?CSRT_MATCHERS_THRESHOLD, "ioq_calls", 10000)

Threshold for ioq_calls logger matcher, defaults to 10000 IOQ calls made.

config:get_integer(?CSRT_MATCHERS_THRESHOLD, "long_reqs", 60000)

Threshold for long_reqs logger matcher, defaults to 60000 milliseconds.

Core CSRT API

The csrt(.erl) module is the primary entry point into CSRT, containing API functionality for tracking the lifecycle of processes, inducing metric tracking over that lifecycle, and also a variety of functions for aggregate querying.

It's worth noting that the CSRT context tracking functions are specifically designed not to throw, and to be safe in the event of unexpected CSRT failures or edge cases. The aggregate query API has some callers that will actually throw, but aside from that, core CSRT operations will not bubble up exceptions; they either return the error value, or catch the error and move on rather than chaining further errors.

PidRef API

These functions are CRUD operations around creating and storing the CSRT PidRef handle.

-export([
    destroy_pid_ref/0,
    destroy_pid_ref/1,
    create_pid_ref/0,
    get_pid_ref/0,
    get_pid_ref/1,
    set_pid_ref/1
]).

Context Lifecycle API

These are the CRUD functions for handling a CSRT context lifecycle, where a lifecycle context is created in a chttpd coordinator process by way of csrt:create_coordinator_context/2, or in rexi_server:init_p by way of csrt:create_worker_context/3. Additional functions are exposed for setting context specific info like username/dbname/handler. get_resource fetches the context being tracked corresponding to the given PidRef.

-export([
    create_context/2,
    create_coordinator_context/2,
    create_worker_context/3,
    destroy_context/0,
    destroy_context/1,
    get_resource/0,
    get_resource/1,
    set_context_dbname/1,
    set_context_dbname/2,
    set_context_handler_fun/1,
    set_context_handler_fun/2,
    set_context_username/1,
    set_context_username/2
]).

Public API

The "Public" or miscellaneous API for lack of a better name. These are various functions exposed for wider use and/or testing purposes.

-export([
    clear_pdict_markers/0,
    do_report/2,
    is_enabled/0,
    is_enabled_init_p/0,
    maybe_report/2,
    to_json/1
]).

Stats Collection API

This is the stats collection API, utilized by way of couch_stats:increment_counter to do local process tracking, and also in rexi to add and extract delta contexts and then accumulate those values.

NOTE: make_delta/0 is a "destructive" operation that induces a new delta from the last rctx delta snapshot in the local pdict, and then updates that snapshot to the most recent version. Two individual rctx snapshots for a PidRef can safely generate an actual delta by way of csrt_util:rctx_delta/2.

-export([
    accumulate_delta/1,
    add_delta/2,
    docs_written/1,
    extract_delta/1,
    get_delta/0,
    inc/1,
    inc/2,
    ioq_called/0,
    js_filtered/1,
    make_delta/0,
    rctx_delta/2,
    maybe_add_delta/1,
    maybe_add_delta/2,
    maybe_inc/2,
    should_track_init_p/1
]).

TODO: RPC/QUERY DOCS

%% RPC API
-export([
    rpc/2,
    call/1
]).

%% Aggregate Query API
-export([
    active/0,
    active/1,
    active_coordinators/0,
    active_coordinators/1,
    active_workers/0,
    active_workers/1,
    count_by/1,
    find_by_nonce/1,
    find_by_pid/1,
    find_by_pidref/1,
    find_workers_by_pidref/1,
    group_by/2,
    group_by/3,
    query/1,
    query/2,
    query_matcher/1,
    query_matcher/2,
    sorted/1,
    sorted_by/1,
    sorted_by/2,
    sorted_by/3
]).

Recon API Ports of https://github.com/ferd/recon/releases/tag/2.5.6

This is a "port" of recon:proc_window to csrt:proc_window, allowing for proc_window style aggregations/sorting/filtering, but with the stats fields collected by CSRT! It is a direct port of recon:proc_window in that it utilizes the same underlying logic and efficient internal data structures as recon:proc_window, but only changes the Sample function:

%% This is a recon:proc_window/3 [1] port with the same core logic but
%% recon_lib:proc_attrs/1 replaced with pid_ref_attrs/1, and returning on
%% pid_ref() rather than pid().
%% [1] https://github.com/ferd/recon/blob/c2a76855be3a226a3148c0dfc21ce000b6186ef8/src/recon.erl#L268-L300
-spec proc_window(AttrName, Num, Time) -> term() | throw(any()) when
    AttrName :: rctx_field(), Num :: non_neg_integer(), Time :: pos_integer().
proc_window(AttrName, Num, Time) ->
    Sample = fun() -> pid_ref_attrs(AttrName) end,
    {First, Last} = recon_lib:sample(Time, Sample),
    recon_lib:sublist_top_n_attrs(recon_lib:sliding_window(First, Last), Num).

In particular, our change is Sample = fun() -> pid_ref_attrs(AttrName) end, and in fact, if recon upstream parameterized the choice of AttrName or SampleFunction, this could be reimplemented as:

%% csrt:proc_window
proc_window(AttrName, Num, Time) ->
    Sample = fun() -> pid_ref_attrs(AttrName) end,
    recon:proc_window(Sample, Num, Time).

This implementation is being highlighted here because recon:proc_window/3 is battle hardened and recon_lib:sliding_window uses an efficient internal data structure for storing the two samples that has been proven to work in production systems with millions of active processes. Swapping the Sample function with a CSRT version allows us to utilize the production grade recon functionality, extended out to the particular CouchDB statistics we're especially interested in.

And on a fun note: any further stats tracking fields added to CSRT tracking will automatically work with this too.
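For example, given the proc_window/3 definition shown above, a remsh usage sketch might look like this (the exact return shape follows recon's attrs format; treat this as illustrative):

```erlang
%% Top 10 CSRT contexts by ioq_calls accrued over a 1000 ms sample
%% window, analogous to recon:proc_window(reductions, 10, 1000).
Top = csrt:proc_window(ioq_calls, 10, 1000).
```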

-export([
    pid_ref_attrs/1,
    pid_ref_matchspec/1,
    proc_window/3
]).

Core types and Maybe types

Before we look at the #rctx{} record fields, let's examine the core datatypes defined by CSRT for use in Dialyzer typespecs. There are more, but these are the essentials and demonstrate the "maybe" typespec approach utilized in CSRT.

Let's say we have a -type foo() :: #foo{} and -type maybe_foo() :: foo() | undefined, we then can construct functions of the form -spec get_foo(id()) -> maybe_foo() and then we can use Dialyzer to statically assert all callers of get_foo/1 handle the maybe_foo() data type rather than just foo() and ensure that all subsequent callers do as well.
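That hypothetical foo()/maybe_foo() pattern could be spelled out as (illustrative only; foo, foo_table, and get_foo/1 are not real CSRT names):

```erlang
-record(foo, {id :: binary()}).
-type foo() :: #foo{}.
-type maybe_foo() :: foo() | undefined.

%% Dialyzer now forces every caller to handle the undefined branch.
-spec get_foo(binary()) -> maybe_foo().
get_foo(Id) ->
    case ets:lookup(foo_table, Id) of
        [Foo] -> Foo;
        [] -> undefined
    end.
```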

This approach of -type maybe_<Type>() :: <Type>() | undefined is utilized throughout CSRT and has greatly aided in the development, refactoring, and static analysis of this system. Here's a useful snippet for running Dialyzer while hacking on CSRT:

make && time make dialyze apps=couch_stats

-type pid_ref() :: {pid(), reference()}.
-type maybe_pid_ref() :: pid_ref() | undefined.

-type coordinator_rctx() :: #rctx{type :: coordinator()}.
-type rpc_worker_rctx() :: #rctx{type :: rpc_worker()}.
-type rctx() :: #rctx{} | coordinator_rctx() | rpc_worker_rctx().
-type rctxs() :: [#rctx{}] | [].
-type maybe_rctx() :: rctx() | undefined.

Above we have the core pid_ref() data type, which is just a tuple with a pid() and a reference(), and naturally, maybe_pid_ref() handles the optional presence of a pid_ref(), allowing APIs like csrt:get_resource/1 to handle the ambiguity of whether a pid_ref() is present.

We define our core rctx() data type as an empty #rctx{}, or the more specific coordinator_rctx() or rpc_worker_rctx() such that we can be specific about the rctx() type in functions that need to distinguish. And then as expected, we have the notion of maybe_rctx().

#rctx{}

This is the core data structure utilized to track a CSRT context for a coordinator or rpc_worker process, represented by the #rctx{} record, and stored in the ?CSRT_ETS table keyed on {keypos, #rctx.pid_ref}.

The Metadata fields store labeling data for the given process being tracked, such as started_at and updated_at timings, the primary pid_ref id key, the type of the process context, and some additional information like username, dbname, and the nonce of the coordinator request.

The Stats Counters fields are non_neg_integer() monotonically increasing counters corresponding to the couch_stats metrics counters we're interested in tracking at a process level cardinality. The use of these purely integer counter fields, in a record stored in an ets table, is the cornerstone of CSRT and why it's able to operate at high throughput and high concurrency, as ets:update_counter/{3,4} take increment operations to be performed atomically and in isolation, in a manner that does not require fetching and loading the data directly. We then take care to batch the accumulation of delta updates into a single update_counter call, and even sneak in the updated_at tracking as an integer counter update without inducing an extra ets call.

NOTE: the typespecs for these fields include the '_' atom as a possible type, as that is the matchspec wildcard any of the fields can be set to when using an existing #rctx{} record to search with.

-record(rctx, {
    %% Metadata
    started_at = csrt_util:tnow() :: integer() | '_',
    %% NOTE: updated_at must be after started_at to preserve time congruity
    updated_at = csrt_util:tnow() :: integer() | '_',
    pid_ref :: maybe_pid_ref() | {'_', '_'} | '_',
    nonce :: nonce() | undefined | '_',
    type :: rctx_type() | undefined | '_',
    dbname :: dbname() | undefined | '_',
    username :: username() | undefined | '_',

    %% Stats Counters
    db_open = 0 :: non_neg_integer() | '_',
    docs_read = 0 :: non_neg_integer() | '_',
    docs_written = 0 :: non_neg_integer() | '_',
    rows_read = 0 :: non_neg_integer() | '_',
    changes_returned = 0 :: non_neg_integer() | '_',
    ioq_calls = 0 :: non_neg_integer() | '_',
    js_filter = 0 :: non_neg_integer() | '_',
    js_filtered_docs = 0 :: non_neg_integer() | '_',
    get_kv_node = 0 :: non_neg_integer() | '_',
    get_kp_node = 0 :: non_neg_integer() | '_'
    %% "Example to extend CSRT"
    %%write_kv_node = 0 :: non_neg_integer() | '_',
    %%write_kp_node = 0 :: non_neg_integer() | '_'
}).

Metadata

We use csrt_util:tnow() for time tracking, which returns a native format erlang:monotonic_time() integer that, notably, can be and often is a negative value. You must either take a delta or convert the time to get it into a usable format, as one might suspect from the use of native.

We make use of erlang:monotonic_time/0 as per the recommendation in https://www.erlang.org/doc/apps/erts/time_correction.html#how-to-work-with-the-new-api for the suggested way to Measure Elapsed Time, as quoted:

Take time stamps with erlang:monotonic_time/0 and calculate the time difference
using ordinary subtraction. The result is in native time unit. If you want to
convert the result to another time unit, you can use erlang:convert_time_unit/3.

An easier way to do this is to use erlang:monotonic_time/1 with the desired time
unit. However, you can then lose accuracy and precision.

So our csrt_util:tnow/0 is implemented as the following, and we store timestamps in native format as long as possible to avoid precision loss at higher units of time, eg 300 microseconds is zero milliseconds.

-spec tnow() -> integer().
tnow() ->
    erlang:monotonic_time().

We store timestamps in the node's local erlang representation of time, specifically to be able to efficiently do time deltas, and then we track time deltas from the local node's perspective so as not to send timestamps across the wire. We then utilize calendar:system_time_to_rfc3339 to convert the local node's native time representation to its corresponding time format when we generate the process life cycle reports or send an http response.
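Under these conventions, a duration is plain subtraction of native timestamps, converted only at the edge; a minimal sketch (the `duration_ms/2` name is an assumption for illustration):

```erlang
%% Sketch: derive a duration from two native-unit timestamps produced
%% by csrt_util:tnow/0. The raw values may be negative; only the
%% difference is meaningful, and conversion happens last to avoid
%% precision loss.
duration_ms(StartedAt, UpdatedAt) ->
    erlang:convert_time_unit(UpdatedAt - StartedAt, native, millisecond).
```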

NOTE: because we do an inline definition and assignment of the #rctx.started_at and #rctx.updated_at fields to csrt_util:tnow(), we must declare #rctx.updated_at after #rctx.started_at to avoid fundamental time incongruities.

#rctx.started_at = csrt_util:tnow() :: integer() | '_',

A static value corresponding to the local node's Erlang monotonic_time at which this context was created.

#rctx.updated_at = csrt_util:tnow() :: integer() | '_',

A dynamic value corresponding to the local node's Erlang monotonic_time at which this context was updated. Note: unlike #rctx.started_at, this value will update over time, and in the process lifecycle reports the #rctx.updated_at value corresponds to the point at which the context was destroyed, allowing for calculation of the total duration of the request/context.

#rctx.pid_ref :: maybe_pid_ref() | {'_', '_'} | '_',

The primary identifier used to track the resources consumed by a given pid() for a specific context identified with a make_ref(), combined together as a unit because a given pid(), eg one from the chttpd worker pool, can have many contexts over time.

#rctx.nonce :: nonce() | undefined | '_',

The Nonce value of the http request being serviced by the coordinator_rctx() used as the primary grouping identifier of workers across the cluster, as the Nonce is funneled through rexi_server.

#rctx.type :: rctx_type() | undefined | '_',

A subtype classifier for the #rctx{} contexts, right now only supporting #rpc_worker{} and #coordinator{}, but CSRT was designed to accommodate additional context types like #view_indexer{}, #search_indexer{}, #replicator{}, #compactor{}, #etc{}.

#rctx.dbname :: dbname() | undefined | '_',

The database name, filled in at some point after the initial context creation by way of csrt:set_context_dbname/{1,2}.

#rctx.username :: username() | undefined | '_',

The requester's username, filled in at some point after the initial context creation by way of csrt:set_context_username/{1,2}.

Stats Counters

All of these stats counters are strictly non_neg_integer() counter values that are monotonically increasing, as we only induce positive counter increment calls in CSRT. Not all of these values will be nonzero, eg if the context doesn't induce Javascript filtering of documents, it won't inc the #rctx.js_filter field. The "should_truncate_reports" config value described in this document will conditionally exclude the zero valued fields from being included in the process life cycle report.

#rctx.db_open = 0 :: non_neg_integer() | '_',

Tracking `couch_stats:increment_counter([couchdb, couch_server, open])`

The number of couch_server:open/2 invocations induced by this context.

#rctx.docs_read = 0 :: non_neg_integer() | '_',

Tracking `couch_stats:increment_counter([couchdb, database_reads])`

The number of couch_db:open_doc/3 invocations induced by this context.

#rctx.docs_written = 0 :: non_neg_integer() | '_',

A phony metric counting docs written by the context, induced by csrt:docs_written(length(Docs0)) in fabric_rpc:update_docs/3 as a way to count the magnitude of docs written, as the actual document writes happen in the #db.main_pid couch_db_updater pid and subprocess tracking is not yet supported in CSRT.

This can be replaced with direct counting once passthrough contexts work.

#rctx.rows_read = 0 :: non_neg_integer() | '_',

Tracking `couch_stats:increment_counter([fabric_rpc, changes, processed])` and `couch_stats:increment_counter([fabric_rpc, view, rows_read])`

A value tracking multiple possible metrics corresponding to rows streamed in aggregate operations. This is used for view_rows/changes_rows/all_docs/etc.

#rctx.changes_returned = 0 :: non_neg_integer() | '_',

The number of fabric_rpc:changes_row/2 invocations induced by this context, specifically tracking the number of changes rows streamed back to the client request, allowing for distinguishing between the number of changes processed to fulfill a request versus the number actually returned in the http response.

#rctx.ioq_calls = 0 :: non_neg_integer() | '_',

A phony metric counting invocations of ioq:call/3 induced by this context. As with #rctx.docs_written, we need a proxy metric to represent these calls until CSRT context passing is supported, at which point the ioq_server pid can track the work and return its own delta back to the worker pid.

#rctx.js_filter = 0 :: non_neg_integer() | '_',

A phony metric counting the number of couch_query_servers:filter_docs_int/5 (eg ddoc_prompt) invocations induced by this context. This is called by way of csrt:js_filtered(length(JsonDocs)) which both increments js_filter by 1, and js_filtered_docs by the length of the docs so we can track magnitude of docs and doc revs being filtered.

#rctx.js_filtered_docs = 0 :: non_neg_integer() | '_',

A phony metric counting the quantity of documents filtered by way of couch_query_servers:filter_docs_int/5 (eg ddoc_prompt) invocations induced by this context. This is called by way of csrt:js_filtered(length(JsonDocs)) which both increments #rctx.js_filter by 1, and #rctx.js_filtered_docs by the length of the docs so we can track magnitude of docs and doc revs being filtered.

#rctx.get_kv_node = 0 :: non_neg_integer() | '_',

This metric tracks the number of invocations to couch_btree:get_node/2 in which the NodeType returned by couch_file:pread_term/2 is kv_node, instead of kp_node.

This provides a mechanism to quantify the impact of document count and document size on the logarithmic complexity btree algorithms as the database btrees grow.

#rctx.get_kp_node = 0 :: non_neg_integer() | '_'

This metric tracks the number of invocations to couch_btree:get_node/2 in which the NodeType returned by couch_file:pread_term/2 is kp_node, instead of kv_node.

This provides a mechanism to quantify the impact of document count and document size on the logarithmic complexity btree algorithms as the database btrees grow.

%% "Example to extend CSRT" %%write_kv_node = 0 :: non_neg_integer() | '_', %%write_kp_node = 0 :: non_neg_integer() | '_'

chewbranca avatar Jul 18 '25 18:07 chewbranca

Thanks for the feedback @iilyak and @nickva, I'll respond directly to the various other comments, and some higher level responses here:

Thanks for writing the CSRT.md! Lots of details there but consider adding something that a CouchDB upstream user might look at in the docs. Focusing on what problem this feature might solve for them, how they could use the HTTP API and the reports to solve it. I think that sort of got lost in the implementation details (ets table, define macros etc). Something high level like "You may have noticed your nodes are throwing timeout errors or there is excessive resource usage. Here is how to use this csrt feature to find out what's inducing that".

Yeah most of the CSRT.md docs were written while Ilya was updating the HTTP API and Query API, so I focused on documenting the internals. In debe52fab I've updated the docs with direct examples and sample output, as well as linking out the Query and HTTP API documentation from Ilya.

I see there is an HTTP API interface and a report logger. Is one introspection method the primary? We should probably have the HTTP docs in and some docs blurb of how to use the new feature.

To be specific, there's a real time query API that allows for filtered querying and aggregations of the live running system, by way of the csrt_query.erl module directly or through the HTTP API, and then there's a process lifecycle report that logs a report at the end of the process's lifetime containing details about the accumulated request totals.

So basically we track the data in real time in a manner that allows for live querying and aggregations, but also for simple report logging of heavy usage requests, with considerable granularity in the desired reporting.

For configuration section in default.ini make sure they match the description in the docs (off by default), use commented out values. Maybe add some instruction in the comments of when it would make sense enabling one feature or disabling. We have config docs section those config items should probably go in there.

Yes, the default.ini file has all the correct default settings in it now, along with some additional settings to keep this fully enabled for purposes of CI testing. I have the following local diff ready to disable CSRT by default, at which point we won't be running the CI through the full codepaths, hence waiting until the very end to disable it by default. Here's the current patch with the config toggles and a few test updates to facilitate the changes:

diff --git a/src/couch_stats/src/csrt_logger.erl b/src/couch_stats/src/csrt_logger.erl
index 95bf864a7..a3308f8f1 100644
--- a/src/couch_stats/src/csrt_logger.erl
+++ b/src/couch_stats/src/csrt_logger.erl
@@ -560,9 +560,7 @@ initialize_matchers(RegisteredMatchers) when is_map(RegisteredMatchers) ->
 
 -spec matcher_enabled(Name :: string()) -> boolean().
 matcher_enabled(Name) when is_list(Name) ->
-    %% TODO: fix
-    %% config:get_boolean(?CSRT_MATCHERS_ENABLED, Name, false).
-    config:get_boolean(?CSRT_MATCHERS_ENABLED, Name, true).
+    config:get_boolean(?CSRT_MATCHERS_ENABLED, Name, false).
 
 -spec matcher_threshold(Name, Threshold) -> string() | integer() when
     Name :: string(), Threshold :: pos_integer() | string().
diff --git a/src/couch_stats/src/csrt_util.erl b/src/couch_stats/src/csrt_util.erl
index 2b8a20791..31c447f1e 100644
--- a/src/couch_stats/src/csrt_util.erl
+++ b/src/couch_stats/src/csrt_util.erl
@@ -64,14 +64,12 @@ is_enabled() ->
 -else.
 -spec is_enabled() -> boolean().
 is_enabled() ->
-    %% TODO: toggle back to false before merging
-    config:get_boolean(?CSRT, "enable", true).
+    config:get_boolean(?CSRT, "enable", false).
 -endif.
 
 -spec is_enabled_init_p() -> boolean().
 is_enabled_init_p() ->
-    %% TODO: toggle back to false before merging
-    config:get_boolean(?CSRT, "enable_init_p", true).
+    config:get_boolean(?CSRT, "enable_init_p", false).
 
 -spec should_track_init_p(Mod :: atom(), Func :: atom()) -> boolean().
 should_track_init_p(fabric_rpc, Func) ->
@@ -82,8 +80,7 @@ should_track_init_p(_Mod, _Func) ->
 %% Toggle to disable all reporting
 -spec is_enabled_reporting() -> boolean().
 is_enabled_reporting() ->
-    %% TODO: toggle back to false before merging
-    config:get_boolean(?CSRT, "enable_reporting", true).
+    config:get_boolean(?CSRT, "enable_reporting", false).
 
 %% Toggle to disable all reporting from #rpc_worker{} types, eg only log
 %% #coordinator{} types. This is a bit of a kludge that would be better served
diff --git a/src/couch_stats/test/eunit/csrt_httpd_tests.erl b/src/couch_stats/test/eunit/csrt_httpd_tests.erl
index a9cd5fbc8..d9b87846c 100644
--- a/src/couch_stats/test/eunit/csrt_httpd_tests.erl
+++ b/src/couch_stats/test/eunit/csrt_httpd_tests.erl
@@ -58,6 +58,7 @@ setup_ctx() ->
 
 setup() ->
     {Ctx, Url} = setup_ctx(),
+    csrt_test_helper:enable_default_logger_matchers(),
     Rctxs = [
         rctx(#{dbname => <<"db1">>, ioq_calls => 123, username => <<"user_foo">>}),
         rctx(#{dbname => <<"db1">>, ioq_calls => 321, username => <<"user_foo">>}),
diff --git a/src/couch_stats/test/eunit/csrt_logger_tests.erl b/src/couch_stats/test/eunit/csrt_logger_tests.erl
index 2c408d607..53c88346c 100644
--- a/src/couch_stats/test/eunit/csrt_logger_tests.erl
+++ b/src/couch_stats/test/eunit/csrt_logger_tests.erl
@@ -15,10 +15,9 @@
 -import(
     csrt_test_helper,
     [
-        rctx_gen/0,
+        enable_default_logger_matchers/0,
         rctx_gen/1,
         rctxs/0,
-        rctxs/1,
         jrctx/1
     ]
 ).
@@ -82,6 +81,7 @@ make_docs(Count) ->
 
 setup() ->
     Ctx = test_util:start_couch([fabric, couch_stats]),
+    enable_default_logger_matchers(),
     config:set_boolean(?CSRT, "randomize_testing", false, false),
     config:set_boolean(?CSRT, "enable_reporting", true, false),
     config:set_boolean(?CSRT, "enable_rpc_reporting", true, false),
diff --git a/src/couch_stats/test/eunit/csrt_test_helper.erl b/src/couch_stats/test/eunit/csrt_test_helper.erl
index d919122c4..2682fa1c3 100644
--- a/src/couch_stats/test/eunit/csrt_test_helper.erl
+++ b/src/couch_stats/test/eunit/csrt_test_helper.erl
@@ -13,6 +13,7 @@
 -module(csrt_test_helper).
 
 -export([
+    enable_default_logger_matchers/0,
     rctx_gen/0,
     rctx_gen/1,
     rctxs/0,
@@ -116,3 +117,16 @@ one_of(L) ->
                 N
         end
     end.
+
+enable_default_logger_matchers() ->
+    DefaultMatchers = [
+        docs_read,
+        rows_read,
+        docs_written,
+        long_reqs,
+        changes_processed,
+        ioq_calls
+    ],
+    lists:foreach(fun(Name) ->
+        config:set(?CSRT_MATCHERS_ENABLED, atom_to_list(Name), "true", false)
+    end, DefaultMatchers).

It seems we can track per/coordinator and per/worker stats, do we emit them as the requests are running or when they finish? After the requests finish do we keep the stats around to inspect some aggregate via the HTTP API or that mostly done by some log / report aggregator, so to use this feature implies having a log report aggregator to parse the report and build graphs from there. If we do keep them around do they get cleaned up after some time?

We create a context at the point where we want to start tracking, currently that's in chttpd:handle_request_int/1 and rexi_server:init_p, but the intention of CSRT is for further contexts to be created for things like dreyfus_rpc, and background tasks like indexing/compaction/replication/etc.

When a process creates a context, we spawn a dedicated tracker process to monitor the caller such that when the process exits or closes out the context, the tracker process fetches the compiled CSRT logger ets match specs and runs them against the final accumulated #rctx{} stats, and if there's a match, we generate a report.
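That tracker pattern can be sketched roughly as below; the accessor and report helper names here are assumptions for illustration, not the actual module API:

```erlang
%% Hypothetical sketch of the dedicated tracker process: monitor the
%% context owner and, once it goes down, fetch the final #rctx{} so the
%% compiled logger matchspecs can decide whether to emit a report.
start_tracker({Pid, _Ref} = PidRef) ->
    spawn(fun() ->
        MonRef = erlang:monitor(process, Pid),
        receive
            {'DOWN', MonRef, process, Pid, _Reason} ->
                Rctx = couch_srt:get_resource(PidRef), %% assumed accessor
                maybe_generate_report(Rctx)            %% assumed helper
        end
    end).
```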

Earlier versions of CSRT tried to address the short lived process issue to aid in visibility, but it became awkward to handle and query around whether the process was alive or not; for instance, that suddenly means you can't just sum on ioq_calls to see the total of active processes, as that total now depends on the width of the extended window opened up to see the processes. I agree that it'd be nice to be able to see those for longer, but there are some nuances that resulted in me dropping it from the initial version of CSRT. I think it's worth following up on, perhaps with an extra ets table to store the final results for an extended period of time; we could do something like only keeping those that trigger a logger matcher. Pros and cons, but I chose to initially prioritize accuracy of the live table as it drastically cleaned up the earlier hacks around alive vs dead.

The CSRT application is about the size of the couch_stats application itself but has its own supervisor, hrl files and a separate csrt_utils module. Besides a single line callback in increment_counter (maybe_track_local_counter/2) they don't interact much, so it should probably be its own separate application. Then it can use a more conventional name of couch_resource_tracker or couch_rt to keep it shorter.

You mentioned this in the other PR, but I don't agree, and I still think this makes sense to be contained within couch_stats itself, as it was originally modeled after the existing couch_stats_process_tracker and it hooks into the existing stats collection. I think it's premature to move it in isolation to its own app until it's more clear what the desired extension points are to the rest of the codebase. Right now we only track positive counter increments, but hey, tracking histograms could also be of value too. It's not clear to me there's immediate benefit to separation from either an application architecture standpoint or a code readability viewpoint. My preference is to keep it as it is for now, and then if there are additional plans for extension into background jobs or further nested contexts like into couch_file, we'll have a clearer picture of what application abstractions make sense.

As for using a dedicated supervisor, as I elaborated on in the prior PR at https://github.com/apache/couchdb/pull/5491#discussion_r2181359146 I believe the use of a dedicated transient supervisor should be the standard operating procedure for all new experimental gen_servers introduced into the codebase. The use of a dedicated supervisor here is for isolation purposes that allow for progressively deploying new functionality in a safe manner, and I hope to see this pattern become a common workflow in the future, with transient supervisors enabled initially and eventually promoted into permanent supervisors.

chewbranca avatar Jul 22 '25 02:07 chewbranca

Thanks @iilyak and @nickva, I've addressed most of the comments in 2357812d4 as well as CSRT.md updates in debe52fab addressing Ilya's comments and also adding an updated overview, list of examples, and links to more info.

chewbranca avatar Jul 22 '25 05:07 chewbranca

Alright I went ahead and reworked the should_track_init_p logic in d757adda0 to no longer use the dynamic config lookups and instead use direct function head definitions on the desired metrics. This deletes the [csrt.init_p] default.ini section completely and entirely. This has the added benefit of removing the fields from the default.ini section using the __ separator, avoiding that issue entirely, although on that front, it'd be worthwhile to formally declare what types of values are acceptable in the keys, as I've been rather hesitant to go wild on that front.

This also removed the ugly string conversion in csrt_util:fabric_conf_key, and given how much emphasis I've placed in this PR on avoiding dynamic runtime string conversions for expected entities, I'm very much glad to delete the ugliness of that string conversion! 😅

chewbranca avatar Jul 22 '25 06:07 chewbranca

I talked with @nickva out of band and realized I confused myself on the need for using absolute values around the time deltas for negative timestamps, negative negative numbers confused me, but he's correct that the use of abs there is extraneous and can be removed. I'll take care of that and he also convinced me to extract out CSRT from couch_stats into a dedicated application, so I'll sort that out now as well.

chewbranca avatar Jul 22 '25 19:07 chewbranca

Okay over in 38608f7cd I've dropped the extraneous abs usage, cleaned up the gen_server callbacks, dropped the unlink call, and cleaned up some naming discrepancies, plus I've added additional documentation clarifying that the default logger matchers are for matching against any requests in CouchDB, as opposed to the "dbnames_io" matchers being targeted to a specific database name, and mentioned the tradeoff of losing some of the granularity of matching against particular dimensions. Alternatively, we could add a dbnames_ioq matcher for creating a matcher on a dbname with a particular IOQ threshold, but because we can't easily chain these definitions we would need to make dedicated matchers for all of the desired combinations, eg dbnames_docs_read, dbnames_rows_read, etc. The combinatorics become even more problematic when you want to express matchers like "match on db foo for changes requests that have induced more than 1000 IOQ calls and more than 100 Javascript filter invocations".

You can actually create that matcher now, but it needs to be registered directly by way of remsh; for example, the following dynamically creates a CSRT logger matcher that satisfies the constraint "match on db foo for changes requests that have induced more than 1000 IOQ calls and more than 100 Javascript filter invocations":

([email protected])16> rr(csrt_server).
[coordinator,rctx,rpc_worker,st]
([email protected])17> ets:fun2ms(fun(#rctx{dbname = <<"foo">>, type=#coordinator{mod='chttpd_db', func='handle_changes_req'}, ioq_calls=IC, js_filter=JF}=R) when IC > 1000 andalso JF > 100 -> R end).
[{#rctx{started_at = '_',updated_at = '_',pid_ref = '_',
        nonce = '_',
        type = #coordinator{mod = chttpd_db,
                            func = handle_changes_req,method = '_',path = '_'},
        dbname = <<"foo">>,username = '_',db_open = '_',
        docs_read = '_',docs_written = '_',rows_read = '_',
        changes_returned = '_',ioq_calls = '$1',js_filter = '$2',
        js_filtered_docs = '_',get_kv_node = '_',get_kp_node = '_'},
  [{'andalso',{'>','$1',1000},{'>','$2',100}}],
  ['$_']}]
([email protected])18> csrt_logger:register_matcher("custom_foo", ets:fun2ms(fun(#rctx{dbname = <<"foo">>, type=#coordinator{mod='chttpd_db', func='handle_changes_req'}, ioq_calls=IC, js_filter=JF}=R) when IC > 1000 andalso JF > 100 -> R end)).
ok

That'll dynamically compile the matchspec and push it out by way of persistent_term to be picked up by the tracker pids to decide whether or not to generate a process lifecycle report.

The tricky bit is mapping ets:fun2ms(fun(#rctx{dbname = <<"foo">>, type=#coordinator{mod='chttpd_db', func='handle_changes_req'}, ioq_calls=IC, js_filter=JF}=R) when IC > 1000 andalso JF > 100 -> R end) to something we can express in default.ini. The ets:fun2ms transform is a great tool that facilitates declaratively constructing these matchspecs, which allow us to efficiently query the ets tracking table while also re-using these same matchers directly against a given #rctx{} to filter requests to log; however, its use of parse transforms makes it difficult to iteratively and programmatically construct complex pattern match statements.

I would love to see a simple translation layer that basically lets us use Mango syntax for declaring these filters, as that would make it much easier to express them within the ini files, but it would also allow us to dynamically construct the logger matcher specs on the fly for a given HTTP request, eg you could POST a Mango spec, but with fields from #rctx{}, to then query the ets table. If we get to where we have the expressiveness of something like Mango query syntax to define the logger matchers, then we can replace most of these default matchers with better, more targeted matchers while also providing an HTTP query API that can dynamically generate these ets matchspecs for efficient querying and aggregating.

chewbranca avatar Jul 22 '25 22:07 chewbranca

Over in 38a53a059 I migrated CSRT into the couch_srt application, aka CSRT for short. For context, I did so with a series of sed commands and a bit of manual cleanup. I did that incrementally and made local git commits with the different steps, using the sed commands as commit messages where appropriate, which I then squashed down into a final commit before pushing out here, but here's the squashed sequence of steps to generate the above git sha:

Extract CSRT into dedicated couch_srt application
sed -I '' 's/csrt.hrl/couch_srt.hrl/g' src/*/{src,test/eunit}/*.erl

sed -I '' 's/csrt:/couch_srt:/g' src/*/{src,test/eunit}/*.erl

sed -I '' 's/csrt_\([a-z_]*\):/couch_srt_\1:/g' src/*/{src,test/eunit}/*.erl

Cleanup remaining 'csrt\(_[a-z_]*\)\?:' references

Hook in couch_srt app

sed -I '' 's/^-module(csrt/-module(couch_srt/g' src/*/{src,test/eunit}/*.erl

More cleanup

That commit creates a clean separation between couch_stats and couch_srt, and I even moved most of the couch_stats changes to within couch_srt.

One thing that I'd like feedback on in particular: I preserved the name "CSRT" as I think it's a good name for the system and I've grown accustomed to referring to it as such. The public Erlang APIs all use couch_srt, but internally I've kept the use of csrt in a number of places. I still think CSRT is a better and more concise name than couch_srt, so I've left that in the documentation and config settings as I think the conciseness of "csrt" helps in a number of places; for instance, I'm still using "csrt" and "csrt_logger.matchers_threshold" for the config settings as I think that's cleaner and more user friendly than couch_srt_logger.matchers_threshold.

I like the balance of "couch_srt" and "csrt" in that commit so I didn't go further on renaming things in couch_srt.hrl, but let me know what y'all think.

I kept that as an isolated commit to have a logical commit containing all the direct sed changes, and then I waited to run make erlfmt-format until afterwards so folks could reasonably follow the sed commit and not be surprised by the format updates in 86ba96be0.

Tomorrow I'll add some additional updates to the overview documentation to better illustrate when/where/how users would initially enable and utilize CSRT. Aside from the additional documentation updates, I think I've addressed all outstanding issues I'm aware of, any further feedback on this PR?

chewbranca avatar Jul 23 '25 02:07 chewbranca

Alright I've gone through and addressed most things:

  • I've reworked the docs with more details on how/where/when/why to utilize CSRT and also moved them into the Sphinx docs in 828667e3d, as well as introducing a new example or two
  • In eb92a4d5a I reworked csrt_query:group_by to use the more efficient ets:select as well as to utilize that to enforce the query_cardinality_limit() constraints which I also updated to be more clear in that commit. I also fixed a small bug in csrt_query:options/1
  • I added default Logger Matchers all_coordinators and all_rpc_workers in 00a02af01 so we'd have an easy and dedicated way to toggle on logging of all coordinator process lifecycle reports, and if so desired, the same with all_rpc_workers, but the other impact of this change is that we can now easily aggregate on only coordinators or workers when querying against the HTTP and query API
  • updated couch_log_formatter:format_meta to ignore null values so we don't end up with report entries like dbname="null", as well as updating the test case in d5561f22e
  • cleaned up various test cases based on PR feedback in 1380b09fc
  • various other cleanup bits

chewbranca avatar Jul 28 '25 15:07 chewbranca

Thanks for the additional feedback Ilya, I've fixed up all the typos in 6481338c6, much appreciated.

Fwiw I'm having a lot of trouble seeing the commit specific comments, I didn't see those in the PR at all, I happened to notice an email notification. For some reason, those conversations are not showing up for me in this top level PR, or the specific PR commit https://github.com/apache/couchdb/pull/5602/commits/828667e3d724c43763ce772180b748287b8c4b8c but I finally found your comments over in the raw commit page at https://github.com/apache/couchdb/commit/828667e3d724c43763ce772180b748287b8c4b8c .

I've sorted out all of those comments aside from the recommendation to switch to Erlang code blocks, as that made Sphinx unhappy; oddly it complained about the # in a #Ref<x.y.z> that was output in remsh... kind of odd not to recognize that for Erlang, although technically console output isn't the language proper, so perhaps you wouldn't expect it to handle that haha.

chewbranca avatar Jul 28 '25 19:07 chewbranca

Thanks for taking another look, @nickva!

That's odd the reports aren't showing for you, the config settings you listed look okay, and couch_srt_logger has a config:listener to reload the matchers on changes to any of the logger config groups https://github.com/apache/couchdb/blob/couch-stats-resource-tracker-v3-rebase-main/src/couch_srt/src/couch_srt_logger.erl#L588-L598 so that's odd you're not seeing the changes picked up.

The first four commands you did should bootstrap the system into generating reports, eg these settings:

([email protected])2> config:set("csrt", "enable", "true").
ok

([email protected])3> config:set("csrt", "enable_init_p", "true").
ok

([email protected])4> config:set("csrt", "enable_reporting", "true").
ok

([email protected])5> config:set("csrt_logger.matchers_enabled", "all_coordinators", "true").

ok

I just did a fresh clone and with the following diff I'm properly getting reports generated like:

[notice] 2025-09-04T19:43:09.288686Z [email protected] <0.11224.0> 381d115cad localhost:15984 127.0.0.1 adm GET /_all_dbs 200 ok 3
[report] 2025-09-04T19:43:09.288790Z [email protected] <0.11274.0> -------- [csrt-pid-usage-lifetime db_open=5 get_kv_node=2 nonce="381d115cad" pid_ref="<0.11224.0>:#Ref<0.3500684338.1004273667.225747>" rows_read=2 started_at="2025-09-04T19:43:09.292z" type="coordinator-{chttpd_misc:handle_all_dbs_req}:GET:/_all_dbs" updated_at="2025-09-04T19:43:09.295z" username="adm"]

(chewbranca)-(jobs:0)-(/tmp/couchdb)
(! 15428)-> git diff
diff --git a/rel/overlay/etc/default.ini b/rel/overlay/etc/default.ini
index b2ffa87b7..3cfe9d1aa 100644
--- a/rel/overlay/etc/default.ini
+++ b/rel/overlay/etc/default.ini
@@ -1153,9 +1153,9 @@ url = {{nouveau_url}}
 
 ; Couch Stats Resource Tracker (CSRT)
 [csrt]
-;enable = false
-;enable_init_p = false
-;enable_reporting = false
+enable = true
+enable_init_p = true
+enable_reporting = true
 ;enable_rpc_reporting = false
 
 ; Truncate reports to not include zero values for counter fields. This is a
@@ -1223,7 +1223,7 @@ url = {{nouveau_url}}
 ; CSRT default matchers - enablement configuration
 ; The default CSRT loggers can be individually enabled below
 [csrt_logger.matchers_enabled]
-;all_coordinators = false
+all_coordinators = true
 ;all_rpc_workers = false
 ;docs_read = false
 ;rows_read = false

chewbranca avatar Sep 04 '25 19:09 chewbranca

@nickva I also tried it out locally in my dev cluster but couldn't get anything to show up in the logs.

I did repeat the steps you did locally.

([email protected])2> config:set("csrt", "enable", "true").
ok

([email protected])3> config:set("csrt", "enable_init_p", "true").
ok

([email protected])4> config:set("csrt", "enable_reporting", "true").
ok

([email protected])5> config:set("csrt_logger.matchers_enabled", "all_coordinators", "true").

ok
([email protected])6> config:set("csrt_logger.matchers_enabled", "docs_read", "true").
ok

([email protected])7> config:set("csrt_logger.matchers_threshold", "docs_read", "50").
ok

([email protected])8> config:set("csrt_logger.matchers_enabled", "changes_processed", "true").
ok

([email protected])9> config:set("csrt_logger.matchers_threshold", "changes_processed", "23").

Except I didn't create new database. I queried _replicator db.

while true; do curl -u adm:pass http://127.0.0.1:15984/_replicator/_changes > /dev/null ; sleep 5 ; done

This is what I see in the logs

[notice] 2025-09-16T20:30:11.007168Z [email protected] <0.10531.0> ef994f45d5 127.0.0.1:15984 127.0.0.1 adm GET /_replicator/_changes 200 ok 1
[report] 2025-09-16T20:30:11.007228Z [email protected] <0.10580.0> -------- [csrt-pid-usage-lifetime db_open=5 dbname="_replicator" nonce="ef994f45d5" pid_ref="<0.10531.0>:#Ref<0.3096609386.3958636548.180199>" started_at="2025-09-16T20:30:11.006z" type="coordinator-{chttpd_db:handle_changes_req}:GET:/_replicator/_changes" updated_at="2025-09-16T20:30:11.006z" username="adm"]
[notice] 2025-09-16T20:30:16.056722Z [email protected] <0.10689.0> 9c9fcd161b 127.0.0.1:15984 127.0.0.1 adm GET /_replicator/_changes 200 ok 2
[report] 2025-09-16T20:30:16.056874Z [email protected] <0.10738.0> -------- [csrt-pid-usage-lifetime db_open=5 dbname="_replicator" nonce="9c9fcd161b" pid_ref="<0.10689.0>:#Ref<0.3096609386.3958636546.179197>" started_at="2025-09-16T20:30:16.054z" type="coordinator-{chttpd_db:handle_changes_req}:GET:/_replicator/_changes" updated_at="2025-09-16T20:30:16.056z" username="adm"]

iilyak avatar Sep 16 '25 20:09 iilyak