feat(v2): recording rules of function names

Open alsoba13 opened this issue 6 months ago • 0 comments

This PR adds support for recording rules on function names.

There's one part I'd like to change (marked as TODO). Since it is mostly a matter of code style and does not compromise performance yet, I prefer to open this PR with the core logic presented more clearly.

Reading the following before reviewing the code may help:

Observer state

Data is organized and accessed sequentially this way:

Tenant	Dataset	Series	Rows/profiles
tenant1	service_name1	fingerprint1	row1
			row2
			row3
		fingerprint2	row4
	service_name2	fingerprint3	row5
		fingerprint4	row6
		fingerprint4	row7
tenant2 ...	service_name3 ...	fingerprint5 ...	row8 ...

The observer only sees single rows (the last column) and needs to deduce whether a context switch has occurred. There are 3 types of context (also called states or scopes):

Tenant scope:
- A tenant switch flushes the recorded metrics, then fetches and initializes the recording rules for the new tenant.
Dataset scope:
- Symbol information is scoped to each dataset. During the compaction process, symbols within a dataset are gathered from different blocks and rewritten to a new block. The observer needs to detect dataset changes so that symbol information can be reset.
- The tenant's rules are filtered down to those that target the specific dataset.
- The dataset state holds lookup tables that relate symbol information to the rules targeting those symbols.
Series scope:
- Every batch of rows with the same fingerprint may match some rules. On a series switch, every recording rule of the dataset is evaluated to determine which ones are relevant to the series.
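
The scope detection described above can be sketched as follows. This is a minimal illustration, not the code in this PR: the `Row`, `Switch`, and `scopeTracker` names are hypothetical, and the sketch assumes a change in an outer scope always implies a change in every inner scope.

```go
package main

import "fmt"

// Row is a hypothetical, simplified view of what the observer receives.
type Row struct {
	Tenant      string
	Dataset     string
	Fingerprint uint64
}

// Switch reports which scopes changed between two consecutive rows.
type Switch struct {
	Tenant, Dataset, Series bool
}

// scopeTracker remembers the scope of the last observed row.
type scopeTracker struct {
	started bool
	last    Row
}

// Observe compares the row with the previous one and reports which scopes
// changed. The very first row opens all three scopes.
func (t *scopeTracker) Observe(r Row) Switch {
	var s Switch
	switch {
	case !t.started || r.Tenant != t.last.Tenant:
		// Flush metrics, fetch and init rules for the new tenant.
		s = Switch{Tenant: true, Dataset: true, Series: true}
	case r.Dataset != t.last.Dataset:
		// Reset symbol information and narrow the rules to the dataset.
		s = Switch{Dataset: true, Series: true}
	case r.Fingerprint != t.last.Fingerprint:
		// Re-evaluate which rules are relevant to the series.
		s = Switch{Series: true}
	}
	t.started = true
	t.last = r
	return s
}

func main() {
	rows := []Row{
		{"tenant1", "service_name1", 1},
		{"tenant1", "service_name1", 1},
		{"tenant1", "service_name1", 2},
		{"tenant1", "service_name2", 3},
		{"tenant2", "service_name3", 5},
	}
	var t scopeTracker
	for _, r := range rows {
		fmt.Printf("%+v -> %+v\n", r, t.Observe(r))
	}
}
```

Since rows arrive sequentially and already grouped, comparing each row with the previous one is enough; no lookahead is needed.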

Observing symbols

We try to avoid processing symbols that don't matter to any rule. For that purpose, the row is observed before its symbols. Observe computes the state of the series, so we know whether the series matches some rule with a FunctionName.

The observer completely ignores symbols while the current series state has no FunctionName rules. In a later series, if a matching rule has a FunctionName, symbols are observed and symbol-to-rule pointers are computed. Since we only want to process each newly rewritten symbol once, we compute lookup tables for all rules that contain a FunctionName, not only the ones matching the current series. The reason is that a later series may match a different FunctionName rule, and by then all of its symbols may already have been observed; had we indexed only the current series' rules, those symbols would have been missed.
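
A rough sketch of the lookup-table idea described above, under simplified assumptions: `Rule`, `datasetState`, and `ObserveSymbol` are hypothetical names, symbols are reduced to a numeric ID plus a function name, and rule matching against the series is reduced to a boolean passed in by the caller.

```go
package main

import "fmt"

// Rule is a hypothetical, simplified recording rule.
type Rule struct {
	Name         string
	FunctionName string // empty when the rule does not target a function
}

// datasetState holds the lookup tables for one dataset.
type datasetState struct {
	byFunction map[string][]*Rule // function name -> rules targeting it
	bySymbol   map[uint32][]*Rule // rewritten symbol ID -> rules targeting it
}

// newDatasetState indexes every rule with a FunctionName, regardless of
// which series the rule will end up matching.
func newDatasetState(rules []*Rule) *datasetState {
	s := &datasetState{
		byFunction: make(map[string][]*Rule),
		bySymbol:   make(map[uint32][]*Rule),
	}
	for _, r := range rules {
		if r.FunctionName != "" {
			s.byFunction[r.FunctionName] = append(s.byFunction[r.FunctionName], r)
		}
	}
	return s
}

// ObserveSymbol records symbol-to-rule pointers for a rewritten symbol.
// Symbols are ignored while the current series has no FunctionName rules,
// and each symbol ID is processed at most once.
func (s *datasetState) ObserveSymbol(id uint32, fn string, seriesHasFunctionRules bool) {
	if !seriesHasFunctionRules {
		return
	}
	if _, seen := s.bySymbol[id]; seen {
		return
	}
	// Points at all rules for this function, not only the ones matching
	// the current series, so later series can reuse the entry.
	s.bySymbol[id] = s.byFunction[fn]
}

func main() {
	ds := newDatasetState([]*Rule{
		{Name: "cpu_foo", FunctionName: "foo"},
		{Name: "cpu_bar", FunctionName: "bar"},
		{Name: "cpu_total"},
	})

	// Series matching only cpu_total: symbols are skipped.
	ds.ObserveSymbol(1, "foo", false)
	fmt.Println("entries after first series:", len(ds.bySymbol))

	// Series matching cpu_foo: all observed symbols are indexed, including
	// the one for "bar", which only a later series may need.
	ds.ObserveSymbol(1, "foo", true)
	ds.ObserveSymbol(2, "bar", true)
	for _, r := range ds.bySymbol[2] {
		fmt.Println("symbol 2 ->", r.Name)
	}
}
```

Indexing by all FunctionName rules costs a little extra work up front, but it means a symbol never has to be revisited when a different rule becomes relevant in a later series.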

In summary, the requirements/properties we find are:

- We want to observe the row before observing its symbols.
- Symbols are only observed when needed.
- The observed row has old symbols, so we need to observe again to get the rewritten ones. For this reason, I included a Flush function that re-observes the symbols. I'd like to get rid of it or, if we keep this 2-step observer, rename it (maybe we can have an Observe(row, state) function instead).
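
To make the 2-step flow concrete, here is a hypothetical sketch of the ordering only. It does not reflect the actual types or signatures in this PR: `Row`, `observer`, and `targets` are invented for illustration, and the symbol rewrite is simulated by mutating the row in place between the two calls.

```go
package main

import "fmt"

// Row is a hypothetical row. FunctionIDs refer to the source block's symbols
// before the rewrite and to the new block's symbols after it.
type Row struct {
	Fingerprint uint64
	Value       int64
	FunctionIDs []uint32
}

// observer is a hypothetical 2-step observer.
type observer struct {
	pending     *Row
	wantSymbols bool
	targets     map[uint32]bool // rewritten function IDs targeted by some rule
	total       int64           // value recorded so far
}

// Observe is step 1. It runs before the symbols are rewritten, so it only
// decides whether symbols matter for this row and keeps a reference to it.
func (o *observer) Observe(r *Row, seriesHasFunctionRules bool) {
	o.pending = r
	o.wantSymbols = seriesHasFunctionRules
}

// Flush is step 2. It runs after the rewrite and reads the rewritten IDs.
func (o *observer) Flush() {
	r := o.pending
	o.pending = nil
	if r == nil || !o.wantSymbols {
		return
	}
	for _, id := range r.FunctionIDs {
		if o.targets[id] {
			o.total += r.Value
			return
		}
	}
}

func main() {
	o := &observer{targets: map[uint32]bool{1: true}}
	r := &Row{Fingerprint: 1, Value: 10, FunctionIDs: []uint32{7}}

	o.Observe(r, true)   // the row still carries the old ID 7
	r.FunctionIDs[0] = 1 // simulated rewrite to the new block's ID
	o.Flush()            // the rewritten ID is visible now

	fmt.Println("recorded value:", o.total)
}
```

A single Observe(row, state) call would need the rewritten symbols to be available at call time, which is why the sketch keeps the two steps separate.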

Jun 06 '25 13:06 alsoba13