
8325465: JFR: Context filtering

Open jbachorik opened this issue 10 months ago • 8 comments

Preliminary PoC work for JFR contextual events.

(Based on initial discussion in https://github.com/skogsluft/jdk-skogsluft/discussions/4)

Contextual Events

Contextual events, as the name suggests, provide additional context to other events. Examples include the tracing context in distributed tracing, a transaction context, or work unit identification.

A contextual event is thread specific - it will provide context only to events committed on the same thread, after the contextual event was started (begin() method) but before it has finished (end() method).

There may be multiple contextual events active for a thread. However, they must form a stack, e.g. if an event CtxA is opened before event CtxB, they must be closed in reverse order: first CtxB and only then CtxA. If the events cross each other, the behaviour is undefined.
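The stack discipline can be illustrated with plain jdk.jfr.Event subclasses and nested try/finally blocks. This is only a sketch: CtxA and CtxB are hypothetical event types named after the example above, and the proposed @Contextual annotation is left out so that the code compiles against an unmodified JDK.

```java
import java.util.ArrayList;
import java.util.List;
import jdk.jfr.Event;
import jdk.jfr.Name;

class NestedContexts {
    // Hypothetical event types; under the proposal they would also carry @Contextual.
    @Name("demo.CtxA")
    static class CtxA extends Event {}

    @Name("demo.CtxB")
    static class CtxB extends Event {}

    /** Opens CtxA, then CtxB, and closes them in reverse order. Returns the call order. */
    static List<String> run() {
        List<String> order = new ArrayList<>();
        CtxA a = new CtxA();
        a.begin();
        order.add("begin CtxA");
        try {
            CtxB b = new CtxB();
            b.begin();
            order.add("begin CtxB");
            try {
                // Events committed here, on this thread, fall inside both contexts.
            } finally {
                b.end();
                b.commit();
                order.add("end CtxB");
            }
        } finally {
            a.end();
            a.commit();
            order.add("end CtxA");
        }
        return order;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Nesting the try/finally blocks makes the reverse closing order structural, so the undefined "crossing" case cannot occur even when the guarded code throws.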

Design

Contextual annotation

A contextual event will be marked with the @Contextual annotation. This annotation will be a simple indication that the event type is meant to provide context to other events, so that tooling can handle it as such.

All custom fields of such an annotated event type will then constitute the context.
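As a rough illustration, a contextual event for distributed tracing might look like the sketch below. The Contextual annotation declared here is a local stand-in for the proposed one (its final package and shape are assumptions), and TraceContextEvent, traceId and spanId are hypothetical names chosen for the example.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import jdk.jfr.Event;
import jdk.jfr.Label;
import jdk.jfr.Name;

// Local stand-in for the proposed annotation (assumption, not the PR's actual declaration).
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface Contextual {}

// Hypothetical contextual event: its custom fields form the context.
@Name("demo.TraceContext")
@Label("Trace Context")
@Contextual
class TraceContextEvent extends Event {
    @Label("Trace Id")
    String traceId;

    @Label("Span Id")
    String spanId;
}

class TraceContextDemo {
    /** Starts a context carrying the given identifiers. */
    static TraceContextEvent open(String traceId, String spanId) {
        TraceContextEvent ctx = new TraceContextEvent();
        ctx.traceId = traceId;
        ctx.spanId = spanId;
        ctx.begin();
        return ctx;
    }

    /** Ends the context and hands it to JFR. */
    static void close(TraceContextEvent ctx) {
        ctx.end();
        ctx.commit();
    }
}
```

Here traceId and spanId would be the context that tooling attaches to any event committed on the same thread between open() and close().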

Context driven behaviour

Although the @Contextual annotation will allow tooling to associate the context with other JFR events, there are more ways contextual events can be used.

Conditionally emit events

The contextual events can be used to guard production of events which are too costly to emit unconditionally and for which duration thresholds would introduce too strong a bias. An example is the JavaMonitorWait event.

If left unchecked, the emission rate of JavaMonitorWait can overwhelm the recording. What's worse, the majority of the recorded events will provide very little additional information. Turning on the duration threshold improves the situation but introduces bias: JFR will not be able to point out that too much time is spent waiting on a lock if each individual wait is shorter than the threshold. In addition, this event type might be frequently emitted from thread pools where threads are just waiting for work.

If the emission is bound to the presence of a context (contextual event) that is open only while important work is being done (what counts as important work will usually be defined by the user), the recording can focus on the fine-grained details of the application's behaviour where they matter.

Record only activated context

We talk about an activated context (contextual event) when at least one other event is committed on the same thread between the calls to begin() and end() of the contextual event. We can also think of the context as being 'triggered' by the regular events.

The concept of an 'activated' context helps lower the overhead of recording the context, e.g. for distributed tracers with context propagation it is possible to generate millions of contextual events per minute with certain frameworks (async and reactive ones are notorious for this). This creates huge pressure both when the recording is written and when it is processed. Most of these events would be useless because there would be no events the context could be applied to.

Controlling the behaviour via settings

The proposal is to use the standard JFR event settings mechanism to affect the behaviour of both contextual and regular events.

There will be a new setting called select with the following permitted values:

  • if-context - the regular event will be emitted only if a context is present
  • if-triggered - the contextual event will be emitted only if the context is triggered
  • all - no context related restrictions are applied

The if-context option is valid only for non-contextual events. The if-triggered option is valid only for contextual events. The all option is valid for any event.

If an invalid option is provided, JFR will log a warning and the setting will be set to all.

The select setting is to be used in conjunction with other filtering mechanisms, like threshold.
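The validity rules and the fallback to all can be sketched in plain Java as below. SelectSetting, normalize and exampleSettings are hypothetical names, not part of the PR; only the "event#setting" key format and the threshold setting are existing JFR conventions, while select is the setting proposed here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class SelectSetting {
    /** Returns the effective value of 'select', falling back to "all" for invalid input. */
    static String normalize(boolean contextualEvent, String value) {
        boolean valid;
        if ("all".equals(value)) {
            valid = true;                 // valid for any event
        } else if ("if-context".equals(value)) {
            valid = !contextualEvent;     // regular events only
        } else if ("if-triggered".equals(value)) {
            valid = contextualEvent;      // contextual events only
        } else {
            valid = false;                // unknown value
        }
        if (!valid) {
            // Stand-in for the JFR warning log mentioned in the description.
            System.err.println("warning: invalid select value '" + value + "', using 'all'");
            return "all";
        }
        return value;
    }

    /** Settings map combining the proposed 'select' with the existing 'threshold'. */
    static Map<String, String> exampleSettings() {
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("jdk.JavaMonitorWait#enabled", "true");
        settings.put("jdk.JavaMonitorWait#threshold", "10 ms");
        settings.put("jdk.JavaMonitorWait#select", normalize(false, "if-context"));
        return settings;
    }
}
```

A map in this shape could be passed to Recording.setSettings(Map); on a JDK without this change the select entry would presumably have no effect.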

Implementation

@Contextual annotation

The annotation implementation is pretty straightforward and there is nothing special going on there.

Activated context

In order to support selective emission of contextual events only when they are activated, the event class must be instrumented and a synthetic field named ^ctxOffset must be inserted into it.

The field is used to track the number of events written while this context is open. The actual number does not matter, we just need to make sure we can tell there is at least one written event.

This information is then used in the shouldCommit() method of the contextual event type, which needs to be changed to consult the ^ctxOffset field and return false if that field is 0. This applies only if the event's settings contain select=if-triggered; otherwise, the behaviour of shouldCommit() is not affected.

The ^ctxOffset field is updated from EventWriter, which increments it on each new event commit.
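A plain-Java model of this mechanism, without any bytecode instrumentation, might look as follows. TriggeredContext and onEventWritten are hypothetical names; in particular, the per-thread stack used to find the open contexts is an assumption, since the description does not say how EventWriter locates them.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class TriggeredContext {
    // Contexts currently open on this thread (assumed bookkeeping, not taken from the PR).
    private static final ThreadLocal<Deque<TriggeredContext>> OPEN =
            ThreadLocal.withInitial(ArrayDeque::new);

    private final boolean ifTriggered; // true when select=if-triggered
    private int ctxOffset;             // stand-in for the synthetic ^ctxOffset field

    TriggeredContext(boolean ifTriggered) {
        this.ifTriggered = ifTriggered;
    }

    void begin() {
        OPEN.get().push(this);
    }

    void end() {
        OPEN.get().remove(this);
    }

    /** Stand-in for the EventWriter hook: called when a regular event is committed. */
    static void onEventWritten() {
        for (TriggeredContext ctx : OPEN.get()) {
            ctx.ctxOffset++; // the exact value is irrelevant, only zero vs non-zero matters
        }
    }

    /** Mirrors the modified shouldCommit(): suppress untriggered contexts if asked to. */
    boolean shouldCommit() {
        return !ifTriggered || ctxOffset != 0;
    }
}
```

A context that is opened and closed with no intervening onEventWritten() call reports shouldCommit() as false under if-triggered, which is the case that drops the millions of unused contexts described earlier.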

Filtered events

A new SelectorSetting will check for the presence of a context before committing an event that has this setting set to if-context. The context presence is determined by a per-thread 'context count': each time a contextual event calls begin() the count is incremented, and it is decremented on the call to end().

This is available for both built-in (native) and user-defined JFR events, as long as they are not periodic. Periodic events are not feasible to use with the selector because they are committed on a dedicated thread, which should not be using any conditional contexts.
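The per-thread context count can be modelled with a ThreadLocal counter, as in the sketch below. ContextCount and its methods are hypothetical names used for illustration; the actual SelectorSetting lives inside JFR and is not reproduced here.

```java
class ContextCount {
    // One counter per thread; a single-element array avoids boxing on every update.
    private static final ThreadLocal<int[]> COUNT =
            ThreadLocal.withInitial(() -> new int[1]);

    /** Called when a contextual event begins on the current thread. */
    static void contextBegin() {
        COUNT.get()[0]++;
    }

    /** Called when a contextual event ends on the current thread. */
    static void contextEnd() {
        COUNT.get()[0]--;
    }

    /** An event with select=if-context is emitted only while the count is positive. */
    static boolean shouldEmit(String select) {
        return !"if-context".equals(select) || COUNT.get()[0] > 0;
    }

    /** Evaluates shouldEmit on a fresh thread, which has no open context of its own. */
    static boolean shouldEmitOnNewThread(String select) {
        boolean[] result = new boolean[1];
        Thread t = new Thread(() -> result[0] = shouldEmit(select));
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return result[0];
    }
}
```

Because the counter is thread-local, a context opened on a worker thread never enables events on another thread, which is also why a dedicated periodic-event thread would never see a context.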


Progress

  • [ ] Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • [ ] Change must not contain extraneous whitespace
  • [x] Commit message must refer to an issue

Issue

  • JDK-8325465: JFR: Context filtering (Enhancement - P3)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/18689/head:pull/18689
$ git checkout pull/18689

Update a local copy of the PR:
$ git checkout pull/18689
$ git pull https://git.openjdk.org/jdk.git pull/18689/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 18689

View PR using the GUI difftool:
$ git pr show -t 18689

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/18689.diff

jbachorik, Apr 09 '24 11:04