tracetest Enable new Trigger starting from the Frontend

Currently, Tracetest only supports direct server-to-server triggers, either using HTTP or gRPC, or fetching a trace by its ID. But as the Otel ecosystem keeps growing, so has the use of E2E traces that start in the browser.

The idea of this ticket is to be a hub for research and implementation on how to integrate it with Tracetest.

May 15 '23 15:05 xoscar

Two things we have to think about here:

How are we gonna propagate the traceID to the frontend request? (query param or headers?)
We're probably gonna need to document how users can create a new span using the traceID we propagated (maybe a small JS library for that?)

May 15 '23 17:05 mathnogueira

@mathnogueira We still need to do some research, but my initial idea is to inject the context propagation headers into the frontend request. We were thinking of using Cypress. Its intercept API lets us add the headers.
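A minimal sketch of that header-injection idea, assuming the W3C Trace Context `traceparent` layout (version `00`, flags `01` for "sampled"). The helper name `withTraceparent` and the `HeaderMap` type are hypothetical; the Cypress wiring is shown only as a comment and relies on the request handler of `cy.intercept`, where `req.headers` can be modified.

```typescript
// Plain map of header names to values (hypothetical type for this sketch).
type HeaderMap = Record<string, string>;

// Hypothetical helper: returns a copy of `headers` with a `traceparent` entry added.
// Assumes version "00" and flags "01" (sampled).
function withTraceparent(headers: HeaderMap, traceId: string, spanId: string): HeaderMap {
  if (!/^[0-9a-f]{32}$/.test(traceId)) {
    throw new Error("traceId must be 32 lowercase hex characters");
  }
  if (!/^[0-9a-f]{16}$/.test(spanId)) {
    throw new Error("spanId must be 16 lowercase hex characters");
  }
  return { ...headers, traceparent: `00-${traceId}-${spanId}-01` };
}

// In a Cypress test this could be wired up roughly like this (not run here):
//   cy.intercept("**", (req) => {
//     Object.assign(req.headers, withTraceparent({}, traceId, spanId));
//   });
```

How Tracetest would hand the trace ID to the test runner is still the open question of this thread; the sketch only covers the last step, adding the header.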

May 22 '23 02:05 jorgeepc

A thing to note: the current implementation of @opentelemetry/instrumentation-document-load [1] only extracts the traceparent header from a <meta> element in the document <head>; it won't retrieve it from the request headers.

Example of such an element:

<meta name="traceparent" content="00-5f0aca3c2016aff36fee5b20bcc7cb71-19c57ba8141a9c8a-01" />

This might change in the future. Although less convenient than request headers (which can be set in a simple .proxyrc.js script for Parcel, for instance), I guess setting a <meta> element would ensure maximum «portability» of the traceparent in the short term.
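For illustration, the value carried by such a <meta> element can be split into its fields as sketched below. This assumes the version `00` layout of the W3C `traceparent` header (version, trace ID, parent ID, flags); `parseTraceparent` and `TraceParentFields` are hypothetical names, not part of any OpenTelemetry package.

```typescript
// Parsed fields of a `traceparent` value (hypothetical shape for this sketch).
interface TraceParentFields {
  version: string;
  traceId: string;
  parentId: string;
  sampled: boolean;
}

// Parses a value such as the `content` attribute of <meta name="traceparent">.
// Returns null when the value does not match the expected layout.
function parseTraceparent(value: string): TraceParentFields | null {
  const match = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(value.trim());
  if (match === null) return null;
  const [, version, traceId, parentId, flags] = match;
  // All-zero trace or parent IDs are not valid identifiers.
  if (/^0+$/.test(traceId) || /^0+$/.test(parentId)) return null;
  // The lowest bit of the flags byte is the "sampled" flag.
  return { version, traceId, parentId, sampled: (parseInt(flags, 16) & 1) === 1 };
}

// In a browser, the input could come from (not run here):
//   document.querySelector('meta[name="traceparent"]')?.getAttribute("content")
```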

[1] Part of @opentelemetry/auto-instrumentations-web, the popular meta package for the web.

Jun 22 '23 21:06 olange

Hi all,

Following our conversation «Initiating Tracetest from the browser», started on Discord a few weeks ago in the #tracetest-general channel, and the Deep Link feature (#2745) you recently developed, here's a summary of how we'd like to use Tracetest.

Context

We are (re-)developing a Progressive Web Application with Web Components (Lit based). The Web App displays models of future buildings in a 3D photo-realistic environment, laid on top of a 3D terrain, base cartographic and feature layers, coming from various GIS sources.

We're relying on open-source WASM & JS ESM libraries, such as Cesium.js, OpenLayers, SyncedStore, IFC.js and many more.

The solution has a few oddities:

Web Components connect to the DOM in a non-guaranteed order; the browser could instantiate and connect a child component to the DOM before its parent — although the browser tends to connect them in a «natural» order.
We have a lot of backing-services, such as blob storage for the massive 3D Tiles assets, or third-party WMS/WMTS geographic servers, that are not in our control; and almost no back-end – in other words, we have only a front-end to observe and send traces from!
In consequence, we aim to statically serve the Web App, using Edge Hosting infrastructure such as the ones offered by Vercel, Netlify and Firebase.
The lifecycle of such a Web App is complex: components collaborate with each other, asynchronously, by dispatching events; they become gradually available, as they have to fetch resources from network; and sometimes they initialize dynamically and lazily, when they depend upon large libraries (WASM) or have to fetch bigger resources.
Finally, at runtime, there is a variety of end-user hardware × software × form-factor configurations – people use the system on VDI infrastructure, tablets, desktops × Safari/Firefox/Chrome in various versions; with a lot of memory and CPU or little; with one or two desktop screens.

Therefore, we are instrumenting our web app manually, to build meaningful OpenTelemetry traces, which inform us about the end-user environment, and let us observe what is going on – and also, what is not going on. Automating these checks with Tracetest is a (dev) game changer for such a web app, which is much more likely to fail in production than in a «test lab» environment.

Environment

Firstly: we want to enable a trace-based testing-development workflow (or observability-driven-development workflow) in our local dev environment. Later on: we'd like to run the test suites on traces sent by apps in production, for the various configurations of our users.

Collecting traces in production is more of a stretch, as the platforms which statically serve the app do not run an OpenTelemetry Collector in their Edge servers. Vercel for instance only provides an Otel Collector for the environment of their Server Functions.

Spans

Sources of our spans:

Auto-instrumented fetch and document load;
Manually, emitted from the classes implementing our Web Components.

Our Spans minimally contain:

component attribute: name of the web component emitting a span
sequence of events: describe the stages of the component's lifecycle reached during the span; optionally contain computation results
process.{description,name,version} attributes: describe the environment of the end-user (process.runtime.version contains User-Agent details)
service.{version,namespace,instance.id} attributes: describe the app version sending the traces
span.kind: internal

Optionally contain:

code.function attribute: constructor or ‹name of method› governing the span, when the function is responsible for the whole lifecycle of the span
code.namespace attribute: when applicable, ‹name of class› owning the code.function
code.args.‹argname› attributes: arguments to the code.function, when applicable
result event: value or compound value (array, set or map) that was returned – makes for chatty spans, but also makes it possible to unit test a critical code path.
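As an illustration of the attribute layout listed above, here is a small sketch that flattens a component description into those keys. The attribute keys come from the lists; the `ComponentSpanInfo` shape, the `toSpanAttributes` name and all sample values are hypothetical. Events and span.kind are properties of the span itself rather than attributes, so they are left out.

```typescript
// Hypothetical input describing one component span; field names are illustrative.
interface ComponentSpanInfo {
  component: string;                 // name of the web component emitting the span
  appVersion: string;                // stored as service.version
  userAgent: string;                 // stored as process.runtime.version
  codeFunction?: string;             // "constructor" or a method name
  codeNamespace?: string;            // class owning codeFunction
  codeArgs?: Record<string, string>; // arguments, keyed by argument name
}

// Flattens the description into a flat attribute map.
function toSpanAttributes(info: ComponentSpanInfo): Record<string, string> {
  const attrs: Record<string, string> = {
    "component": info.component,
    "service.version": info.appVersion,
    "process.runtime.version": info.userAgent,
  };
  if (info.codeFunction !== undefined) attrs["code.function"] = info.codeFunction;
  if (info.codeNamespace !== undefined) attrs["code.namespace"] = info.codeNamespace;
  for (const [argName, value] of Object.entries(info.codeArgs ?? {})) {
    attrs[`code.args.${argName}`] = value;
  }
  return attrs;
}

// Sample values below are made up for the example.
const sampleAttributes = toSpanAttributes({
  component: "map-viewer",
  appVersion: "0.1.0",
  userAgent: "ExampleBrowser/1.0",
  codeFunction: "connectedCallback",
  codeNamespace: "MapViewer",
  codeArgs: { zoom: "12" },
});
```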

Use cases

Prerequisites

As we do not have any back-end infrastructure, we need to generate a traceparent header in the front-end – before the auto-instrumentation is activated.
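A minimal sketch of such a front-end generator, assuming the W3C `traceparent` layout (16-byte trace ID, 8-byte span ID, version `00`, flags `01`). The function names are hypothetical, and `Math.random` is used only to keep the sketch dependency-free; it is not cryptographically strong.

```typescript
// Returns `byteCount` random bytes as lowercase hex.
// Math.random is an assumption made for brevity, not a recommendation.
function randomHex(byteCount: number): string {
  let hex = "";
  for (let i = 0; i < byteCount; i++) {
    hex += Math.floor(Math.random() * 256).toString(16).padStart(2, "0");
  }
  return hex;
}

// Builds a fresh `traceparent` value: version "00", sampled flag "01".
function generateTraceparent(): string {
  const traceId = randomHex(16); // 16 bytes -> 32 hex characters
  const spanId = randomHex(8);   // 8 bytes -> 16 hex characters
  return `00-${traceId}-${spanId}-01`;
}
```

The generated value could then be written into a <meta name="traceparent"> element before the auto-instrumentation is registered, since that is where the document-load instrumentation looks for it.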

Enabling tracing and seeing the spans

We decided tracing would be disabled by default, until explicitly activated.
Once activated, provide a link to a UI that allows inspecting the collected spans.

Testing the bootstrap sequence

TODO

(23.06.2023 / work-in-progress)

Jun 23 '23 13:06 olange

The difficult parts of tracing from the front-end

… having no back-end and being an OpenTelemetry n00b ⟵ I thought I would learn by doing, but I did not know that setting up tracing with OpenTelemetry was difficult.

Path to happiness:

minimal setup of tracing with web-auto-instrumentation ⟵ this was quick, I was confident all of this was easy
seeing the spans being collected – I thought at first that the Otel Collector would collect and store the traces in Postgres, and that Tracetest would let me inspect them, until I understood that I additionally needed a datastore for the traces, such as Jaeger, which would also give me a view of its contents ⟵ thankfully @jorgeepc helped me, by pointing me at the simple stack that fits front-end tracing (Otel Collector, Jaeger, Tracetest, Postgres)
understand how to inject a hard-coded traceparent header — I first tried to inject header in the request, with a simple Parcel proxy and connect-like middleware, but it took me a while to figure out that the web-auto-instrumentation
generate a traceparent header in the front-end, so as to group all emitted spans under a single trace id and be able to write a test spec against it – initially I had one trace per span; then I hardcoded a trace id, but all spans entered the same trace, which grew inexorably; finally I had one trace per session, with refreshes of the web app creating a new traceparent
candidly write a first test suite, checking for timings and contents of all HTTP requests, and existence of spans! ⟵ took me a while to get there – again, thanks all at Tracetest for helping out – seeing it all working confirmed my expectations; I also confirmed that Tracetest provides very actionable feedback in the test results, and that the tooling is mature (CLI & UI are on par).

Hard parts:

Being able to write a test spec requires setting up the stack, instrumenting the app, and grouping all spans under the same trace id; until it's done, the fear of being in the wrong place at the wrong time threatens to interrupt the pursuit of the ODD-happiness goal.
Although I read the OpenTelemetry docs (there are a lot!) and source code (which contains many circularities), I still have not fully understood what a context is and how I can take advantage of it; namely, if and why ZoneContextManager would be appropriate for web apps based on web components – the context.with(context.active(), () => { do work here }) pattern is confusing in a PWA based on web components and does not apply to execution driven by observers & flows of events; I still wonder whether I should craft some span parents myself and manually handle the span hierarchy.
The JS web instrumentation docs were my favorite (Instrumentation › Javascript › Getting started › Browser); they gave me the most concise explanation and the most consistent set of examples.
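On the span-hierarchy question above: one alternative to relying on an active context is to hand the parent over explicitly whenever a child span is started. The sketch below is library-free and only models the idea; `SpanRecord` and `startSpanUnder` are hypothetical names, not the OpenTelemetry API. With the OpenTelemetry JS API, the analogous move would be to pass a context built with trace.setSpan(context.active(), parentSpan) as the third argument of tracer.startSpan (not shown or run here).

```typescript
// Minimal stand-in for a span record; not the OpenTelemetry API.
interface SpanRecord {
  name: string;
  spanId: number;
  parentSpanId: number | null;
}

let nextSpanId = 1;

// Starts a span whose parent is passed explicitly, so no "active context"
// has to survive across observers, events or other asynchronous boundaries.
function startSpanUnder(parent: SpanRecord | null, name: string): SpanRecord {
  return {
    name,
    spanId: nextSpanId++,
    parentSpanId: parent === null ? null : parent.spanId,
  };
}

// A component keeps a reference to the span it belongs under
// and passes it along when it emits its own span.
const bootstrapSpan = startSpanUnder(null, "app-bootstrap");
const viewerSpan = startSpanUnder(bootstrapSpan, "viewer-connected");
```

The trade-off is that each component needs some way to reach its parent span, for instance through a property or an event payload.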

(WIP)

Jun 23 '23 17:06 olange