Define `azd` telemetry component
We need to set up a component that allows us to log telemetry prior to #190. This can be as simple as using the appinsights-go package directly to log to AppInsights, or a more robust solution.
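For reference, the "as simple as" path looks roughly like the following; the instrumentation key, event name, and properties are placeholders, not decisions:

```go
// Minimal sketch of logging directly with appinsights-go.
package main

import (
	"time"

	"github.com/microsoft/ApplicationInsights-Go/appinsights"
)

func main() {
	// Placeholder instrumentation key.
	client := appinsights.NewTelemetryClient("<instrumentation-key>")

	// Record a single usage event with a couple of properties.
	event := appinsights.NewEventTelemetry("cmd.invoked")
	event.Properties["command"] = "azd up"
	event.Properties["exitCode"] = "0"
	client.Track(event)

	// Block (bounded) until the in-memory channel drains; this is exactly
	// the latency we want to keep off the critical path.
	select {
	case <-client.Channel().Close(10 * time.Second):
	case <-time.After(15 * time.Second):
	}
}
```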
Requirements in descending order of importance:
- Minimizing observable performance impact on the CLI from telemetry being emitted and uploaded
- Reliability of telemetry (how well telemetry handles early terminations, crashes, and telemetry service unavailability)
- Ease of use for developers instrumenting the code
- Latency of telemetry (ingestion delay)
Note that the most important telemetry we emit (azd command usage) happens near the end of the application process lifecycle, which further emphasizes the first two requirements (performance and reliability). Any delay here is directly perceivable by the user, which is tricky to deal with when uploading telemetry costs on the order of ~100ms (networking + service response time).
Proposal
- The application emits telemetry. Telemetry is batched and, upon CLI exit or the batch size being met, stored to disk.
- Upon CLI exit, a background process is invoked to upload telemetry if any was emitted (see the sketch after this list).
- The background process (there will only be one) uploads the telemetry stored on disk.
- Upon transient failures, stored telemetry is retried with delay, using disk storage as the queue.
- A cleanup thread runs separately to ensure that storage never becomes too backed up.
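A rough sketch of what the exit path could look like, assuming one JSON file per batch under a local telemetry directory; how the uploader gets invoked (separate binary vs. re-invoking azd) is intentionally left open here:

```go
// Sketch of persisting a batch on exit and spawning a background uploader.
package telemetry

import (
	"encoding/json"
	"os"
	"os/exec"
	"path/filepath"
)

// Item is a stand-in for whatever shape a queued telemetry record ends up having.
type Item struct {
	Name       string            `json:"name"`
	Properties map[string]string `json:"properties"`
}

// FlushAndSpawnUploader persists the in-memory batch to disk and, if anything
// was written, launches the background uploader to drain the queue.
func FlushAndSpawnUploader(queueDir string, batch []Item, uploaderCmd string, uploaderArgs ...string) error {
	if len(batch) == 0 {
		return nil
	}

	if err := os.MkdirAll(queueDir, 0o700); err != nil {
		return err
	}

	data, err := json.Marshal(batch)
	if err != nil {
		return err
	}

	// Write to a temp file and rename it into place so the uploader never
	// observes a partially written batch.
	tmp, err := os.CreateTemp(queueDir, "batch-*.tmp")
	if err != nil {
		return err
	}
	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	final := filepath.Join(queueDir, filepath.Base(tmp.Name())+".json")
	if err := os.Rename(tmp.Name(), final); err != nil {
		return err
	}

	// Start the uploader in the background; it outlives the CLI process and
	// retries / cleans up the on-disk queue on its own schedule.
	cmd := exec.Command(uploaderCmd, uploaderArgs...)
	return cmd.Start()
}
```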
Progress
- [x] Add conversion from `otel` spans into ApplicationInsights event envelopes
- [x] Add transmitter class for sending events to AppInsights
- [ ] Add local logs storage + background transmitter
- [ ] Create wrappers around OpenTelemetry for common telemetry usage
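As a reference for the first item, a minimal sketch of the `otel`-to-AppInsights bridge as a custom `SpanExporter`. It forwards spans as AppInsights events through the public appinsights-go client instead of hand-building envelopes, so the real conversion code will look different:

```go
// Sketch of an OpenTelemetry SpanExporter that forwards spans to AppInsights.
package telemetry

import (
	"context"
	"time"

	"github.com/microsoft/ApplicationInsights-Go/appinsights"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

type appInsightsExporter struct {
	client appinsights.TelemetryClient
}

func NewAppInsightsExporter(instrumentationKey string) sdktrace.SpanExporter {
	return &appInsightsExporter{
		client: appinsights.NewTelemetryClient(instrumentationKey),
	}
}

// ExportSpans converts each finished span into an AppInsights event and hands
// it to the telemetry channel.
func (e *appInsightsExporter) ExportSpans(ctx context.Context, spans []sdktrace.ReadOnlySpan) error {
	for _, span := range spans {
		event := appinsights.NewEventTelemetry(span.Name())
		event.Properties["duration"] = span.EndTime().Sub(span.StartTime()).String()
		for _, attr := range span.Attributes() {
			event.Properties[string(attr.Key)] = attr.Value.Emit()
		}
		e.client.Track(event)
	}
	return nil
}

// Shutdown flushes the in-memory channel, bounded by a timeout.
func (e *appInsightsExporter) Shutdown(ctx context.Context) error {
	select {
	case <-e.client.Channel().Close(10 * time.Second):
	case <-ctx.Done():
	case <-time.After(15 * time.Second):
	}
	return nil
}
```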
For comparison purposes, I've started looking at how `az cli` telemetry and `dotnet cli` telemetry work.
They share some similarities:
- Telemetry items are persisted to disk as they are emitted in the application, to be processed by a separate telemetry uploader.
- Telemetry uploader runs in the background
The slight difference: `az cli` creates a new application process on shutdown (`az-cli-telemetry`) that processes and uploads telemetry, whereas `dotnet cli` creates a thread on application startup that does the same. The difference means that for `az cli`, we can reliably assume that the telemetry associated with the current command is sent sometime after the current command invocation ends; whereas in `dotnet cli`, the telemetry is likely sent the next time a `dotnet` command is issued.
Based on all the points above, I think long-term `azd` should similarly have telemetry items persisted to disk and have an uploader that runs in the background.
As far as whether the uploader should be a process or a thread, I'm more in favor of `az cli`'s model of process forking, as telemetry is delivered closer to when the `azd` invocation ends. The only possible concern is the user perception of telemetry usage: the extra executable included with the install package, and the extra process running on the system.
As a short-term solution, we could also perform synchronous upload for long-running operations like `azd deploy` and `azd provision`, since the telemetry latency would be a rounding error on the overall command latency.
That being said, the cost of implementation for locally persisted telemetry + telemetry upload is small, and I am in favor of implementing it for long-term sustainability.
@ellismg @wbreza @jongio looking for any initial thoughts on this.
> As a short-term solution, we could also perform synchronous upload for long-running operations like `azd deploy` and `azd provision`, since the telemetry latency would be a rounding error on the overall command latency.
> That being said, the cost of implementation for locally persisted telemetry + telemetry upload is small, and I am in favor of implementing it for long-term sustainability.
Strongly agree.
> The only possible concern is the user perception of telemetry usage: the extra executable included with the install package, and the extra process running on the system.
I agree that launching a separate process is a good idea. I think it may be possible for us to not actually require two binaries (you could imagine that `azd --send-telemetry` is a thing that could work), and I would like us to do that if possible. We can play tricks if needed to make sure these extra commands don't impact the customer-facing UX.
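One way this could look, assuming we model it as a hidden cobra subcommand rather than a flag (an assumption, not a decision); `upload` stands in for the disk-queue uploader sketched earlier:

```go
// Sketch of wiring the background uploader into the single azd binary.
package cmd

import (
	"context"

	"github.com/spf13/cobra"
)

// newSendTelemetryCmd exposes the uploader as a hidden subcommand so a single
// azd binary can be re-invoked to drain the on-disk telemetry queue.
func newSendTelemetryCmd(upload func(ctx context.Context) error) *cobra.Command {
	return &cobra.Command{
		Use:    "send-telemetry",
		Hidden: true, // hidden so it never shows up in --help output
		RunE: func(cmd *cobra.Command, args []string) error {
			return upload(cmd.Context())
		},
	}
}
```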
I also think that if we want to build an `azd --update-install` like thing at some point, it will be way less complex if we stick to a single binary.
Once we have this "run this stuff in the background after the tool exits" component, we could look at moving the "update the latest CLI version used for our up-to-date check" logic into it. We took steps to minimize the impact of that logic on end-to-end latency, but I still feel it once in a while, and this would more or less solve the problem.
Draft of end-to-end changes: #483