Define `azd` telemetry component
We need to set up a component that allows us to log telemetry prior to #190. This can be as simple as using the appinsights-go package directly to log to AppInsights, or a more robust solution.
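For reference, the "as simple as" path looks roughly like the following; the instrumentation key, event name, and properties are placeholders, not decisions:

```go
// Minimal sketch of logging directly with appinsights-go.
package main

import (
	"time"

	"github.com/microsoft/ApplicationInsights-Go/appinsights"
)

func main() {
	// Placeholder instrumentation key.
	client := appinsights.NewTelemetryClient("<instrumentation-key>")

	// Record a single usage event with a couple of properties.
	event := appinsights.NewEventTelemetry("cmd.invoked")
	event.Properties["command"] = "azd up"
	event.Properties["exitCode"] = "0"
	client.Track(event)

	// Block (bounded) until the in-memory channel drains; this is exactly
	// the latency we want to keep off the critical path.
	select {
	case <-client.Channel().Close(10 * time.Second):
	case <-time.After(15 * time.Second):
	}
}
```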
Requirements in descending order of importance:
- Minimizing observable performance impact on the CLI from telemetry being emitted and uploaded
- Reliability of telemetry (how well telemetry handles early terminations, crashes, and telemetry service unavailability)
- Ease of use for developers instrumenting the code
- Latency of telemetry (ingestion delay)
Note that the most important telemetry we emit (azd command usage) happens near the end of the application process lifecycle, which further emphasizes the first two requirements (performance and reliability). Any delay here is directly perceivable by the user, which is tricky to deal with when uploading telemetry costs on the order of ~100ms (networking + service response time).
Proposal
- The application emits telemetry. Telemetry is batched and, upon CLI exit or the batch size being met, stored to disk.
- Upon CLI exit, a background process is invoked to upload telemetry if any was emitted (see the sketch after this list).
- The background process (there will only be one) uploads the telemetry stored on disk.
- Upon transient failures, stored telemetry is retried with delay, using disk storage as the queue.
- A cleanup thread runs separately to ensure that storage never becomes too backed up.
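A rough sketch of what the exit path could look like, assuming one JSON file per batch under a local telemetry directory; how the uploader gets invoked (separate binary vs. re-invoking azd) is intentionally left open here:

```go
// Sketch of persisting a batch on exit and spawning a background uploader.
package telemetry

import (
	"encoding/json"
	"os"
	"os/exec"
	"path/filepath"
)

// Item is a stand-in for whatever shape a queued telemetry record ends up having.
type Item struct {
	Name       string            `json:"name"`
	Properties map[string]string `json:"properties"`
}

// FlushAndSpawnUploader persists the in-memory batch to disk and, if anything
// was written, launches the background uploader to drain the queue.
func FlushAndSpawnUploader(queueDir string, batch []Item, uploaderCmd string, uploaderArgs ...string) error {
	if len(batch) == 0 {
		return nil
	}

	if err := os.MkdirAll(queueDir, 0o700); err != nil {
		return err
	}

	data, err := json.Marshal(batch)
	if err != nil {
		return err
	}

	// Write to a temp file and rename it into place so the uploader never
	// observes a partially written batch.
	tmp, err := os.CreateTemp(queueDir, "batch-*.tmp")
	if err != nil {
		return err
	}
	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	final := filepath.Join(queueDir, filepath.Base(tmp.Name())+".json")
	if err := os.Rename(tmp.Name(), final); err != nil {
		return err
	}

	// Start the uploader in the background; it outlives the CLI process and
	// retries / cleans up the on-disk queue on its own schedule.
	cmd := exec.Command(uploaderCmd, uploaderArgs...)
	return cmd.Start()
}
```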
Progress
- [x] Add conversion from `otel` spans into ApplicationInsights event envelopes
- [x] Add transmitter class for sending events to AppInsights
- [ ] Add local logs storage + background transmitter
- [ ] Create wrappers around OpenTelemetry for common telemetry usage
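As a reference for the first item, a minimal sketch of the `otel`-to-AppInsights bridge as a custom `SpanExporter`. It forwards spans as AppInsights events through the public appinsights-go client instead of hand-building envelopes, so the real conversion code will look different:

```go
// Sketch of an OpenTelemetry SpanExporter that forwards spans to AppInsights.
package telemetry

import (
	"context"
	"time"

	"github.com/microsoft/ApplicationInsights-Go/appinsights"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

type appInsightsExporter struct {
	client appinsights.TelemetryClient
}

func NewAppInsightsExporter(instrumentationKey string) sdktrace.SpanExporter {
	return &appInsightsExporter{
		client: appinsights.NewTelemetryClient(instrumentationKey),
	}
}

// ExportSpans converts each finished span into an AppInsights event and hands
// it to the telemetry channel.
func (e *appInsightsExporter) ExportSpans(ctx context.Context, spans []sdktrace.ReadOnlySpan) error {
	for _, span := range spans {
		event := appinsights.NewEventTelemetry(span.Name())
		event.Properties["duration"] = span.EndTime().Sub(span.StartTime()).String()
		for _, attr := range span.Attributes() {
			event.Properties[string(attr.Key)] = attr.Value.Emit()
		}
		e.client.Track(event)
	}
	return nil
}

// Shutdown flushes the in-memory channel, bounded by a timeout.
func (e *appInsightsExporter) Shutdown(ctx context.Context) error {
	select {
	case <-e.client.Channel().Close(10 * time.Second):
	case <-ctx.Done():
	case <-time.After(15 * time.Second):
	}
	return nil
}
```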
For comparison purposes, I've started looking at how `az cli` telemetry and `dotnet cli` telemetry work.
They share some similarities:
- Telemetry items are persisted to disk as they are emitted in the application, to be processed by a separate telemetry uploader.
- Telemetry uploader runs in the background
The slight difference: `az cli` creates a new application process on shutdown (`az-cli-telemetry`) that processes and uploads telemetry, whereas `dotnet cli` creates a thread on application startup that does the same. The difference means that for `az cli`, we can reliably assume that the telemetry associated with the current command is sent sometime after the current command invocation ends; whereas in `dotnet cli`, the telemetry is likely sent the next time a `dotnet` command is issued.
Based on all the points above, I think long-term `azd` should similarly have telemetry items persisted to disk and have an uploader that runs in the background.
As far as whether the uploader should be a process or a thread, I'm more in favor of `az cli`'s model of process forking, as telemetry is delivered closer to when the `azd` invocation ends. The only possible concern is the user perception of telemetry usage: the extra executable included with the install package, and the extra process running on the system.
As a short-term solution, we could also perform synchronous upload for long-running operations like `azd deploy` and `azd provision`, since the telemetry latency would be a rounding error on the overall command latency.
That being said, the cost of implementation for locally persisted telemetry + telemetry upload is small, and I am in favor of implementing it for long-term sustainability.
@ellismg @wbreza @jongio looking for any initial thoughts on this.
> As a short-term solution, we could also perform synchronous upload for long-running operations like `azd deploy` and `azd provision`, since the telemetry latency would be a rounding error on the overall command latency.
> That being said, the cost of implementation for locally persisted telemetry + telemetry upload is small, and I am in favor of implementing it for long-term sustainability.
Strongly agree.
> The only possible concern is the user perception of telemetry usage: the extra executable included with the install package, and the extra process running on the system.
I agree that launching a separate process is a good idea. I think it may be possible for us to not actually require two binaries (you could imagine that `azd --send-telemetry` is a thing that could work), and I would like us to do that if possible. We can play tricks if needed to make sure these extra commands don't impact the customer-facing UX.
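One way this could look, assuming we model it as a hidden cobra subcommand rather than a flag (an assumption, not a decision); `upload` stands in for the disk-queue uploader sketched earlier:

```go
// Sketch of wiring the background uploader into the single azd binary.
package cmd

import (
	"context"

	"github.com/spf13/cobra"
)

// newSendTelemetryCmd exposes the uploader as a hidden subcommand so a single
// azd binary can be re-invoked to drain the on-disk telemetry queue.
func newSendTelemetryCmd(upload func(ctx context.Context) error) *cobra.Command {
	return &cobra.Command{
		Use:    "send-telemetry",
		Hidden: true, // hidden so it never shows up in --help output
		RunE: func(cmd *cobra.Command, args []string) error {
			return upload(cmd.Context())
		},
	}
}
```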
I also think that if we want to build an `azd --update-install` like thing at some point, it will be way less complex if we stick to a single binary.
Once we have this "run this stuff in the background after the tool exits" component, we could look at moving the "update the latest CLI version used for our up-to-date check" logic into it. We took steps to minimize the impact of that logic on end-to-end latency, but I still feel it once in a while, and this would more or less solve the problem.
Draft of end-to-end changes: #483