Idea: Atlas UX Metrics Collection & Visualization
Background
An operator currently has relatively poor visibility into the quality of the UX in their Atlas instance. A few key metrics, specifically the load times of various screens and assets and the error rates, have a very significant impact on UX. There may be substantial systematic variation in which types of users experience the UX problems captured by these metrics, for example by type of device, browser, location, time of day, or which infrastructure providers are involved. Greater visibility into these issues allows for detecting problems, measuring the improvement that results from various remedies, and doing deeper technical or operational diagnostics.
Proposal
Introduce a solution which gives the operator excellent visibility into such key metrics, both in terms of efficient data collection at scale and in terms of visualization. This solution should largely rely on an industry-standard SaaS product for data collection and visualization; most of the work here should be identifying the appropriate product and then doing the integration work. Here are the requirements:
- Should require minimal, if any, changes to the Atlas code base.
- Should require no changes to Orion, Argus, Colossus or any other code base: note that no data is collected in these apps.
- Should cover the following metrics; for each, we want some basic additional request-specific parameters:
  - Initial load time of the app
  - Load time of each screen in the left-hand menu
  - Latency of playback on a video: (error type, ..)
  - Playback failure
  - Tx broadcast failure
  - Upload failure
- Should require very limited code change to add an additional metric.
- Should include rich metadata about client (browser, location, device, version of Atlas, user account, ...)
- Should have a visualization dashboard which shows the frequency of different events: it must be possible to filter by metric type, time period, and parameter values.
- The operator's customer account with the backend service should be configurable in the Atlas deployment, with no coding required.
- If the service has an alert feature, where notifications are sent when certain conditions are met, that is a nice-to-have, but it's not the key point.
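
To make the requirements above more concrete, here is a minimal sketch of what a thin reporting layer inside Atlas could look like. It is an illustration under assumptions, not a proposed implementation: TypeScript is assumed, and every name in it (`MetricParams`, `ClientMetadata`, `MetricsConfig`, `createReporter`, the parameter fields, the `send` callback) is hypothetical and does not refer to existing Atlas code or to any particular vendor SDK. The `send` callback stands in for whatever the chosen SaaS client library provides.

```typescript
// Hypothetical sketch: none of these names exist in Atlas or in a vendor SDK.

// Each metric declares its own request-specific parameters.
// Adding a new metric means adding one entry to this map.
type MetricParams = {
  appInitialLoad: { durationMs: number }
  screenLoad: { screen: string; durationMs: number }
  playbackLatency: { videoId: string; durationMs: number }
  playbackFailure: { videoId: string; errorType: string }
  txBroadcastFailure: { errorType: string }
  uploadFailure: { errorType: string }
}

type MetricName = keyof MetricParams

// Client metadata attached to every event.
type ClientMetadata = {
  browser: string
  device: string
  location: string
  atlasVersion: string
  userAccount?: string
}

type MetricEvent<N extends MetricName = MetricName> = {
  name: N
  params: MetricParams[N]
  client: ClientMetadata
  timestamp: number
}

// Deployment-level configuration. The account key is assumed to come from
// deployment configuration (for example an environment variable), not code.
type MetricsConfig = {
  accountKey: string | undefined
  send: (accountKey: string, event: MetricEvent) => void
}

function createReporter(config: MetricsConfig, client: ClientMetadata) {
  return function report<N extends MetricName>(
    name: N,
    params: MetricParams[N],
    timestamp: number = Date.now()
  ): boolean {
    // No account configured: metrics collection is simply switched off.
    if (!config.accountKey) return false
    config.send(config.accountKey, { name, params, client, timestamp })
    return true
  }
}

// Usage with an in-memory sink standing in for the real backend service.
const sent: MetricEvent[] = []
const report = createReporter(
  { accountKey: "demo-account", send: (_key, event) => { sent.push(event) } },
  { browser: "Firefox", device: "desktop", location: "NO", atlasVersion: "0.0.0" }
)
report("screenLoad", { screen: "Home", durationMs: 840 })
report("playbackFailure", { videoId: "123", errorType: "network" })
```

With this shape, the type map keeps each metric's parameters explicit, the client metadata is attached in one place, and swapping the operator's account is a configuration change rather than a code change. Filtering by metric type, time period, and parameter values would be left to the dashboard of the chosen service.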