Design doc for an OSO telemetry service
Describe the feature you'd like to request
Our only solution for user analytics right now is to index blockchain data. It'd be nice to be able to get high-level user stats for mobile, desktop, and web applications (e.g. rotki, Remix).
For reference, Remix uses Matomo https://medium.com/remix-ide/help-us-improve-remix-ide-66ef69e14931 https://github.com/ethereum/remix-project/pull/919/files
Describe the solution you'd like
I think we need to think this through a bit more, but the current proposal is to host an OSO telemetry service and distribute a client-side snippet that web applications can integrate.
Ideally:
- The telemetry collects only high-level usage information for now; it does not instrument every event.
- The telemetry follows best practices to keep users anonymous and collects no PII, but can still aggregate stats per user (we want unique users, not just total views).
- We provide ways for users to opt in or out. This needs to play well with an application's existing modals.
- We design this so that it can reasonably be decentralized in the future.
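To make the goals above concrete, here is a minimal sketch of what the client-side snippet could look like. Everything in it is hypothetical: `createTelemetryClient`, `TelemetryPayload`, the event names, and the injected `getConsent` / `getAnonId` / `send` callbacks are illustrative names, not an existing OSO API. It also assumes that an unknown consent state is treated as opted out, which is one possible policy rather than a decision made in this proposal.

```typescript
// Hypothetical sketch: none of these names are an existing OSO API.

type Consent = "opted_in" | "opted_out" | "unknown";

// A small, fixed set of high-level events, not every interaction.
type HighLevelEvent = "app_open" | "session_start";

interface TelemetryPayload {
  anonId: string; // random per-install ID, never derived from PII
  event: HighLevelEvent;
  day: string; // UTC date only (YYYY-MM-DD), coarse on purpose
}

interface ClientOptions {
  // The host app reports consent from its own modal or settings screen.
  getConsent: () => Consent;
  // The host app generates and persists this random ID itself.
  getAnonId: () => string;
  // Transport is injected so the sketch does not assume any endpoint.
  send: (payload: TelemetryPayload) => void;
}

function createTelemetryClient(options: ClientOptions) {
  return {
    // Returns true if the event was sent, false if it was dropped.
    track(event: HighLevelEvent, now: Date = new Date()): boolean {
      // Assumption: "unknown" is handled like "opted_out".
      if (options.getConsent() !== "opted_in") {
        return false;
      }
      options.send({
        anonId: options.getAnonId(),
        event,
        day: now.toISOString().slice(0, 10),
      });
      return true;
    },
  };
}

// Usage: the app wires in its own consent state and transport.
const sent: TelemetryPayload[] = [];
let consent: Consent = "unknown";
const client = createTelemetryClient({
  getConsent: () => consent,
  getAnonId: () => "anon-123",
  send: (payload) => {
    sent.push(payload);
  },
});

client.track("app_open"); // dropped: the user has not opted in yet
consent = "opted_in";
client.track("app_open", new Date("2024-01-02T10:00:00Z")); // sent
```

Reading consent through a callback lets the snippet defer to whatever modal the application already has, instead of rendering its own.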
Describe alternatives you've considered
We could try to build integrations for all of the major telemetry services (e.g. Google Analytics, Mixpanel, Amplitude, Matomo).
Pros:
- We get really rich data, the same data that the app builders' product/engineering teams see
- We'd probably just integrate with an existing data warehouse connector on the backend
Cons:
- Developers may not want to share all that data with us
- Telemetry configurations vary widely: which service is used, whether it is self-hosted, where the data is hosted, and whether the team uses some form of privacy-preserving or anonymizing analytics.
Categories of things that we care about
- Usage / analytics
- Performance (throughput, latency)
- Availability (i.e. uptime)
- NPS (i.e. impact attestation)
Types of telemetry
- Server-side telemetry (e.g. indexers, RPC nodes)
- Client-side telemetry (e.g. wallets, applications like rotki)
- Superchain telemetry (e.g. throughput, latency, success rates)
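One way to picture how the categories and telemetry types above could fit together is a tagged record per category, each labeled with its source. The sketch below is only an assumption about the shape of the data: `TelemetryRecord`, its field names, and `uniqueUsers` are hypothetical. NPS / impact attestation is left out because its record shape is not clear from this proposal.

```typescript
// Hypothetical record shapes; field names are illustrative only.

type Source = "server" | "client" | "superchain";

interface BaseRecord {
  project: string; // e.g. "rotki"
  source: Source;
  day: string; // UTC date, YYYY-MM-DD
}

interface UsageRecord extends BaseRecord {
  category: "usage";
  anonId: string; // anonymous ID, no PII
}

interface PerformanceRecord extends BaseRecord {
  category: "performance";
  latencyMs: number;
  requestsPerSecond: number;
}

interface AvailabilityRecord extends BaseRecord {
  category: "availability";
  up: boolean;
}

type TelemetryRecord = UsageRecord | PerformanceRecord | AvailabilityRecord;

// Count distinct anonymous IDs per project, ignoring non-usage records.
function uniqueUsers(records: TelemetryRecord[]): Record<string, number> {
  // Object.create(null) avoids collisions with inherited keys like "constructor".
  const seen: Record<string, true> = Object.create(null);
  const counts: Record<string, number> = Object.create(null);
  for (const record of records) {
    if (record.category !== "usage") continue;
    const key = record.project + "\u0000" + record.anonId;
    if (seen[key]) continue;
    seen[key] = true;
    counts[record.project] = (counts[record.project] ?? 0) + 1;
  }
  return counts;
}

// Usage: two views from "a" and one from "b" count as two unique users.
const records: TelemetryRecord[] = [
  { category: "usage", project: "rotki", source: "client", day: "2024-01-02", anonId: "a" },
  { category: "usage", project: "rotki", source: "client", day: "2024-01-02", anonId: "a" },
  { category: "usage", project: "rotki", source: "client", day: "2024-01-03", anonId: "b" },
  { category: "availability", project: "rotki", source: "server", day: "2024-01-02", up: true },
];
const counts = uniqueUsers(records);
```

Keeping `source` separate from `category` means the same performance or availability record shape can be reported by a server, a client, or the Superchain without a new type for each combination.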
What does the integration look like?
- Client libraries
- Server-side replication
Threat model:
- Direct manipulation of the data
- Hiring botnets to generate real-looking data
- Incentivizing real users to take metric-boosting actions (e.g. creating more wallets)