Add usage report to Mimir
Along the lines of https://github.com/grafana/loki/pull/5361. NB: this took a few fixes, namely:
- https://github.com/grafana/loki/pull/5364
- https://github.com/grafana/loki/pull/5369
- https://github.com/grafana/loki/pull/5406
Also see https://github.com/grafana/tempo-squad/issues/81
https://github.com/grafana/loki/blob/e15a03b5e5aa2828aeabfe24cfb3584ab88fcfda/cmd/loki/loki-local-config.yaml#L32-L43 gives a nice template for wording.
As a requirement for implementing this, I'd need to see as part of the PR:
- Documentation about exactly what pieces of information would be collected and an example payload (JSON or similar).
- How users can disable this beyond adding a CLI flag to the documentation of all CLI flags.
- How the information collected is determined and what the process for changing it is.
- Do the Mimir maintainers vote on this? Do they have any say?
- Is this controlled by Grafana? If so, who is responsible for approving it?
- Can anyone decide to increase the information collected, or does it require approval from e.g. VP level, C-level, etc.?
As per governance, rough consensus within Mimir team applies by default. Additionally, any Mimir team member can call a vote about any topic regarding the project at any time.
As a non-team member, I believe in following the principle of least surprise. As such, I would argue that data sent, syntax to disable sending, commented out section in default configuration, and documentation should mirror Tempo & Loki.
I'm going to work on this. Loki and Tempo already have it, and the Mimir team wants anonymous statistics too, to better drive decisions when building features and supporting OSS users.
Requisites
- We want to follow how Loki and Tempo work (to keep it consistent)
- We want it to work out of the box with no additional config
Seed file
The seed file is a JSON file named mimir_cluster_seed.json and stored at the root of the blocks storage bucket (or under the configured -blocks-storage.storage-prefix).
This file is used to store the unique cluster ID in a durable storage.
The content of the file is:
{
  # Random UUID uniquely identifying the Mimir cluster.
  UID: "xxx",
  # Timestamp of when the seed file was created.
  created_at: "2006-01-02T15:04:05.999999999Z",
  # Mimir version when the seed file was created.
  # IMPORTANT: Loki and Tempo named this field "version" but I think it's too generic and may cause misunderstanding.
  # Also I want to keep the door open to version this file, and the field name would be called "version".
  created_version: {
    version: "",
    revision: "",
    branch: "",
    buildUser: "",
    buildDate: "",
    goVersion: "",
  },
}
Report
The report is a JSON file periodically sent from each Mimir replica to a backend API. The report contains only anonymous statistics, used to better drive decisions when building features for the OSS community.
{
  # The cluster ID.
  "clusterID": "",
  # When the cluster was created.
  "createdAt": "",
  # When the report was created (value is aligned across all replicas of the same Mimir cluster).
  "interval": "",
  # How frequently the report is sent, in seconds.
  "intervalPeriod": 0.0,
  # The "target" used to run Mimir.
  "target": "",
  # The current Mimir version.
  "version": {},
  # The current OS and architecture.
  "os": "",
  "arch": "",
  # The Mimir edition. Supported values are: "oss", "enterprise".
  "edition": "",
  # Custom metrics tracked by Mimir. Can contain nested objects.
  "metrics": {},
}
Mimir components tracking usage stats
To get it working out of the box, in the initial implementation Mimir will support tracking of usage statistics only from components already using the blocks storage (so that it's already configured):
- Ingesters
- Queriers (and rulers when the querier component is running internally)
- Store-gateway
- Compactor
Action plan
Part of this action plan is outside Mimir's scope (e.g. GEM), but I prefer to keep it as transparent as possible, given that we have only good intentions for these anonymous reports (all in all, we want to better support the community).
Build support in Mimir
- [x] Create seed file when it doesn't exist, or wait for a stable seed file otherwise (PR)
- [x] Ensure it doesn't cause any issue with bucket scanning, bucket index creation or compactor
- [x] Document it as invalid tenant ID
- [x] Re-create seed file if corrupted
- [x] Vendor Mimir in GEM and fix changes to object store Middlewares
- [x] Periodically send report to backend API (PR)
- See nextReport() logic in Loki
- See
- [x] Vendor Mimir in GEM and set the edition
- [x] Track custom metrics (PR)
- [x] Type of backend storage used (Loki example)
- [x] Ingester replication factor
- [x] Number of in-memory series in the ingester
- [x] Number of samples received in the ingester
- [x] Number of queries executed
- [x] CHANGELOG (PR)
- [x] Documentation (PR)
- [x] Why we collect anonymous usage stats
- [x] Which information is collected
- [x] How to disable it
- [x] Fix reporter: if a report fails to send, we need to try to send the same exact report, because counters are reset each time we build a new one (PR)
- [x] Track out of order time window configured (PR)
- [x] Remove the experimental flag, enable it by default, update the CHANGELOG and doc accordingly (PR)
Will follow up separately: Come up with a documented strict policy on how additional data collection should be reviewed and approved/rejected (and shared with Loki and Tempo too).
Build support in GEM
- [x] Set the edition to enterprise
Build backend API support
- [x] Build support in the backend API to collect anonymous usage stats
Build dashboard to query back anonymous usage stats
- [x] Build "Mimir Usage Report" dashboard
One nit:
created_version: { version: "", revision: "", branch: "", buildUser: "", buildDate: "", goVersion: "", },
The information about which Mimir version created the file seems to be ephemeral, and I don't see why we would need it (debugging purposes in case it's wrong?)
The rest of the plan looks good to me! 👍
# Random UUID uniquely identifying the Mimir cluster.
UID: "xxx",
The comment says UUID, but the file says UID. UUID v4 values are generally better than UIDs.
I would argue that starting with a versioned, well, version would be better and that the other projects should also start versioning.
Nothing in the report explicitly tells me if it's Mimir or something else.
# The current Mimir version.
"version": {},
So maybe call this mimir_version and leave version free for versioning of the report itself?
Could a requirement of this feature please be documenting how the information collected will evolve over time, if at all? I ask because we're asking our OSS users to trust that we won't collect anything sensitive. My concern is that we inadvertently add some piece of information to the usage stats (because it would be useful to Grafana as a company) without a lot of scrutiny that causes privacy issues or similar. I know that Loki has documentation around how the feature works and we are planning to, but I'd like something that describes how the feature will work over time.
As an example we could document:
- We will only change the information collected in a major release (or minor release with a 2 version warning).
- Any new information collected will be mentioned in the release notes in a dedicated section.
- The documentation about how the feature works will always have the up-to-date list of information collected.
- OR we commit to never changing the information collected once this is in a release.
I definitely commit to writing the doc and being as clear as possible. We can't commit to an overly strict policy like "we'll never change it" or "we'll change on major releases only", but we'll definitely be very clear about what we collect and why.
Strong +1 on being aggressively transparent on what's being collected.
Enabled by default, so consider this work done.