Add usage report to Mimir

Along the lines of https://github.com/grafana/loki/pull/5361. NB: this took a few follow-up fixes, namely:

https://github.com/grafana/loki/pull/5364
https://github.com/grafana/loki/pull/5369
https://github.com/grafana/loki/pull/5406

May 04 '22 10:05 RichiH

Also see https://github.com/grafana/tempo-squad/issues/81

https://github.com/grafana/loki/blob/e15a03b5e5aa2828aeabfe24cfb3584ab88fcfda/cmd/loki/loki-local-config.yaml#L32-L43 gives a nice template for wording.

May 04 '22 12:05 RichiH

As a requirement for implementing this, I'd need to see as part of the PR:

Documentation about exactly what pieces of information would be collected and an example payload (JSON or similar).
How users can disable this beyond adding a CLI flag to the documentation of all CLI flags.
How the information collected is determined and what the process for changing it is.
- Do the Mimir maintainers vote on this? Do they have any say?
- Is this controlled by Grafana? If so, who is responsible for approving it?
  - Can anyone decide to increase the information collected, or does it require approval from e.g. VP level, C-level, etc.?

Jun 29 '22 13:06 56quarters

As per governance, rough consensus within the Mimir team applies by default. Additionally, any Mimir team member can call a vote about any topic regarding the project at any time.

As a non-team member, I believe in following the principle of least surprise. As such, I would argue that data sent, syntax to disable sending, commented out section in default configuration, and documentation should mirror Tempo & Loki.

Jun 29 '22 15:06 RichiH

I'm going to work on this. Loki and Tempo already have it, and the Mimir team wants anonymous statistics too, to better drive decisions when building features and supporting OSS users.

Aug 02 '22 08:08 pracucci

Requisites

We want to follow how Loki and Tempo work (to keep it consistent)
We want it to work out of the box with no additional config

Seed file

The seed file is a JSON file named mimir_cluster_seed.json, stored at the root of the blocks storage bucket (or under the configured -blocks-storage.storage-prefix). This file is used to store the unique cluster ID in durable storage.

The content of the file is:

{
    # Random UUID uniquely identifying the Mimir cluster.
    UID: "xxx",

    # Timestamp of when the seed file was created.
    created_at: "2006-01-02T15:04:05.999999999Z",

    # Mimir version when the seed file was created.
    # IMPORTANT: Loki and Tempo named this field "version" but I think it's too generic and may cause misunderstanding.
    #            Also I want to keep the door open to version this file, and the field name would be called "version".
    created_version: {
        version: "",
        revision: "",
        branch: "",
        buildUser: "",
        buildDate: "",
        goVersion: "",
    },
}
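
For illustration, here is a minimal Python sketch of how a seed with the fields above could be created and validated. It is not Mimir's implementation (Mimir is written in Go). The helper names `new_seed`, `load_seed` and `load_or_create`, and the rule that an unparseable or incomplete file counts as corrupted, are assumptions made for this example. Note that the `#` comments in the listing above are annotations only: a real JSON file cannot contain them.

```python
import json
import uuid
from datetime import datetime, timezone

SEED_FILENAME = "mimir_cluster_seed.json"  # file name taken from the proposal above


def new_seed(build_info: dict) -> dict:
    """Build a fresh seed with a random cluster ID (hypothetical helper)."""
    return {
        "UID": str(uuid.uuid4()),
        "created_at": datetime.now(timezone.utc).isoformat(),
        "created_version": dict(build_info),
    }


def load_seed(raw: bytes):
    """Parse a stored seed; return None if it looks corrupted (assumed rule)."""
    try:
        seed = json.loads(raw)
    except ValueError:  # covers invalid JSON and undecodable bytes
        return None
    if not isinstance(seed, dict) or not seed.get("UID") or not seed.get("created_at"):
        return None
    return seed


def load_or_create(raw, build_info: dict) -> dict:
    """Reuse a valid stored seed, otherwise create a new one."""
    seed = load_seed(raw) if raw is not None else None
    return seed if seed is not None else new_seed(build_info)
```

In this sketch `raw` stands for the bytes read from the bucket (or `None` when the object does not exist); reading from and writing to the object store, and waiting for a stable seed across replicas, are deliberately left out.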

Report

The report is a JSON file periodically sent from each Mimir replica to a backend API. The report contains only anonymous statistics, used to better drive decisions when building features for the OSS community.

{
    # The cluster ID.
    "clusterID": "",

    # When the cluster was created.
    "createdAt": "",

    # When the report was created (value is aligned across all replicas of the same Mimir cluster).
    "interval": "",

    # How frequently the report is sent, in seconds.
    "intervalPeriod": 0.0,

    # The "target" used to run Mimir.
    "target": "",

    # The current Mimir version.
    "version": {},

    # The current OS and architecture.
    "os": "",
    "arch": "",

    # The Mimir edition. Supported values are: "oss", "enterprise".
    "edition": "",

    # Custom metrics tracked by Mimir. Can contain nested objects.
    "metrics": {},
}
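
As a rough sketch of the payload described above, the following Python function assembles a report dictionary with the same field names. The function name `build_report`, its parameters, and the way the OS and architecture are detected are assumptions for illustration only; they do not describe how Mimir actually builds the report.

```python
import json
import platform
from datetime import datetime, timezone

SUPPORTED_EDITIONS = ("oss", "enterprise")  # values listed in the proposal above


def build_report(seed, interval, interval_period_s, target, version, edition, metrics):
    """Assemble a report payload from a seed and runtime info (hypothetical helper)."""
    if edition not in SUPPORTED_EDITIONS:
        raise ValueError(f"unsupported edition: {edition!r}")
    return {
        "clusterID": seed["UID"],
        "createdAt": seed["created_at"],
        # Timestamp of the reporting interval, expected to be the same on every replica.
        "interval": interval.isoformat(),
        "intervalPeriod": float(interval_period_s),
        "target": target,
        "version": version,
        # Assumption: Python's platform module stands in for the real OS/arch detection.
        "os": platform.system().lower(),
        "arch": platform.machine(),
        "edition": edition,
        "metrics": metrics,
    }


def encode_report(report: dict) -> bytes:
    """Serialize the report as JSON bytes, ready to be sent to a backend."""
    return json.dumps(report).encode("utf-8")
```

Keeping the payload as a plain dictionary makes it easy to print an example report for documentation, which is one of the requirements raised earlier in this thread.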

Mimir components tracking usage stats

To get it working out of the box, in the initial implementation Mimir will support tracking of usage statistics only from components already using the blocks storage (so that it's already configured):

Ingesters
Queriers (and rulers when the querier component is running internally)
Store-gateway
Compactor

Action plan

Part of this action plan is outside the scope of Mimir (e.g. GEM), but I prefer to keep it as transparent as possible, given that our only intention in using these anonymous reports is to better support the community.

Build support in Mimir

[x] Create seed file when it doesn't exist, or wait for a stable seed file otherwise (PR)
- [x] Ensure it doesn't cause any issue with bucket scanning, bucket index creation or compactor
- [x] Document it as invalid tenant ID
- [x] Re-create seed file if corrupted
[x] Vendor Mimir in GEM and fix changes to object store Middlewares
[x] Periodically send report to backend API (PR)
- See nextReport() logic in Loki
[x] Vendor Mimir in GEM and set the edition
[x] Track custom metrics (PR)
- [x] Type of backend storage used (Loki example)
- [x] Ingester replication factor
- [x] Number of in-memory series in the ingester
- [x] Number of samples received in the ingester
- [x] Number of queries executed
[x] CHANGELOG (PR)
[x] Documentation (PR)
- [x] Why we collect anonymous usage stats
- [x] Which information is collected
- [x] How to disable it
[x] Fix reporter: if a report fails to send, we need to retry sending the exact same report, because counters are reset each time we build a new one (PR)
[x] Track the configured out-of-order time window (PR)
[x] Remove the experimental flag, enable it by default, update the CHANGELOG and doc accordingly (PR)
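
Two items in the checklist above lend themselves to a small sketch: aligning report times across replicas, and resending the exact same report after a failure. The Python below is one possible way to do both, written under assumptions: `next_report_time` aligns to multiples of the period counted from the seed's creation time, and the `Reporter` class with its `send` and `collect_metrics` callables is hypothetical. It is not a copy of the `nextReport()` logic in Loki or of Mimir's reporter.

```python
from datetime import datetime, timedelta, timezone


def next_report_time(created_at: datetime, now: datetime, period: timedelta) -> datetime:
    """Return the first instant created_at + n * period (n >= 0) that is >= now.

    Replicas sharing the same created_at (from the seed file) compute the
    same instants, so their reports line up without any coordination.
    """
    # Ceiling division on timedeltas, done with integer floor division.
    n = max(0, -((created_at - now) // period))
    return created_at + n * period


class Reporter:
    """Keeps an unsent report so that a retry sends the exact same payload."""

    def __init__(self, send, collect_metrics):
        self._send = send                # callable(report) -> bool, True on success
        self._collect = collect_metrics  # callable() -> dict, assumed to reset counters
        self._pending = None

    def report(self, interval: datetime) -> bool:
        # Only build a new report when the previous one was delivered;
        # otherwise the counters reset by collect_metrics() would be lost.
        if self._pending is None:
            self._pending = {"interval": interval.isoformat(), "metrics": self._collect()}
        if self._send(self._pending):
            self._pending = None
            return True
        return False
```

A caller would typically sleep until `next_report_time(...)`, call `report(...)`, and simply call it again later if it returned `False`.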

Will follow up separately: Come up with a documented strict policy on how additional data collection should be reviewed and approved/rejected (and shared with Loki and Tempo too).

Build support in GEM

[x] Set the edition to enterprise

Build backend API support

[x] Build support in the backend API to collect anonymous usage stats

Build dashboard to query back anonymous usage stats

[x] Build "Mimir Usage Report" dashboard

Aug 02 '22 10:08 pracucci

One nit:

    created_version: {
        version: "",
        revision: "",
        branch: "",
        buildUser: "",
        buildDate: "",
        goVersion: "",
    },

The information about which Mimir version created the file seems to be ephemeral, and I don't see why we would need it (debugging purposes in case it's wrong?)

The rest of the plan looks good to me! 👍

Aug 02 '22 10:08 colega

    # Random UUID uniquely identifying the Mimir cluster.
    UID: "xxx",

The comment says UUID, but the field is named UID. UUIDv4s are generally better than UIDs.

I would argue that starting with a versioned, well, version would be better and that the other projects should also start versioning.

Nothing in the report explicitly tells me if it's Mimir or something else.

    # The current Mimir version.
    "version": {},

So maybe call this mimir_version and leave version free for versioning of the report itself?

Aug 02 '22 11:08 RichiH

Could a requirement of this feature please be documenting how the information collected will evolve over time, if at all? I ask because we're asking our OSS users to trust that we won't collect anything sensitive. My concern is that we inadvertently add some piece of information to the usage stats (because it would be useful to Grafana as a company) without a lot of scrutiny that causes privacy issues or similar. I know that Loki has documentation around how the feature works and we are planning to, but I'd like something that describes how the feature will work over time.

As an example we could document:

We will only change the information collected in a major release (or minor release with a 2 version warning).
Any new information collected will be mentioned in the release notes in a dedicated section.
The documentation about how the feature works will always have the up-to-date list of information collected.
OR we commit to never changing the information collected once this is in a release.

Aug 08 '22 16:08 56quarters

I definitely commit to writing the docs and being as clear as possible. We can't commit to an overly strict policy like "we'll never change it" or "we'll change it in major releases only", but we'll definitely be very clear about what we collect and why.

Aug 08 '22 16:08 pracucci

Strong +1 on being aggressively transparent on what's being collected.

Aug 09 '22 10:08 RichiH

Enabled by default, so consider this work done.

Sep 15 '22 15:09 pracucci
