Telemetry (need your opinion!)

Open muety opened this issue 2 years ago • 20 comments

I'd like to gather some anonymized technical statistics about self-hosted Wakapi instances to better understand how it is used. Of course, no personal data, no usernames, no IP address, etc. would be sent / stored and the feature would be ~opt-out~ opt-in.

Data I'd like to gather includes:

  • Wakapi version
  • Database system
  • Host system (OS, CPU cores, memory, Docker yes / no)
  • Statistics
    • Number of users
    • Number of heartbeats
    • Total coding time
  • Features
    • WakaTime relay enabled?
    • Weekly e-mail reports enabled?
    • Sentry tracking enabled?

Would you, as an operator, be concerned about sharing this data in anonymized form?
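To make the list above concrete, here is a minimal sketch of what such an anonymized report could look like as a Go struct (Wakapi is written in Go). All type names, field names and JSON keys are hypothetical and not taken from the Wakapi code base; memory and Docker detection are left out, so those fields keep their zero values.

```go
package main

import (
	"encoding/json"
	"fmt"
	"runtime"
)

// TelemetryReport is a hypothetical payload mirroring the list above.
// It carries no usernames, IP addresses or other personal data.
type TelemetryReport struct {
	Version  string   `json:"version"`
	Database string   `json:"database"`
	Host     HostInfo `json:"host"`
	Stats    Stats    `json:"stats"`
	Features Features `json:"features"`
}

// HostInfo describes the host system in coarse terms only.
type HostInfo struct {
	OS       string `json:"os"`
	CPUCores int    `json:"cpu_cores"`
	MemoryMB uint64 `json:"memory_mb"` // not detected in this sketch
	Docker   bool   `json:"docker"`    // not detected in this sketch
}

// Stats holds instance-wide aggregates, never per-user values.
type Stats struct {
	Users         int64 `json:"users"`
	Heartbeats    int64 `json:"heartbeats"`
	CodingTimeSec int64 `json:"total_coding_time_sec"`
}

// Features records which optional features are switched on.
type Features struct {
	WakaTimeRelay bool `json:"wakatime_relay"`
	EmailReports  bool `json:"email_reports"`
	Sentry        bool `json:"sentry"`
}

// BuildReport assembles a report from values the caller already has.
// OS and CPU core count come from the Go runtime.
func BuildReport(version, database string, stats Stats, features Features) TelemetryReport {
	return TelemetryReport{
		Version:  version,
		Database: database,
		Host:     HostInfo{OS: runtime.GOOS, CPUCores: runtime.NumCPU()},
		Stats:    stats,
		Features: features,
	}
}

func main() {
	report := BuildReport("2.0.0", "sqlite", Stats{Users: 3, Heartbeats: 120000}, Features{EmailReports: true})
	out, err := json.MarshalIndent(report, "", "  ")
	if err != nil {
		panic(err)
	}
	// Printing the exact payload is one way to let operators inspect what would be sent.
	fmt.Println(string(out))
}
```

Keeping the whole payload in one struct would also make it easy to show operators exactly what leaves their instance.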

muety avatar Jan 04 '22 20:01 muety

Normally, I do tend to switch off telemetry if given the option. I think the comments on this Reddit post summarise the concerns behind telemetry, and behind adding it, quite well, if you want to take a look.

YC avatar Jan 06 '22 09:01 YC

Super interesting discussion there, thanks a lot for the good read! I agree that the feature should actually be opt-in rather than opt-out and that the user should be informed in great detail about what exactly is collected. Of course, like the whole of Wakapi, the implementation would be entirely open-source and open for validation.

Further opinions?

muety avatar Jan 06 '22 13:01 muety

Some ideas:

  • Have telemetry functions implemented in a separate file, to make it easy to audit
    • Invite discussion via PR upon implementation
  • Perhaps versioning of telemetry (so as not to surprise users when more telemetry is implemented), e.g. the initial set is v1, and you specify that in the config. When more telemetry is implemented, users can either stay on v1 or explicitly switch to v2 (with a notice in the logs or something similar asking users to review the upgrade)
  • Have a Wiki page (linked in the readme and perhaps also in the logs) which describes what is collected and how to switch it on/off
    • Another aspect of this is collection of IPs and the policy around that
  • Be transparent with collected data - perhaps some site showing collected data (will be extra work, but likely worthwhile and an incentive for users to switch on telemetry in the first place), e.g. https://data.firefox.com

We should have more discussions on this, but perhaps new configs can be opt-out, while prior users are opt-in
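The versioning idea above could be sketched roughly as follows. This is only an illustration under assumptions: the field names, the version numbers and the idea of storing a single consented version in the config are all hypothetical. Each field is tagged with the schema version that introduced it, and only fields up to the version the operator has explicitly consented to are reported.

```go
package main

import (
	"fmt"
	"sort"
)

// fieldVersions maps each hypothetical telemetry field to the schema
// version that first introduced it.
var fieldVersions = map[string]int{
	"wakapi_version": 1,
	"database":       1,
	"os":             1,
	"num_users":      2,
	"num_heartbeats": 2,
}

// latestVersion is the newest schema version known to this build.
const latestVersion = 2

// AllowedFields returns, in sorted order, the fields covered by the
// version the operator consented to. A value of 0 means telemetry is off.
func AllowedFields(consented int) []string {
	fields := []string{}
	for name, introduced := range fieldVersions {
		if introduced <= consented {
			fields = append(fields, name)
		}
	}
	sort.Strings(fields)
	return fields
}

// UpgradeNotice returns a log message when a newer schema exists than the
// one consented to, or an empty string when there is nothing to report.
func UpgradeNotice(consented int) string {
	if consented > 0 && consented < latestVersion {
		return fmt.Sprintf("telemetry schema v%d is available; still reporting v%d until the config is changed",
			latestVersion, consented)
	}
	return ""
}

func main() {
	fmt.Println(AllowedFields(1))
	fmt.Println(UpgradeNotice(1))
}
```

With this shape, upgrading Wakapi never widens what is sent; only an explicit config change does.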

YC avatar Jan 07 '22 11:01 YC

My thoughts about it: All in all it's OK, BUT the data should be collected by a privacy-orientated software, NOT Google Analytics or so. The second thing:

Be transparent with collected data - perhaps some site showing collected data (will be extra work, but likely worthwhile and an incentive for users to switch on telemetry in the first place), e.g. https://data.firefox.com

That would be a nice-to-have but not a must for me. For me, an opt-out is also OK, but, as I said, the service you would use should be privacy-friendly.

mawoka-myblock avatar Jan 07 '22 15:01 mawoka-myblock

Good thoughts, thank you!

[...] perhaps some site showing collected data [...]

Should users be able to view only collected data of their own instance or would you want all telemetry data to be open? For the latter, wouldn't that concern people even more, if not only Wakapi.dev maintainers, but anyone could view telemetry data?

[...] the data should be collected by a privacy-orientated software, NOT Google Analytics [...]

This goes without saying. Data would probably be dumped into a database on the same host as wakapi.dev and I'd add some very simple and basic analysis scripts.

muety avatar Jan 07 '22 15:01 muety

If I could choose what should be tracked, I would track the following:

  • Wakapi version
  • Database system
  • Host system (OS, ~~CPU cores, memory,~~ (CPU architecture?), Docker yes / no)
  • Statistics
    • Number of users ~~- Number of heartbeats~~ ~~- Total coding time~~
  • Features
    • WakaTime relay enabled?
    • Weekly e-mail reports enabled?
    • Sentry tracking enabled?

If you say that the number of heartbeats is very important to you for development, then OK, but I wouldn't know how it would help. The same goes for the other struck-out points: if you say they help you, I'm fine with it, but the use case for them isn't clear to me.

mawoka-myblock avatar Jan 07 '22 15:01 mawoka-myblock

Total coding time wouldn't actually be of too much interest. Number of heartbeats, though, could be helpful, I think. When it comes to performance optimization (code or queries) it'd be nice to know what amounts of data the average user is dealing with. In that regard, hardware specs play a role as well.

Would you have any concerns about sharing this information?

muety avatar Jan 07 '22 15:01 muety

Total coding time wouldn't actually be of too much interest. Number of heartbeats, though, could be helpful, I think. When it comes to performance optimization (code or queries) it'd be nice to know what amounts of data the average user is dealing with. In that regard, hardware specs play a role as well.

Would you have any concerns about sharing this information?

No, not at all, but to gain the trust of the users, an explanation of why you collect each item would be great!

mawoka-myblock avatar Jan 07 '22 15:01 mawoka-myblock

Good thoughts, thank you!

[...] perhaps some site showing collected data [...]

Should users be able to view only collected data of their own instance or would you want all telemetry data to be open? For the latter, wouldn't that concern people even more, if not only Wakapi.dev maintainers, but anyone could view telemetry data?

I don't see why it would be a concern, if you can't identify where the telemetry data came from and there's no user specific information.

YC avatar Jan 07 '22 23:01 YC

Have telemetry functions implemented in a separate file, to make it easy to audit

  • Invite discussion via PR upon implementation

This is actually important for a lot of distros as they patch out telemetry altogether. If most of the code is inside a single file and has minimal binding code to the core it makes it easier to patch and maintain.

mainrs avatar Jan 08 '22 12:01 mainrs

Super interesting discussion there, thanks a lot for the good read! I agree that the feature should actually be opt-in rather than opt-out and that the user should be informed in great detail about what exactly is collected. Of course, like the whole of Wakapi, the implementation would be entirely open-source and open for validation.

Further opinions?

I have no problem with opt in!

If it was opt out by default I would opt in, if it was opt in by default I would opt out

Should users be able to view only collected data of their own instance or would you want all telemetry data to be open? For the latter, wouldn't that concern people even more, if not only Wakapi.dev maintainers, but anyone could view telemetry data?

There is a certain secrecy to collecting telemetry and then hiding it away. Statistics interest people when it's not invasive, and your suggested usecases are not invasive.

boehs avatar Jan 08 '22 21:01 boehs

image

It would be decent to get collective stats for users that opt in to share their total coding time

MeerBiene avatar Jan 16 '22 08:01 MeerBiene

To reiterate the points that have already been said:

Honestly, it being opt-in is super important to me. Not only should it be opt-in for the instance itself, it should also be opt-in for each user. (Present the users with a dialogue when signing up, asking if they wish to enable it.)

Further, the following should be shown to the user/the person hosting the instance:

  • What is being sent.
  • Fine grained controls allowing users to enable and disable each and every feature. (No "all or nothing" switch)
  • How the data sent is being used.
    • And this includes "no confusing wording", such as when many sites list "tracking cookies" as "performance cookies" (which implies they will decrease performance of the site if you disable them)
    • It should also have minimal technical wording and be easy to understand.
  • The implications from that data being sent.
    • The users should be presented with a list of disadvantages and advantages to enabling/disabling certain features. It is never black and white, and as a developer, you should, honestly, guide users to the best choice for them. (Not implying you aren't already)

As for what is reported to the central instance, I think the following would be reasonable

  • Instance Tracking
    • Wakapi version
    • Database system & version
    • System information
      • OS
        • Operating System
        • Kernel Name
        • Kernel Release
        • Running in docker?
        • Uptime (maybe report average uptime?)
      • CPU & Cores
      • Memory
    • Statistics

      Note: User statistics will be expanded in the User Tracking section.

      • Number of users
      • Number of heartbeats
      • Total coding time
      • Option to forward all statistics to the central instance
      • Signup enabled
    • Features
      • WakaTime relay enabled
      • E-Mail reports enabled
      • Tracking enabled
  • User Tracking
    • Count total user count. Disabling makes this user not counted in the (public) user total. This includes the number reported to the central instance.
    • Count heartbeats towards total heartbeat count. Disabling makes this user's heartbeats not counted towards the (public) heartbeat count. This includes the number reported to the central instance.
    • Forward all statistics to the central instance. If this is enabled, then all statistics will be forwarded to the central instance. This is for leaderboards.
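As a rough illustration of the per-user switches proposed above, the following sketch only counts users who have agreed to be included in the totals. The `User` type and its `CountInTotals` flag are hypothetical names, not part of Wakapi.

```go
package main

import "fmt"

// User is a hypothetical, stripped-down user record.
type User struct {
	ID            string
	Heartbeats    int64
	CountInTotals bool // per-user switch: include this user in aggregate totals
}

// Totals holds the aggregate numbers that would be shown publicly or
// reported to the central instance.
type Totals struct {
	Users      int64
	Heartbeats int64
}

// AggregateTotals sums up only those users who agreed to be counted.
func AggregateTotals(users []User) Totals {
	var t Totals
	for _, u := range users {
		if !u.CountInTotals {
			continue // user disabled counting, so they are skipped entirely
		}
		t.Users++
		t.Heartbeats += u.Heartbeats
	}
	return t
}

func main() {
	users := []User{
		{ID: "a", Heartbeats: 100, CountInTotals: true},
		{ID: "b", Heartbeats: 50, CountInTotals: false},
		{ID: "c", Heartbeats: 25, CountInTotals: true},
	}
	fmt.Printf("%+v\n", AggregateTotals(users)) // prints {Users:2 Heartbeats:125}
}
```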

solonovamax avatar Feb 18 '22 14:02 solonovamax

I disagree. I think that controlling every point by hand is unnecessary work (For @muety). I also think that users don't need to be able to decide on their own, since the total coding time is already shown on the landing page. What I would do, to address this concern, is, I would show an info on the register page or something, where it tells the user whether telemetry is enabled or not. You should also show it in the bottom-left hand corner, next to the version and the database-driver.

mawoka-myblock avatar Feb 18 '22 15:02 mawoka-myblock

I disagree. I think that controlling every point by hand is unnecessary work (For @muety). I also think that users don't need to be able to decide on their own, since the total coding time is already shown on the landing page. What I would do, to address this concern, is, I would show an info on the register page or something, where it tells the user whether telemetry is enabled or not. You should also show it in the bottom-left hand corner, next to the version and the database-driver.

I believe users should be able to sign up to any instance of their choice and disable tracking if they so choose.

solonovamax avatar Feb 20 '22 19:02 solonovamax

Thanks for your opinion and the elaborate write-up!

To clarify, telemetry is not at all about user tracking. And, as already discussed earlier, no data will ever be included, that is attributable to individual users. In fact, no actual "content" will be sent at all, but only aggregated meta data instead.

When talking about user tracking (using tools like Google Analytics, Mixpanel, Matomo, etc.), I agree with @solonovamax, that users should have full control. There are no plans for Wakapi to employ such tools, though. Telemetry is much different from user tracking, and frankly, I've never seen a software project, where a user is given choices about what telemetry data the server instance reports.

muety avatar Feb 20 '22 20:02 muety

Fair point.

Thinking about it a bit more, I believe I may have misinterpreted the scope of what you were proposing.

In which case, I would agree with @mawoka-myblock,

What I would do, to address this concern, is, I would show an info on the register page or something, where it tells the user whether telemetry is enabled or not. You should also show it in the bottom-left hand corner, next to the version and the database-driver.

solonovamax avatar Feb 20 '22 22:02 solonovamax

Some news, @muety ?

mawoka-myblock avatar Mar 24 '22 20:03 mawoka-myblock

If you go ahead with this, can you please be uber-transparent about what is collected? Here's a good example of how to do it: https://www.plex.tv/en-au/about/privacy-legal/privacy-preferences/

Secondly, there should be a big "delete my data" button included in the settings, especially to deal with GDPR.

Also, if it's possible, then it would be great if the option was included in the settings page, and also the config file. If the user opts out of telemetry in the settings OR dashboard, then it should be off.

[settings]

# Your Wakapi server URL or 'https://wakapi.dev' when using the cloud server
api_url = http://localhost:3000/api/heartbeat

# Your Wakapi API key (get it from the web interface after having created an account)
api_key = 406fe41f-6d69-4183-a4cc-121e0c524c2b

# Telemetry - data collected by wakapi, see how we use the data at https://wakatime.com/privacy
telemetry = on
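The "off in either place means off" rule described above could be sketched like this. It assumes a hypothetical string value read from the config file (such as the `telemetry` key shown above) and a hypothetical boolean coming from the dashboard settings; neither exists in Wakapi today.

```go
package main

import (
	"fmt"
	"strings"
)

// parseSwitch interprets a config value such as "on" or "off".
// Anything unrecognised, including an empty value, counts as off.
func parseSwitch(value string) bool {
	switch strings.ToLower(strings.TrimSpace(value)) {
	case "on", "true", "yes", "1":
		return true
	default:
		return false
	}
}

// TelemetryEnabled reports whether telemetry may be sent. Both the config
// file and the dashboard setting must allow it; opting out in either place
// switches it off.
func TelemetryEnabled(configValue string, dashboardOptIn bool) bool {
	return parseSwitch(configValue) && dashboardOptIn
}

func main() {
	fmt.Println(TelemetryEnabled("on", true))   // both allow it
	fmt.Println(TelemetryEnabled("on", false))  // dashboard opted out
	fmt.Println(TelemetryEnabled("off", true))  // config opted out
}
```

Treating unknown values as off keeps a typo in the config from silently enabling telemetry.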

tgrrr avatar Apr 13 '22 06:04 tgrrr

To be fair, I think having the resulting anonymized data served publicly is a good idea. So far, the only argument I've seen against it is that people fear it would expose data. But that would mean the tracking itself is already sharing data that you would be uncomfortable with other people seeing and using.

One project with a similar thought process is https://www.offen.dev, which allows the user to see their collected data, and delete it at any time.

luckydonald avatar Aug 29 '22 13:08 luckydonald

For an example of public telemetry, there's the Minecraft server implementation Paper that uses bStats

https://bstats.org/plugin/server-implementation/Paper

bStats is a Java library for Minecraft server plugins that tracks when the server is online and sends the host's config.

Showing these stats can give more trust in the telemetry. It also creates an implicit understanding between the maintainer and its userbase: If you can't display that stat, then don't track it.

NatoBoram avatar Mar 29 '23 20:03 NatoBoram

Won't implement telemetry.

muety avatar Jul 09 '23 18:07 muety