Exploring Telemetry

Open retailcoder opened this issue 4 years ago • 20 comments

We have many features, some more discoverable than others. We have memory pressure and performance issues, and only a vague intuitive idea of what's in our users' VBA projects that's based on our own individual experiences. We do have logging, and it does help (a lot) with debugging and diagnosis, but statistically a bug report or log file is nothing but an anecdote.

If Rubberduck had an opt-in setting to enable transparent telemetry (there's no way this is getting implemented without making very explicit what's being sent, where, when, and how), we could collect usage data, aggregate it, and craft a lovely PowerBI dashboard and monthly reports that could shed a lot of light on many, many things.

Some ideas, for usage data:

  • Distribution of OS and host application versions
  • What commands are used, from which menus (or hotkey?)
  • What inspections fire results, what results are ignored
  • What the user settings are; hotkey configs, logging level, UI language, inspection severities, identifier whitelist, indenter settings, todo markers, etc.
  • How is the unit testing framework being used? Are fakes used? Mocks?

Other ideas, for various metrics:

  • How long does it take to fully parse/process a project
  • How much code breaks SLL prediction mode
  • How many modules (of what kind) are in a VBA project
  • How many lines of code are in a module; cyclomatic complexity, nesting, etc.
  • What type libraries are referenced, what non-usercode members are invoked

The storage format probably requires a number of tables. How would that be best organized?

Anonymity concerns:

No PII or otherwise sensitive information shall be collected; usercode identifier names would only be collected with explicit and specific consent; it should be impossible to look at any given telemetry record and be able to say with 100% certainty "hey that's my record!".

Consuming the data:

The entire database shall be queryable with a public REST API; monthly reports could be emailed to subscribers.


Thoughts? Ideas? Concerns? Let's discuss this inside out.

retailcoder avatar Aug 23 '19 14:08 retailcoder

I would have no problem with this at all. If it's going to help you guys in any way at all, it's worth doing.

chrisdaniels avatar Aug 27 '19 09:08 chrisdaniels

Typelibs sounds OK, just not actual lib names as some may be commercial libraries that may identify a class of user. I'd be happy to test it, as long as I can see the collected data BEFORE transmitting it.

SystemsModelling avatar Aug 29 '19 13:08 SystemsModelling

That's critical - the collected data should be visible on the client side at any time, in a nice human-readable format. IMO usercode, references, project and component names MUST be excluded, it shouldn't be possible to give consent for any of that.

mansellan avatar Aug 29 '19 14:08 mansellan

Thinking about consent: we could add a page to the installer, giving a synopsis of what would be collected, a description of where the on/off switch is in the main add-in, and a link for further details, with options:

  • Disable completely
  • Not right now (pre-selected)
  • Yes, that's fine.

For the top option, the installer could omit installation of the telemetry assembly at all, which should satisfy corporates with a risk-averse posture.

mansellan avatar Aug 29 '19 14:08 mansellan

@mansellan I like that!

retailcoder avatar Aug 29 '19 14:08 retailcoder

Another idea: "Send a frown :frowning_face:" and "Send a smile :smiley:" user feedback features, like Microsoft does with e.g. Excel telemetry?

retailcoder avatar Sep 12 '19 18:09 retailcoder

  • Not right now (pre-selected)

What is the difference between that and "Disabled"? Do we prompt again a week later, or something?

Hosch250 avatar Sep 12 '19 18:09 Hosch250

@Hosch250 that would be an installer prompt, so "disable completely" could not even install the Rubberduck.Telemetry assembly, while "not right now" would install it, but leave the setting disabled.
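
To make the three installer options concrete, here is a minimal sketch of how each choice could map to an install state. Everything in it is hypothetical (the `ConsentChoice` and `TelemetryInstallState` names, the function, and in particular the assumption that "Yes, that's fine" both installs the assembly and switches the setting on); it is not how any Rubberduck installer actually behaves.

```python
from dataclasses import dataclass
from enum import Enum


class ConsentChoice(Enum):
    """The three installer options proposed in this thread."""
    DISABLE_COMPLETELY = "disable"
    NOT_RIGHT_NOW = "later"   # the pre-selected option in the proposal
    YES = "yes"


@dataclass(frozen=True)
class TelemetryInstallState:
    assembly_installed: bool  # is the telemetry assembly copied to disk at all?
    setting_enabled: bool     # does the in-app on/off switch start enabled?


def resolve_install_state(choice: ConsentChoice) -> TelemetryInstallState:
    """Map an installer choice to what gets installed and enabled."""
    if choice is ConsentChoice.DISABLE_COMPLETELY:
        # Nothing telemetry-related is installed.
        return TelemetryInstallState(assembly_installed=False, setting_enabled=False)
    if choice is ConsentChoice.NOT_RIGHT_NOW:
        # Installed, but the setting stays off until the user turns it on.
        return TelemetryInstallState(assembly_installed=True, setting_enabled=False)
    # Assumption: an explicit "yes" installs the assembly and enables the setting.
    return TelemetryInstallState(assembly_installed=True, setting_enabled=True)


if __name__ == "__main__":
    for choice in ConsentChoice:
        print(choice.name, resolve_install_state(choice))
```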

retailcoder avatar Sep 12 '19 18:09 retailcoder

That'd be a pain if someone toggled it to Enabled after installing with it Disabled. Alternatively, would we remove the DLL if they installed with it Enabled, then toggled it to Disabled?

Hosch250 avatar Sep 12 '19 18:09 Hosch250

I think that if RD is installed under the "Disable Completely" option, the Telemetry page in the settings dialog should still be visible, but with wording like:

"Telemetry is not currently installed. If you wish to enable telemetry, please go to Control Panel, Programs and Features, then run the Rubberduck installer using Modify".

This:

  1. Provides a route to enable later, but
  2. Gives reassurance to the corporate IT reviewer that telemetry is an install-only option (which they can and will lock out)

mansellan avatar Sep 12 '19 21:09 mansellan

Just installed Telerik Fiddler, and noticed this in the license agreement:

On startup, the Software anonymously checks for new versions; you may disable this feature if you prefer. You may opt-in to submitting anonymous data about your system configuration and use of the Software to help improve future versions of the Software. If you opt-in, Telerik may collect data related to: certain features and extensions of the Software, identifying trends and bugs, activation information, usage statistics and may track other data related to your use of the Software as further described in the most current version of Telerik’s Privacy Policy (located at: http://www.telerik.com/company/privacy-policy). You may be asked, from time to time, to respond to short survey questions presented within the Software’s user environment. Telerik may use your responses to these questions to serve you with targeted advertising content, to improve the Software, and/or for other purposes as described within the Privacy Policy. By your responding to such questions, opting-in to data collection, and/or acceptance of these terms and/or use of the Software, you authorize the collection, use and disclosure of all responses and data for the purposes provided for herein and/or in the Privacy Policy.

And this prompt on first startup:

Help Improve Progress Telerik Fiddler?

I like this approach... we'll need an explicit "privacy policy" legalese document though.

retailcoder avatar Sep 26 '19 15:09 retailcoder

Maybe you can reach out to the Software Freedom Law Center. They offer pro-bono services for FLOSS projects. Not sure what requirements they have for determining who they’re willing to work with.

http://www.softwarefreedom.org/

rubberduck203 avatar Sep 26 '19 16:09 rubberduck203

@mansellan

IMO usercode, references, project and component names MUST be excluded, it shouldn't be possible to give consent for any of that.

Would it be possible to differentiate between elements and libraries from "standard VBA stuff" -- such as Excel, Access, ADO, DAO, WIA, MSHTML, Regex -- and custom user projects or referenced libraries? Maybe a list of the standard ones, and any non-standard library or element (element from a non-standard library) should not be included?

zspitz avatar Oct 17 '19 13:10 zspitz

Hmm, hadn't considered that... I can't see the harm in having a library whitelist. Another option could be to hash all referenced libraries and send just the hashes, which we could then match up to hashes of known libraries. Either way, no private info is sent.
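
A rough sketch of the hashing idea, assuming SHA-256 over a lower-cased library name. The known-library list reuses the names mentioned earlier in this thread purely as placeholders, and the function names are made up for illustration. One caveat worth keeping in mind: an unsalted hash of a guessable name can be matched by anyone who hashes candidate names, so this hides unknown libraries from casual inspection rather than making them unrecoverable.

```python
import hashlib

# Placeholder names taken from this thread; a real list would need curating.
KNOWN_LIBRARIES = ["Excel", "Access", "ADO", "DAO", "WIA", "MSHTML", "Regex"]


def library_hash(name: str) -> str:
    """Hash a library name after normalizing case and surrounding whitespace."""
    normalized = name.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


# Receiving side: precomputed lookup from hash back to a known library name.
KNOWN_HASHES = {library_hash(name): name for name in KNOWN_LIBRARIES}


def client_payload(referenced: list[str]) -> list[str]:
    """Client side: only hashes leave the machine, never the names."""
    return sorted(library_hash(name) for name in referenced)


def resolve_hashes(hashes: list[str]) -> list[str]:
    """Receiving side: known hashes resolve to names, the rest stay opaque."""
    return [KNOWN_HASHES.get(h, "<unknown>") for h in hashes]


if __name__ == "__main__":
    payload = client_payload(["Excel", "DAO", "SomeInHouseLibrary"])
    print(resolve_hashes(payload))
```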

mansellan avatar Oct 17 '19 19:10 mansellan

Rubberduck has a large presence in the VBA ecosystem, which makes data it collects a particularly useful and accurate representation of "people who use VBA and care enough about the developer experience to install extensions". Consequently, the information about who uses RD:

  • OS version and bitness.
  • Admin/Non-Admin.
  • Primary host application.
  • RD version (to work out frequency of updates).

... and how they use VBA:

  • VBA version (VBA7, VBA<=6 or VB6).
  • Libraries referenced.
  • Use of add-ins and references or importing code.

... (as opposed to information about how they use RD itself) - All this has value beyond just improving Rubberduck and can guide the design of other tools, libraries, and extensions within the VBA ecosystem. For example, I am creating an open-source VBA package manager which I feel has a similar target audience to users of RD and so I'd like to get a better understanding of that market to influence the design decisions I make.

All this is to say, I'm really in favour of this user data being gathered as it has value to both RD and a wider community.

Greedquest avatar Jan 05 '22 14:01 Greedquest

As an aside... I know some software downloads like VSCode automatically detect the operating system and bitness of the user in order to suggest an appropriate version of the installer to download (presumably this info is available to the browser). That may be a quick and dirty way to gather some demographic data at the install stage without needing to modify Rubberduck at all.

Incidentally most of the metrics I'm interested in about who uses extensions like Rubberduck are known at install time, so it might be possible to bolt a simple one time thing onto the installer rather than setting up regular reporting of telemetry data.

Greedquest avatar Jan 05 '22 14:01 Greedquest

As another aside, I think this has some degree of urgency as it could help prioritise the large number of issues based on how frequently used a feature is (or underused because it is broken) and motivate decisions for new features which is always a good feeling. I cannot deny my vested interest though 😉...

Greedquest avatar Jan 05 '22 14:01 Greedquest

I am fully opposed to telemetry both personally and professionally. I dislike it in my personal life and always turn it off. At work I am required to turn off all capabilities to phone home. I would need an installer that doesn't include these features to avoid costly security review.

A9G-Data-Droid avatar Jan 05 '22 21:01 A9G-Data-Droid

Hm, not a good idea. If telemetry is needed, providing 2 different installers would be good. Whoever wants to install Rubberduck with telemetry would download it with that feature; whoever does not would be 100% sure they are downloading a private app. Otherwise, some companies will not allow this tool anymore.

yuriykaz avatar Dec 21 '23 19:12 yuriykaz

@yuriykaz thanks for the feedback! You're absolutely correct, this is something that needs complete transparency and has to be explicitly opt-in (as opposed to opt-out).

Telemetry isn't going to happen with 2.x, but is being seriously considered for 3.0, especially since the Language Server Protocol (LSP) defines a specific notification for this.
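
For reference, the LSP notification in question is `telemetry/event`, sent from the server to the client with a free-form payload. Below is a minimal sketch of framing such a message using the protocol's `Content-Length` header; the payload fields (`event`, `elapsedMs`) are invented for illustration and say nothing about what RD3 would actually send.

```python
import json


def telemetry_event_message(payload: dict) -> bytes:
    """Frame a telemetry/event notification as an LSP base-protocol message."""
    body = json.dumps(
        {
            "jsonrpc": "2.0",
            "method": "telemetry/event",  # a notification, so there is no "id"
            "params": payload,
        },
        separators=(",", ":"),
    ).encode("utf-8")
    # The header carries the byte length of the JSON body.
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


if __name__ == "__main__":
    # Hypothetical payload: the field names are placeholders.
    print(telemetry_event_message({"event": "parse.completed", "elapsedMs": 1234}))
```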

We're still quite far from having an installer for 3.x, but the way it's being envisioned is closer to how Visual Studio does it: you'd be installing the latest version of the Rubberduck Installer / Update Server, and that's where you'd tick a box to have the completely optional telemetry server installed along with the rest of RD3 components; the installer would only download the components that must be installed.

With the telemetry server installed, an explicit configuration will still be needed to enable telemetry events (most will be disabled by default), so nothing is transmitted without having been configured to be, and the idea is to allow all telemetry data to be reviewable before it's transmitted; transmission itself would be manual unless configured otherwise.
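
The flow described here (events off by default, everything reviewable, manual transmission) could look roughly like the following sketch. The `TelemetryOutbox` class, its method names, and the event names are all hypothetical; this only illustrates the idea and is not RD3 code.

```python
import json
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class TelemetryOutbox:
    # Empty by default: no event is recorded until it is explicitly enabled.
    enabled_events: set = field(default_factory=set)
    pending: list = field(default_factory=list)

    def record(self, event_name: str, data: dict) -> bool:
        """Queue an event locally; events that are not enabled are dropped."""
        if event_name not in self.enabled_events:
            return False
        self.pending.append({"event": event_name, "data": data})
        return True

    def review(self) -> str:
        """Human-readable view of exactly what would be transmitted."""
        return json.dumps(self.pending, indent=2, sort_keys=True)

    def transmit(self, send: Callable[[list], None]) -> int:
        """Manual step: hand the queue to `send`, clearing it only on success."""
        count = len(self.pending)
        if count:
            send(list(self.pending))
            self.pending.clear()
        return count


if __name__ == "__main__":
    outbox = TelemetryOutbox(enabled_events={"parse.completed"})
    outbox.record("parse.completed", {"elapsedMs": 1234})
    outbox.record("inspection.ignored", {"count": 3})  # not enabled, dropped
    print(outbox.review())
    outbox.transmit(lambda batch: print(f"sending {len(batch)} event(s)"))
```

Keeping `transmit` separate from `record` is what makes the review step meaningful: nothing leaves the queue until the user (or an explicit configuration) triggers it.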

retailcoder avatar Dec 21 '23 19:12 retailcoder