kedro-plugins
[Draft] telemetry: Revamp telemetry data collection workflow
Description
I'm opening this issue to collect feedback and ideas on improving the data collection workflow with kedro-telemetry.
TODO: I'll add stuff from my work on the spike https://github.com/kedro-org/kedro/issues/2522.
Context
Why is this change important to you? How would you use it? How can it benefit other users?
Possible Implementation
(Optional) Suggest an idea for implementing the addition or change.
Possible Alternatives
(Optional) Describe any alternative solutions or features you've considered.
Related: #333
In https://github.com/kedro-org/kedro/issues/2519 we're running in circles again to make users install kedro-telemetry without trying too hard. It's something that initially appeared in https://github.com/kedro-org/kedro/issues/2522, although @ankatiyar provided a solution to that problem already.
In my opinion, we should make the telemetry collection mechanism a mandatory dependency of Kedro, while still keeping the current opt-in flow for actually enabling such collection. I know this might ruffle some feathers, but as long as we keep the opt-in flow explicit and robust, I don't think we're breaking any promises.
Otherwise I think it's better to not collect any telemetry at all.
One data point: in the past 30 days, kedro-telemetry had 13% as many downloads as kedro.
- https://www.pepy.tech/projects/kedro
- https://www.pepy.tech/projects/kedro-telemetry
I agree with @astrojuanlu: I think it would be better to incorporate telemetry directly into the Kedro codebase. This approach would involve prompting users for their opt-in consent the first time they run a command, if the environment variable hasn't been set already. We would then record their response in the environment variable.
Currently, the prompt for telemetry participation appears only after the plugin is installed, which leaves users confused about why they should install the plugin at all if they are not interested in participating. I think it would be more logical and user-friendly to ask about telemetry participation on the first run of a Kedro command, and to install the plugin only after obtaining the user's consent. However, from what I understand, this method might introduce technical uncertainties.
So I think a more reliable approach might be to integrate the telemetry plugin directly into Kedro itself.
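To make the prompt-on-first-run idea above concrete, here is a minimal sketch of how such a check could look. It is not Kedro's or kedro-telemetry's actual code: the variable name `EXAMPLE_TELEMETRY_CONSENT`, the function `resolve_consent`, and the prompt wording are all hypothetical and exist only for illustration.

```python
import os

# Hypothetical variable name, used only for this sketch (not a real Kedro setting).
CONSENT_ENV_VAR = "EXAMPLE_TELEMETRY_CONSENT"


def resolve_consent(env=os.environ, ask=input):
    """Return True if the user opted in, prompting only when no answer is stored."""
    stored = env.get(CONSENT_ENV_VAR)
    if stored is not None:
        # An answer was already recorded, so do not prompt again.
        return stored.strip().lower() in {"1", "true", "yes", "y"}

    answer = ask("Share anonymous usage data? [y/N]: ")
    # Anything other than an explicit yes counts as "no" (opt-in by default).
    consent = answer.strip().lower() in {"y", "yes"}

    # Record the response. With os.environ this only affects the current
    # process and its children, not the user's shell.
    env[CONSENT_ENV_VAR] = "true" if consent else "false"
    return consent
```

One caveat with this sketch: a value written to `os.environ` does not survive past the current process, so remembering the answer across runs would need some other storage. That may be one of the technical uncertainties mentioned above, though that is an assumption on my part.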
👍🏽 In a first phase, we'll focus on clarifying the current scope of data collection and fixing outstanding issues. We will continue working towards that goal later on, and draft our communications plan for users accordingly.
I will open a new issue with the next steps.