Re-enable site analytics for jupyter.org
Next actions
- [x] Decide if it's safe to try Plausible for our web analytics
- [x] Merge Plausible analytics on jupyter.org on a 1-month trial: #816
- [x] See our public dashboard here
- [x] Wait a month and see if we generally like the Plausible experience.
- [x] (Late September) Decide what to do next
Context
A few years ago we stopped using Google Analytics due to GDPR concerns. This means that we no longer have information about how people are navigating Jupyter.org (or other jupyter sub-projects). This kind of information is useful for demonstrating reach and impact, and is also useful in guiding Marketing experiments to understand whether we're directing attention to Jupyter resources in expected ways.
I propose that we identify an analytics service we can use with jupyter.org that:
- Is not at risk of legal challenges relating to GDPR
- Gives us information about traffic flow in/out/through jupyter.org
- Can scale to subproject docs as well
- Does not require significant human time to maintain
- Is not extremely expensive
Idea for implementation
Chris's suggestion: use a plausible.io plan.
Plausible.io is the service I've heard recommended most often here. It is fairly expensive, though, and limited to 10 sites for an enterprise plan. They do not have open source or non-profit plans. Their codebase is online and can be self-hosted, but I'd strongly suggest we just pay them for the hosted service.
Anybody know of obvious better alternatives?
I know there are a million analytics services out there. If somebody knows of another one that matches the conditions above and is obviously better than Plausible, then please suggest it! If it's not obviously better, then I'd lean towards "just using Plausible" so that we don't go down a rabbit hole of debate.
[!NOTE] Anybody want to advocate for Matomo?
I think that's the most likely alternative, and what the Binder team uses. That said, in Chris' experience the Matomo instance for mybinder.org has been very slow and kind of unreliable.
How to pay for this
I suggest we:
- Ask the Foundation for a discretionary Marketing budget that they + the JMS have control over, and use this to pay for Plausible. If not that, then:
- Use EC discretionary funds for this. If not that, then:
- Write a proposal for the Foundation specifically for this via community proposals.
Note that they have a 30-day free trial, no credit card required. We can put them on the website today to experiment.
History:
- 2013: The very first issue on this repo (#1) was about adding Google Analytics, which was promptly done.
- 2018: https://github.com/jupyter/jupyter.github.io/pull/279 anonymized Google Analytics for GDPR.
- 2021: https://github.com/jupyter/jupyter.github.io/pull/408 removed Google Analytics. We had lots of discussion at that time about Plausible. In particular, @Carreau mentions that https://views.scientific-python.org/ is an open source server that serves Scientific Python ecosystem projects.
I would extend the "identify an analytics service we can use with jupyter.org" requirement to include something that can potentially scale to our docs as well.
Good point @jasongrout - is there any objection to us setting up Plausible analytics to get a baseline for our traffic? This will help us:
- Test the Plausible UI
- Get a baseline for our traffic, which will tell us the likely cost of analytics
To implement this, we'd need to paste the Plausible JavaScript snippet into the <head> of jupyter.org.
Also, I'll add your suggestion about scaling to docs to the issue description.
I'm +1 on trying it out with the free trial.
👍 to trying out plausible. Doubly so if we can actually join views.scientific-python.org
FYI, there's also some discussion on zulip around this.
If anybody wants to action the "trial run", I'm +1 on just merging away and trying this out with Plausible.
On it...
PR up at https://github.com/jupyter/jupyter.github.io/pull/816. It's using a 30-day free trial tied to [email protected] now.
Overall, I approve of paying for it and not self-hosting.
In particular, https://plausible.io/blog/open-source-saas encourages me.
We’re not interested in venture capitalism, in the chase for the endless hyper-growth, or in building a unicorn. We don’t have any goals about world domination.
Let's give them money if we can.
I also pinged them to see if they give discounts to open source projects.
I've merged @jasongrout 's PR! In the few minutes between that and me posting here, we've had 20 people visit jupyter.org 😅
I've also edited the top comment with next actions
Can you make the stats public (apparently it's an option on Plausible) and post a link to them here?
I reached out to Plausible about discounts for open source projects, and received an email back offering us a 15% one-time introductory discount when we sign up for the service. Just contact them and they can send us back a discount code.
> Can you make the stats public (apparently it's an option on Plausible) and post a link to them here?
https://plausible.io/jupyter.org
I added that link to the issue body, thanks @jasongrout ! That's such a cool resource - I'll signal-boost it a little bit so others learn about this effort.
based on our daily hit rate, we have something like >8m hits a year which means we'd be somewhere in this cost range:
this also makes me feel like we'd drive a lot more attention to our blog if we served it from jupyter.org/blog instead of blog.jupyter.org 🤔
The plausible script https://plausible.io/js/script.hash.outbound-links.js is present on subpages (https://jupyter.org/*) , but missing from the top-level https://jupyter.org/
Thanks for spotting that @manics! I just tried to fix this and accidentally pushed a commit to main instead of making a PR; sorry about that, folks. The change is simple enough that I'm just going to leave it, because I think it does fix the issue. Here's the commit:
https://github.com/jupyter/jupyter.github.io/commit/e5bfbe2365b052f66f5260f07611d9c320009ef7
It moves the script from page-header.html to the default template, and moves it to <head> instead of the HTML element <header>. @manics I think this answers your question in https://github.com/jupyterhub/team-compass/issues/803 - no it doesn't have to be in <head> :-)
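For anyone who wants to re-check this later, here is a rough sketch (not part of the site's tooling) of how one might test whether a page's HTML carries the Plausible script, and whether it sits in <head> or somewhere else such as <header>. The class and function names are made up for illustration, and matching on `plausible.io/js/` in the `src` attribute is an assumption based on the script URL mentioned above.

```python
from html.parser import HTMLParser


class PlausibleScriptFinder(HTMLParser):
    """Track whether a Plausible <script> tag appears inside or outside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.found_in_head = False
        self.found_elsewhere = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so "head" and "header" stay distinct.
        if tag == "head":
            self.in_head = True
        elif tag == "script":
            # Attribute values can be None (e.g. a bare `defer`), hence the `or ""`.
            src = dict(attrs).get("src") or ""
            if "plausible.io/js/" in src:
                if self.in_head:
                    self.found_in_head = True
                else:
                    self.found_elsewhere = True

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def plausible_script_location(html: str) -> str:
    """Return "head", "body", or "missing" for the given HTML source."""
    finder = PlausibleScriptFinder()
    finder.feed(html)
    finder.close()
    if finder.found_in_head:
        return "head"
    if finder.found_elsewhere:
        return "body"
    return "missing"
```

To run it against the live site, one could pass in the decoded result of `urllib.request.urlopen("https://jupyter.org/").read()` for the top-level page and for a subpage, and compare the two answers.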
ReadTheDocs has analytics
It looks like we can enable analytics for readthedocs sites by going to the settings > Addons > Analytics and checking the box to enable it.
CC @choldgraf
GitHub also has analytics!
I also learned that GitHub has repository analytics!
https://docs.github.com/en/repositories/viewing-activity-and-data-for-your-repository/viewing-traffic-to-a-repository
Nice! It looks like we can access that from the API as well, so we should be able to accumulate traffic data over time too.
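A minimal sketch of what that accumulation could look like, assuming we call the REST endpoint `GET /repos/{owner}/{repo}/traffic/views` with a token that has write access to the repository. GitHub only reports roughly the last 14 days, so the idea is to fetch on a schedule and merge each response into a stored history keyed by day. The function names and the shape of the `history` dictionary are hypothetical choices for illustration, not an existing tool.

```python
import json
import urllib.request


def fetch_views(owner: str, repo: str, token: str) -> list[dict]:
    """Fetch the recent daily view counts for a repository from the GitHub API."""
    url = f"https://api.github.com/repos/{owner}/{repo}/traffic/views"
    request = urllib.request.Request(
        url,
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
    )
    with urllib.request.urlopen(request) as response:
        # The response body has a "views" list of {timestamp, count, uniques} records.
        return json.load(response)["views"]


def merge_views(history: dict[str, dict], views: list[dict]) -> dict[str, dict]:
    """Merge freshly fetched daily records into the stored history.

    Records are keyed by timestamp, so a day that appears in both places is
    overwritten by the newer fetch rather than counted twice.
    """
    merged = dict(history)
    for day in views:
        merged[day["timestamp"]] = {"count": day["count"], "uniques": day["uniques"]}
    return merged
```

Running this at least once every two weeks and saving `merged` to a JSON file would be enough to avoid gaps; keying by timestamp means overlapping fetches are harmless.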