jupyter.github.io

Re-enable site analytics for jupyter.org

Open • choldgraf opened this issue 3 months ago • 23 comments

Next actions

  • [x] Decide if it's safe to try Plausible for our web analytics
  • [x] Merge Plausible analytics on jupyter.org on a 1-month trial: #816
  • [x] See our public dashboard here: https://plausible.io/jupyter.org
  • [x] Wait a month and see if we generally like the Plausible experience.
  • [x] (Late September) Decide what to do next

Context

A few years ago we stopped using Google Analytics due to GDPR concerns. This means that we no longer have information about how people are navigating Jupyter.org (or other jupyter sub-projects). This kind of information is useful for demonstrating reach and impact, and is also useful in guiding Marketing experiments to understand whether we're directing attention to Jupyter resources in expected ways.

I propose that we identify an analytics service we can use with jupyter.org that:

  • Is not at-risk of legal challenges relating to GDPR
  • Gives us information about traffic flow in/out/through jupyter.org
  • Can scale to subproject docs as well
  • Does not require significant human time to maintain
  • Is not extremely expensive

Idea for implementation

Chris's suggestion: use a plausible.io plan.

Plausible.io is what I've heard recommended most often here. It is fairly expensive though, and limited to 10 sites for an enterprise plan. They do not have open source or non-profit plans. Their codebase is online and you can self-host it, but I'd strongly suggest we just pay them.

Anybody know of obvious better alternatives?

I know there are a million analytics services out there. If somebody knows of another one that matches the conditions above and is obviously better than Plausible, then please suggest it! If not, I'd lean towards "just using Plausible" so that we don't go down a rabbit hole of debate.

Note: Does anybody want to advocate for Matomo?

I think that's the most likely alternative, and what the Binder team uses. That said, in Chris' experience the Matomo instance for mybinder.org has been very slow and kind of unreliable.

How to pay for this

I suggest we:

  • Ask the Foundation for a discretionary Marketing budget that they + the JMS have control over, and use this to pay for Plausible. If not that, then:
  • Use EC discretionary funds for this. If not that, then:
  • Write a proposal for the Foundation specifically for this via community proposals.

choldgraf • Aug 29 '25 18:08

Note that they have a 30-day free trial, no credit card required. We can put them on the website today to experiment.

jasongrout • Aug 29 '25 18:08

History:

  • 2013: The very first issue on this repo (#1) was about adding google analytics, which was promptly done
  • 2018: https://github.com/jupyter/jupyter.github.io/pull/279 anonymized google analytics for GDPR
  • 2021: https://github.com/jupyter/jupyter.github.io/pull/408 removed Google Analytics. We had lots of discussion at that time about Plausible. In particular, @Carreau mentions that https://views.scientific-python.org/ is an open-source analytics server for Scientific Python ecosystem projects.

jasongrout • Aug 29 '25 18:08

I would extend the "identify an analytics service we can use with jupyter.org" requirement to include something that can potentially scale to our docs as well.

jasongrout • Aug 29 '25 18:08

Good point @jasongrout - is there any objection to us setting up Plausible analytics to get a baseline for our traffic? This will help us:

  • Test the Plausible UI
  • Get a baseline for our traffic, which will tell us the likely cost of analytics

To implement this, we'd need to paste the Plausible JavaScript snippet into the <head> of jupyter.org.
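For reference, a minimal sketch of what that snippet looks like. Plausible's embed is a deferred `<script>` tag whose `data-domain` attribute names the site as registered in the Plausible dashboard; the script URL used here is the variant that comes up later in this thread. The helper name `plausible_snippet` is hypothetical, and the authoritative snippet is whatever the Plausible site settings page gives you.

```python
from html import escape

# Script variant mentioned in this thread (hash-based routing + outbound-link tracking).
PLAUSIBLE_SCRIPT = "https://plausible.io/js/script.hash.outbound-links.js"


def plausible_snippet(domain: str, src: str = PLAUSIBLE_SCRIPT) -> str:
    """Return a Plausible embed tag (hypothetical helper) for `domain`.

    `domain` should match the site name registered in the Plausible dashboard.
    """
    # quote=True escapes &, <, > and both quote characters inside attribute values
    return (
        f'<script defer data-domain="{escape(domain, quote=True)}" '
        f'src="{escape(src, quote=True)}"></script>'
    )


# Usage: paste the printed tag inside <head> of the site's base template.
# print(plausible_snippet("jupyter.org"))
```

`defer` keeps the script from blocking page rendering, which is why it can sit in `<head>` without slowing the page down.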

Also, I'll add your suggestion about scaling to docs to the top comment.

choldgraf • Aug 29 '25 18:08

I'm +1 on trying it out with the free trial.

jasongrout • Aug 29 '25 18:08

👍 to trying out plausible. Doubly so if we can actually join views.scientific-python.org

minrk • Aug 29 '25 19:08

FYI, there's also some discussion on Zulip about this.

jasongrout • Aug 29 '25 19:08

If anybody wants to action the "trial run", I'm +1 on just merging away and trying this out with Plausible.

choldgraf • Aug 29 '25 19:08

On it...

jasongrout • Aug 29 '25 19:08

PR up at https://github.com/jupyter/jupyter.github.io/pull/816. It's using a 30-day free trial tied to [email protected] now.

jasongrout • Aug 29 '25 19:08

Overall, I approve of paying for it rather than self-hosting.

yuvipanda • Aug 29 '25 19:08

In particular, https://plausible.io/blog/open-source-saas encourages me.

> We’re not interested in venture capitalism, in the chase for the endless hyper-growth, or in building a unicorn. We don’t have any goals about world domination.

Let's give them money if we can.

yuvipanda • Aug 29 '25 19:08

I also pinged them to see if they give discounts to open source projects.

jasongrout • Aug 29 '25 20:08

I've merged @jasongrout's PR! In the few minutes between that and me posting here, we've had 20 people visit jupyter.org 😅

I've also edited the top comment with next actions


choldgraf • Aug 29 '25 22:08

Can you make the stats public (apparently it's an option on Plausible) and post a link to it here?

Carreau • Sep 02 '25 06:09

I reached out to Plausible about discounts for open source projects, and received an email back offering us a 15% one-time introductory discount when we sign up for the service. Just contact them and they can send us back a discount code.

jasongrout • Sep 02 '25 20:09

> Can you make the stats public (apparently it's an option on Plausible) and post a link to it here?

https://plausible.io/jupyter.org

jasongrout • Sep 02 '25 20:09

I added that link to the issue body, thanks @jasongrout ! That's such a cool resource - I'll signal-boost it a little bit so others learn about this effort.

Based on our daily hit rate, we'd get more than 8M hits a year, which means we'd be somewhere in this cost range:

[Screenshot: Plausible pricing tiers]
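A rough sketch of the arithmetic behind that kind of estimate, assuming a simple straight-line extrapolation from a sample of daily pageview counts. The function names and sample numbers are hypothetical, not our real dashboard data. Plausible prices its plans by monthly pageviews, so the sketch reports both a yearly and a monthly figure. For scale, 8M hits a year works out to roughly 22,000 per day (8,000,000 / 365 ≈ 21,918).

```python
DAYS_PER_YEAR = 365


def _mean_daily(daily_counts: list[int]) -> float:
    """Average pageviews per day over the sampled days."""
    if not daily_counts:
        raise ValueError("need at least one day of data")
    return sum(daily_counts) / len(daily_counts)


def annual_estimate(daily_counts: list[int]) -> int:
    """Extrapolate a yearly pageview total from sampled daily counts."""
    return round(_mean_daily(daily_counts) * DAYS_PER_YEAR)


def monthly_estimate(daily_counts: list[int]) -> int:
    """Average monthly pageviews implied by the same daily sample."""
    return round(_mean_daily(daily_counts) * DAYS_PER_YEAR / 12)


# Usage with made-up numbers (a weekday/weekend mix would be more realistic):
# annual_estimate([22_000, 22_000])  -> 8_030_000
# monthly_estimate([22_000])         -> 669_167
```

Weekday/weekend swings and academic-calendar seasonality mean a few days of data only gives a ballpark; a full month from the dashboard is a better basis for picking a plan.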

This also makes me think we'd drive a lot more attention to our blog if we served it from jupyter.org/blog instead of blog.jupyter.org 🤔

choldgraf • Sep 02 '25 21:09

The Plausible script https://plausible.io/js/script.hash.outbound-links.js is present on subpages (https://jupyter.org/*), but missing from the top-level https://jupyter.org/
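A check like this can be scripted. Below is a minimal sketch using only the Python standard library: it parses a page's HTML and reports whether a script loaded from `plausible.io/js/` appears, and whether it sits in `<head>` or `<body>`. The helper names are hypothetical, the substring match on the script URL is an assumption about how the tag is written, and the page list in the usage comment is only illustrative.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.request import urlopen

PLAUSIBLE_PREFIX = "plausible.io/js/"  # assumed marker for the Plausible script URL


class _ScriptFinder(HTMLParser):
    """Records which section (head/body) each Plausible <script> appears in."""

    def __init__(self) -> None:
        super().__init__()
        self.section: Optional[str] = None  # "head", "body", or None outside both
        self.found_in: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("head", "body"):
            self.section = tag
        elif tag == "script":
            src = dict(attrs).get("src") or ""  # attribute values may be None
            if PLAUSIBLE_PREFIX in src:
                self.found_in.append(self.section or "unknown")

    def handle_endtag(self, tag):
        if tag in ("head", "body") and self.section == tag:
            self.section = None


def plausible_locations(html: str) -> list[str]:
    """Return e.g. ["head"] if the script is in <head>, [] if it is missing."""
    finder = _ScriptFinder()
    finder.feed(html)
    finder.close()
    return finder.found_in


def fetch(url: str) -> str:
    """Download a page as text (requires network access)."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


# Usage (network access required):
# for url in ["https://jupyter.org/", "https://jupyter.org/about"]:
#     print(url, plausible_locations(fetch(url)) or "MISSING")
```

Reporting the section as well as presence also catches a script that was placed inside a `<header>` element in the body rather than in `<head>`.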

manics • Sep 03 '25 12:09

Thanks for spotting that @manics! I just tried to fix this and accidentally pushed a commit to main instead of making a PR; sorry about that, folks. It's simple enough that I'm just going to leave it, because I think it does fix the issue. Here's the commit:

https://github.com/jupyter/jupyter.github.io/commit/e5bfbe2365b052f66f5260f07611d9c320009ef7

It moves the script from page-header.html to the default template, and moves it into <head> instead of the HTML <header> element. @manics I think this answers your question in https://github.com/jupyterhub/team-compass/issues/803 - no, it doesn't have to be in <head> :-)

choldgraf • Sep 03 '25 16:09

ReadTheDocs has analytics

It looks like we can enable analytics for readthedocs sites by going to the settings > Addons > Analytics and checking the box to enable it.

[Screenshot: Read the Docs settings > Addons > Analytics]

CC @choldgraf

jasongrout • Sep 19 '25 15:09

GitHub also has analytics!

I also learned that GitHub has repository analytics!

https://docs.github.com/en/repositories/viewing-activity-and-data-for-your-repository/viewing-traffic-to-a-repository

[Screenshot: GitHub repository traffic graph]

choldgraf • Sep 20 '25 17:09

Nice! It looks like we can access that from the API as well, so we should be able to accumulate traffic data over time too.
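A minimal sketch of how that accumulation could work, assuming GitHub's REST endpoint `GET /repos/{owner}/{repo}/traffic/views`. It returns overall `count`/`uniques` plus a `views` array of daily entries (`timestamp`, `count`, `uniques`), but only for the last 14 days, so a scheduled job (cron or a scheduled GitHub Action) would need to fetch it regularly and merge the results into stored history. Per GitHub's docs, traffic data is only visible to people with push access, so the token needs that level of access. The function names and the idea of storing history as a timestamp-keyed dict are hypothetical choices for illustration.

```python
import json
from urllib.request import Request, urlopen

API = "https://api.github.com"


def fetch_views(owner: str, repo: str, token: str) -> dict:
    """Fetch daily view counts for the last 14 days (network access required)."""
    req = Request(
        f"{API}/repos/{owner}/{repo}/traffic/views?per=day",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "X-GitHub-Api-Version": "2022-11-28",
        },
    )
    with urlopen(req, timeout=10) as resp:
        return json.load(resp)


def merge_views(history: dict[str, dict], payload: dict) -> dict[str, dict]:
    """Merge the payload's daily `views` into history keyed by ISO timestamp.

    Entries from a newer fetch overwrite the same day, since the most recent
    day in an earlier fetch may have been partial. The input is not mutated.
    """
    merged = dict(history)
    for day in payload.get("views", []):
        merged[day["timestamp"]] = {"count": day["count"], "uniques": day["uniques"]}
    return merged


# Usage (hypothetical; persist `history` as JSON between runs):
# payload = fetch_views("jupyter", "jupyter.github.io", os.environ["GITHUB_TOKEN"])
# history = merge_views(history, payload)
```

Keying by day rather than appending raw responses avoids double-counting the overlap between consecutive 14-day windows.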

jasongrout • Sep 21 '25 08:09