team-compass Matomo not working for large queries on mybinder.org

I was trying to take a look at the data from 2022 so that I could update our impact report from 2021, but I was not able to load any of the graphs that our Matomo instance produces.

Does anybody know what's going on? I have no knowledge of how Matomo works, so I'm not really able to debug it...

For example, when trying to view visits over 2022, I ran into this error (I got the same error when reducing the time range to 1 week, so I don't think it's a data overload thing):

Mar 22 '23 16:03 choldgraf

I also saw that error on my first visit, but as soon as I started opening logs and refreshed the page to start debugging, everything started to work. Not sure what's up, but does it work for you now?

Mar 23 '23 09:03 minrk

Maybe the initial request timed out or similar because it was loading up a lot of data?
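One way to test the timeout idea is to request the same report straight from Matomo's HTTP Reporting API, outside the browser, and time it. This is a hypothetical helper: the base URL, the site ID (1) and the token are placeholders, not values from our deployment. It sends token_auth as a POST parameter, since newer Matomo versions may reject tokens passed in the query string.

```python
import time
import urllib.request
from urllib.parse import urlencode

# Hypothetical base URL: replace with the real address of our Matomo instance.
MATOMO_BASE = "https://matomo.example.org/index.php"


def report_url(method: str, period: str, date: str, id_site: int = 1) -> str:
    """Build a Matomo Reporting API URL (without credentials) for one report."""
    params = {
        "module": "API",
        "method": method,   # e.g. "VisitsSummary.get" or "UserCountry.getCountry"
        "idSite": id_site,  # assumed site ID; check the real one in Matomo
        "period": period,   # "day", "week", "month", "year" or "range"
        "date": date,       # any date inside the period, e.g. "2022-01-01"
        "format": "json",
    }
    return f"{MATOMO_BASE}?{urlencode(params)}"


def time_report(url: str, token_auth: str, timeout: float = 120.0) -> float:
    """POST the token, read the whole response, and return elapsed seconds.

    Raises an exception (e.g. a timeout or HTTP error) if the report does not
    come back within `timeout` seconds.
    """
    body = urlencode({"token_auth": token_auth}).encode()
    start = time.monotonic()
    with urllib.request.urlopen(url, data=body, timeout=timeout) as response:
        response.read()
    return time.monotonic() - start
```

For example, timing report_url("UserCountry.getCountry", "year", "2022-01-01") against a one-week query would show whether the yearly report is simply slow (likely on-the-fly archiving) or fails outright.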

Mar 23 '23 10:03 consideRatio

Hmmm, I've tried again today and got faster responses in general, but it still hangs and throws those errors when asking for 2022 data. Maybe there is some kind of bottleneck in the data querying that becomes more or less severe depending on load? Maybe we're running into resource constraints on the machine this service is running on?

Asking for a location breakdown for the year 2022 just resulted in:

Mar 23 '23 11:03 choldgraf

I have ~0 experience with Matomo and this setup, but my debugging strategy would be to inspect the logs of Matomo, which I think runs inside the k8s cluster on the GKE federation member. I'm not doing this right now; if you think I should, @choldgraf, let me know and I'll drop something else.

Mar 23 '23 11:03 consideRatio

I also have no idea how the Matomo service works (I couldn't find it documented in our team compass, but I did find this documentation in our Binder SRE guide). I think it'd be helpful to add a "how to debug Matomo" guide to those docs. Maybe @minrk knows how to do that?

Mar 23 '23 11:03 choldgraf

I've never debugged Matomo because I've never encountered a bug with it before. We just turned it on and it has worked so far! I only got as far as looking at the logs (stern matomo) to try to find errors when the errors stopped happening. I never did find an error, even scrolling back through what should have included it, I think, and I stopped looking once the errors went away. So my knowledge of debugging Matomo can be summarised as: it's called matomo and deployed on prod, so that's where I can look for logs. The rest is learning as we go.
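Since stern output gets long, a small filter can help spot errors when scrolling back. This is a sketch under assumptions: the patterns are generic PHP and HTTP error markers I'm guessing at, not a list from Matomo's documentation, and suspicious_lines is a hypothetical helper name.

```python
import re
from typing import Iterable, Iterator, Tuple

# Assumed markers: generic PHP fatal errors (including memory and time limits)
# and 500-504 status codes in access-log lines. This is a guess, not a list
# taken from Matomo's documentation; extend it as real errors show up.
ERROR_PATTERN = re.compile(
    r"fatal error|allowed memory size|maximum execution time|\s50[0-4]\s",
    re.IGNORECASE,
)


def suspicious_lines(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (line number, line) for every log line that matches ERROR_PATTERN."""
    for number, line in enumerate(lines, start=1):
        if ERROR_PATTERN.search(line):
            yield number, line.rstrip("\n")
```

A short script that feeds sys.stdin to suspicious_lines and prints the hits could sit at the end of a pipeline such as stern matomo --since 1h | python scan.py (scan.py being a hypothetical file name).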

Mar 23 '23 12:03 minrk

So there aren't any errors in the logs even though those error UI elements are popping up? I'm not really sure what to do then... it seems like "give me the visitor behavior over the last year" is a pretty standard query, no?

Mar 24 '23 11:03 choldgraf

Note that their FAQ page about this error message has a suggestion for how to resolve it:

The most common solution to the message "Oops... there was a problem during the request." is to setup Auto-Archiving in your Matomo by following these two steps:

a) Enabling the crontab to run the core:archive command and b) Disabling “Archive reports when viewed from the browser” in the General Settings.

I don't know how to do either of these things, but perhaps this wouldn't be difficult to do?
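Step (a) amounts to running Matomo's core:archive console command on a schedule (for example hourly). In our Kubernetes deployment that would most likely be a CronJob rather than a literal crontab. A minimal sketch of the command itself, assuming the console script lives at /var/www/html/console (I believe that is the layout of the official Matomo Docker image, but it needs checking against ours) and a placeholder public URL:

```python
import subprocess
from typing import List

# Hypothetical values: the console path assumes the layout of the official
# Matomo Docker image, and the URL is a placeholder for our instance.
MATOMO_CONSOLE = "/var/www/html/console"
MATOMO_URL = "https://matomo.example.org/"


def archive_command(console: str = MATOMO_CONSOLE, url: str = MATOMO_URL) -> List[str]:
    """Argument list for one run of Matomo's core:archive command."""
    return ["php", console, "core:archive", f"--url={url}"]


def run_archive(timeout_s: int = 3600) -> int:
    """Run one archiving pass and return its exit code.

    subprocess.run raises TimeoutExpired if the pass takes longer than
    timeout_s seconds, so a stuck run fails loudly instead of hanging.
    """
    result = subprocess.run(archive_command(), timeout=timeout_s, check=False)
    return result.returncode
```

Step (b) is a toggle in the General Settings UI and isn't shown here. It should only be switched off once scheduled archiving runs are succeeding; otherwise reports would stop being processed at all.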

Mar 24 '23 16:03 choldgraf

Another datapoint: Matomo is hanging for me on a much smaller query: "Entry page statistics in the last 1.5 months".

Apr 18 '23 12:04 choldgraf

There are some performance warnings on our MySQL instance, so I'll try to look into those; they could be relevant.

Apr 20 '23 12:04 minrk

A related question to this: would plausible.io be more cost-effective and labor-effective?

Their pricing is based on page visits, and since mybinder.org basically bounces people as soon as they hit the page, that number should still be relatively low. In March, we had something like 190k visits. According to plausible.io's pricing page, traffic of that volume would be covered by ~$30 a month.

So for $30 a month we would get the same quality of user tracking, in a privacy-friendly way, without paying any cloud costs or incurring any maintenance costs. Does that make it a better-sounding proposition than hand-rolling our own Matomo instance?

edit: Apparently Matomo has their own paid plan as well. Could we just pay them the 20 euros a month and not have to worry about the Matomo aspect of our deployment at all? That feels pretty cheap to me.
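To compare the two options on the same footing, the monthly figures quoted above can be annualised. The numbers are the ones from this thread (about $30/month for Plausible at ~190k visits, 20 euros/month for Matomo's hosted plan), not current list prices, and they are in different currencies, so no conversion is attempted.

```python
from decimal import Decimal

# Figures quoted in this thread (not current list prices):
PLAUSIBLE_USD_PER_MONTH = Decimal("30")    # tier covering ~190k visits/month
MATOMO_CLOUD_EUR_PER_MONTH = Decimal("20")
VISITS_PER_MONTH = 190_000                 # March figure


def annual(monthly: Decimal) -> Decimal:
    """Yearly cost for a flat monthly price."""
    return monthly * 12


def cost_per_visit(monthly: Decimal, visits: int) -> Decimal:
    """Monthly price divided by monthly visits."""
    return monthly / visits
```

That works out to roughly $360 versus 240 euros a year, or about $0.00016 per visit for Plausible at March's traffic.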

Apr 20 '23 13:04 choldgraf

If we have some basic funds available to spend on this kind of thing, absolutely. In the past, what made self-hosting the available choice was our situation of "we have hosting costs covered, but no funds we can just spend."

Apr 20 '23 13:04 minrk

If we do have anything in that pot from NumFOCUS, I suggest we use it on a hosted Matomo subscription rather than running it ourselves. If we can use some of the GESIS funds to cover this cost, we could do that too. Does that make sense?

EDIT: I removed a bunch of the above because I had obviously not read your comment closely enough 😆

Apr 20 '23 16:04 choldgraf

The Scientific Python project already has an instance of plausible.io; maybe this can be a part of that?

Apr 20 '23 17:04 MridulS

I'd be happy with that, but I don't know how to utilize that instance. Do you know how? Maybe @stefanv has guidance?

Apr 20 '23 17:04 choldgraf

I do use it for NetworkX, but need to go through @stefanv for the initial setup.

Apr 20 '23 17:04 MridulS

Add the following to your HTML:

<script defer data-domain="mybinder.org" src="https://views.scientific-python.org/js/script.js"></script>

While we keep the service maintained to the best of our abilities, we do not have any backup in place; i.e., feel free to use it, but consider the data ephemeral.
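After adding the tag, a quick check that the served page actually contains it can save some confusion. A minimal sketch using only the standard library's html.parser; tracked_domains is a hypothetical helper, and it only confirms the tag is present, not that Plausible is receiving events.

```python
from html.parser import HTMLParser
from typing import List, Optional

PLAUSIBLE_SRC = "https://views.scientific-python.org/js/script.js"


class ScriptTagFinder(HTMLParser):
    """Collect the data-domain of every <script> tag that loads the given src."""

    def __init__(self, src: str) -> None:
        super().__init__()
        self.src = src
        self.domains: List[Optional[str]] = []

    def handle_starttag(self, tag, attrs):
        if tag != "script":
            return
        attributes = dict(attrs)  # boolean attributes like defer map to None
        if attributes.get("src") == self.src:
            self.domains.append(attributes.get("data-domain"))


def tracked_domains(html: str, src: str = PLAUSIBLE_SRC) -> List[Optional[str]]:
    """Return the data-domain values of all matching Plausible script tags."""
    finder = ScriptTagFinder(src)
    finder.feed(html)
    finder.close()
    return finder.domains
```

For example, passing it the HTML of https://mybinder.org/ fetched with urllib.request should return ["mybinder.org"]; an empty list means the tag is missing from the rendered page.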

Apr 20 '23 21:04 stefanv
