goatcounter icon indicating copy to clipboard operation
goatcounter copied to clipboard

Current master requires regular restarts in order for stats to update

Open jbg opened this issue 3 years ago • 9 comments

I am trying out goatcounter using latest master built like this:

CGO_ENABLED=0 go build \
  -tags osusergo,netgo,sqlite_omit_load_extension \
  -ldflags="-X zgo.at/goatcounter.Version=$(git log -n1 --format='%h_%cI') -extldflags=-static" \
  ./cmd/goatcounter

And running it with:

PGPASSWORD=[REDACTED] goatcounter serve -listen '*:8080' -tls http -db postgres://goatcounter@[REDACTED]/goatcounter -automigrate

It is behind a proxy that terminates TLS.

Everything appears to work fine at first; /count.js and /count are served and the dashboard is served, but most information doesn't appear in the dashboard until the process is terminated and restarted. Some pages do start to appear in the list under Pages, but with pageview counts of 0 and without corresponding bars in the graph. No country, browser or screen size information is shown.

Within a few seconds of terminating goatcounter and restarting it, all of the pageview counts, countries, browsers and screen sizes appear, but after it is restarted, new page views are not processed for this information (though they do appear to be counted in the total, because the percentages of the existing information reduce over time).

jbg avatar May 04 '21 06:05 jbg

Can you run it with -debug=all, send a pageview, wait at least 10 seconds for the background process to run, and send me the output of that? This should at least determine if that's running correct.

It's run on shutdown, so that might explain why you see it after restarting.

arp242 avatar May 04 '21 06:05 arp242

07:18:14  200 GET     0ms  stats.redacted/count.js
May  4 07:18:20 cron: DEBUG: PersistAndStat started
  cron              2247ms  memstore
May  4 07:18:30 ERROR: cron task PersistAndStat is taking longer than 10s
  cron              7826ms  stats
May  4 07:18:30 cron: DEBUG: persisted 1 hits {memstore="2247ms" stats="7826ms"}

The only thing that appears unusual is the length of time taken. It's pointing at a postgresql server with virtually zero load and other queries on the same server have normal performance.

jbg avatar May 04 '21 07:05 jbg

Before sending the hit, I see cron: DEBUG: PersistAndStat started every 10 seconds, but afterward, I never see that log entry again.

jbg avatar May 04 '21 07:05 jbg

Before sending the hit, I see cron: DEBUG: PersistAndStat started every 10 seconds, but afterward, I never see that log entry again.

Yeah, that's expected; as there are no pageviews to process.

Not sure why it's taking so long, for comparison this is what I have for goatcounter.com:

May  4 07:29:08 cron: DEBUG: persisted 21 hits {memstore="21ms" stats="91ms"}

Even on very limited hardware, this should be quite fast.

It sounds like a problem with your PostgreSQL server, to check you can try running the SQLite version as a test, and see if it takes that long as well. That would confirm it's a problem with the database rather than the application itself.

To see why it's taking so long in PostgreSQL, you can try enabling auto_explain, e.g. something like this in your postgresql.conf:

session_preload_libraries     = 'auto_explain'
auto_explain.log_min_duration = '500ms'

and then check/post the logs.

arp242 avatar May 04 '21 07:05 arp242

Yeah, that's expected; as there are no pageviews to process.

Is it expected that I see these logs every 10 seconds after restarting (before any hits are received at all), and then after one hit, the above ERROR: cron task PersistAndStat is taking longer than 10s is logged, and the cron: DEBUG: PersistAndStat started messages cease (even if more hits are received)?

I already have log_min_duration set to 100ms. These queries are not logged. If I log all statements, I see that these queries are taking ~10ms.

jbg avatar May 04 '21 07:05 jbg

Wait, I misread your comment, I thought you meant that the messages after PersistAndStat started were never seen again (cron 2247ms memstore etc.), but you meant that this entire message is never seen again.

I'm not sure what's taking so long if it's not the database 🤔 Neither function (the memstore and stats) has a lot of complex logic, and because both functions take this long I still suspect it's a problem with the database, somehow, in spite of no queries showing up in the auto_explain. There is very little (if any) shared code between these functions, and both having a performance bug is possible, but not very likely.

Could you try the SQLite version just to confirm?

arp242 avatar May 04 '21 07:05 arp242

Wait, I misread your comment, I thought you meant that the messages after PersistAndStat started were never seen again (cron 2247ms memstore etc.), but you meant that this entire message is never seen again.

Correct, it's as if the cron task stops running completely. Even if there is some performance problem with the PG server (which is unlikely; it's in use for other things and we would have noticed if queries were taking multiple seconds), the fact that the cron task stops working seems like a bug (and explains why I have to restart it to get stats).

I'll try out the SQLite version and see what happens.

jbg avatar May 04 '21 07:05 jbg

Yeah, that makes sense. I'm not entirely sure off-hand, no one reported a problem like this before, and master seems to be working fine for me 🤔 It could be something specific to the way GoatCounter interacts with the database or some such.

One thing I've been intending to do for a long while is make this entire process a lot less opaque; debugging it right now consists of "add printf, recompile, see what happens" which is far from ideal. I have a few ideas and "experimental patches", but there are some things I still need to figure out. I'll try and work on that some more this week.

arp242 avatar May 04 '21 08:05 arp242

I just noticed a similar issue on a fresh goatcounter v2.0.4 (arm64) installation with a sqlite database. The pages and referrers are appearing correctly on the dashboard, but browsers, systems, screen size, and locations are empty. This problem persists even through a restart. I enabled debug logs, but nothing seemed abnormal. The database contains the pageviews and stats. I tried manually running the queries in the db folder, and the output is correct.

I will try to debug this further when I have more time.


Edit: I just tried something on a hunch based on the URL. I think my issue is unrelated to this. It appears that GoatCounter doesn't treat date ranges correctly for all charts. For example, if I go to https://kfw.gc.geek1011.net/?hl-period=month&period-start=2021-05-20&period-end=2021-06-20&filter=&as-text=off&daily=off, the aforementioned charts will be blank. But, if I change the end date to 2021-06-21 in the URL (the date selector rejects the change), the stats appear correctly.

This might be related to #431.


Edit: It seems to work fine now that it's been running for a while. I don't think it's worth spending the time to fix the issue I ran into since there'll probably be a risk of bigger regressions, and it only partially affects the dashboard until the first day is over.

pgaskin avatar Jun 21 '21 02:06 pgaskin

Should be fixed in the 2.3 release.

arp242 avatar Oct 17 '22 13:10 arp242