Implement Caddy JSON log parser
This should be built on top of the split-regexparser branch, but I'm not entirely sure how to do that.
I understand that this does not allow for custom formats, only the base/default Caddy formatting. I'm not sure if a custom query language is desired or not.
Do you know of any other software that uses JSON for logs?
In principle this could be done by scanning to a map[string]any, and then mapping JSON fields to the things to import with something like --format=json:..
But if nothing else uses JSON then that's probably not worth it.
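To sketch what I mean (purely illustrative; the field names and the --format syntax below are made up, not anything goatcounter actually has): decode each line into a map[string]any and walk user-selected keys, with dotted keys for nested objects.

```go
// Hypothetical sketch of a generic JSON log parser: decode each line into a
// map[string]any and look up user-selected fields by name.
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"os"
	"strings"
)

func main() {
	// E.g. something like --format=json:path=request.uri,ua=user_agent
	// (hypothetical syntax, just to show the idea).
	fields := map[string]string{"path": "request.uri", "status": "status"}

	scan := bufio.NewScanner(os.Stdin)
	for scan.Scan() {
		var line map[string]any
		if err := json.Unmarshal(scan.Bytes(), &line); err != nil {
			fmt.Fprintln(os.Stderr, "skip:", err)
			continue
		}
		for name, key := range fields {
			// Dotted keys walk into nested objects.
			fmt.Printf("%s=%v ", name, lookup(line, strings.Split(key, ".")))
		}
		fmt.Println()
	}
}

// lookup walks nested maps following the given path segments.
func lookup(m map[string]any, path []string) any {
	var v any = m
	for _, p := range path {
		mm, ok := v.(map[string]any)
		if !ok {
			return nil
		}
		v = mm[p]
	}
	return v
}
```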
I only know of Traefik but have not used it.
Okay, it'll be fine to just add a Caddy-specific method. I don't really feel like coming up with and implementing a generic approach, especially when it's unclear whether that's needed. Looking at goaccess, it doesn't support JSON at all, and when people ask for it it's mostly for Caddy. A more generic approach can always be added later if it's needed – just accepting the "caddy" format doesn't prevent that.
I merged your other PR – if you rebase on master I'll have a detailed look at this one.
The test file only has a single request – it should have a few requests, to test that reading more than one request works.
I also wonder if/how goatcounter import -follow will work with Caddy. If Caddy outputs one JSON object per line then that wouldn't be a problem, but if it outputs human-readable JSON per your test example then getting that to work would require some additional hackery.
Caddy does indeed print one request per line, so that wouldn't be necessary. I've been using this branch with my Caddy instance since last week :grin:
I've rebased onto master and will add a test in a bit.
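For reference, one JSON object per line is easy to deal with incrementally, which is what -follow needs: every line can be decoded on its own as it arrives. A rough sketch; the struct fields mirror what Caddy's default JSON encoder emits as far as I can tell, so treat the exact names as assumptions rather than what the PR's parser actually uses.

```go
// Minimal sketch of reading a line-delimited Caddy JSON access log.
// Field names are based on Caddy's default JSON output and may not be exact.
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"os"
)

type caddyLine struct {
	TS      float64 `json:"ts"`
	Status  int     `json:"status"`
	Request struct {
		Method string              `json:"method"`
		Host   string              `json:"host"`
		URI    string              `json:"uri"`
		Header map[string][]string `json:"headers"` // User-Agent etc. live here (assumed)
	} `json:"request"`
}

func main() {
	// For -follow this would be a tail-style reader instead of stdin.
	scan := bufio.NewScanner(os.Stdin)
	for scan.Scan() {
		var l caddyLine
		if err := json.Unmarshal(scan.Bytes(), &l); err != nil {
			continue // skip lines that aren't access log entries
		}
		fmt.Println(l.TS, l.Status, l.Request.Method, l.Request.Host, l.Request.URI)
	}
}
```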
Right, so – it would be good to have log files as Caddy outputs them. Manually twiddling compacted JSON is a bit of a pain, but otherwise you're not really testing the real-world scenario.
My use case is also caddy. What can unblock this PR?
I think it's basically fine; IIRC I just wanted to write a bit of docs, but never really got around to it. Basically, I just forgot 😅 I'll try and find some time for things this week or so.
I forgot about this PR 😬 I have some time this week, let me know if there's anything you'd like me to do here
The only open comment is: https://github.com/arp242/goatcounter/pull/730#discussion_r1556269320
I do remember now: what I wanted to do (but didn't) was to run Caddy and play around with possible options and see what does and doesn't work. I never really used Caddy and don't really have good insight into how "complete" this PR is and what options Caddy does or doesn't have.
I am having some problems with this PR:
- Importing the same log file repeatedly will cause repeated views after some hours (waiting hours/overnight is necessary for the issue to manifest). I assume this is due to the pageview/visit calculation, but it's not really useful for me, as I run Goatcounter on a separate server
- "Sizes" remains "100% Unknown": is this supposed to be parsed from user-agent?
- "Locations" is "99.99% Unknown", but there are some results; is this supposed to be parsed from user-agent?
"Sizes" remains "100% Unknown": is this supposed to be parsed from user-agent?
This can only be fetched from JS; it's just not sent.
"Locations" is "99.99% Unknown", but there are some results; is this supposed to be parsed from user-agent?
It's from the IP address. I'm not sure why it's mostly "unknown"; are you sure the user's IP is used and not some proxy's?
How do you run a Caddy static file server with JSON logging? This is also what I ran into last year when I wanted to test some stuff, and I can't get it to work today either.
Allegedly this should be something like:
```
localhost:2019 {
	file_server browse
	log access-log {
		format json
	}
}
```
And then just `caddy run`.
Tried many different variants of that, the examples, etc., but nothing seems to work: the file server doesn't work and the log isn't in JSON.
Okay, I solved it: listening on localhost:2019 doesn't work as that's the admin port; it just silently fails. D'oh. Just using another port works. Fuck me, that took me forever to figure out 🙃
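For anyone trying to reproduce this, the config that ended up working for me looks roughly like the following; the port, root, and log path are arbitrary examples, not anything goatcounter requires.

```
# Same idea as the snippet above, but on a non-admin port (8080 is arbitrary)
# and with an explicit log file so there's something to import from.
localhost:8080 {
	root * /srv/www
	file_server browse
	log {
		output file /var/log/caddy/access.log
		format json
	}
}
```

Then `caddy run` as before, and point the import at the resulting log file.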
Alright, fixed up a few small things and seems good now. Sorry it took such a long time to get it merged.
Note I renamed some of the -datetime fields to be more consistent with some other fields someone added last week (these now also work with the regexp parser, rather than just the Caddy one).