Implement Caddy JSON log parser

Open DavidVentura opened this issue 1 year ago • 11 comments

This should be based on top of the split-regexparser branch but I'm not entirely sure how to do that.

I understand that this does not allow for custom formats, only the base/default Caddy formatting. I'm not sure if a custom query language is desired or not.

DavidVentura avatar Mar 30 '24 14:03 DavidVentura

I understand that this does not allow for custom formats, only the base/default Caddy formatting. I'm not sure if a custom query language is desired or not.

Do you know of any other software that uses JSON for logs?

In principle this could be done by scanning to a map[string]any, and then you can map JSON fields to things to import with something like --format=json:..

But if nothing else uses JSON then that's probably not worth it.
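
For illustration only, the generic idea above could be sketched roughly as follows in Go. This is not code from the PR: the `lookup` and `parseMapping` helpers and the `name=json.path` mapping syntax are hypothetical, made up here to show how a decoded `map[string]any` could be mapped to import fields.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// lookup walks a dotted path such as "request.uri" through a decoded JSON
// object and returns the value found there, if any. (Hypothetical helper.)
func lookup(obj map[string]any, path string) (any, bool) {
	var cur any = obj
	for _, key := range strings.Split(path, ".") {
		m, ok := cur.(map[string]any)
		if !ok {
			return nil, false
		}
		cur, ok = m[key]
		if !ok {
			return nil, false
		}
	}
	return cur, true
}

// parseMapping turns a spec like "path=request.uri,status=status" into a map
// from import field name to JSON path. The syntax is an assumption made for
// this sketch; it is not an existing goatcounter format.
func parseMapping(spec string) (map[string]string, error) {
	out := make(map[string]string)
	for _, pair := range strings.Split(spec, ",") {
		name, path, ok := strings.Cut(pair, "=")
		if !ok || name == "" || path == "" {
			return nil, fmt.Errorf("invalid mapping %q", pair)
		}
		out[name] = path
	}
	return out, nil
}

func main() {
	line := `{"status":200,"request":{"uri":"/about","host":"example.com"}}`

	// Decode into a generic map rather than a fixed struct.
	var obj map[string]any
	if err := json.Unmarshal([]byte(line), &obj); err != nil {
		panic(err)
	}

	mapping, err := parseMapping("path=request.uri,status=status")
	if err != nil {
		panic(err)
	}
	for _, name := range []string{"path", "status"} {
		v, ok := lookup(obj, mapping[name])
		fmt.Printf("%s=%v (found: %v)\n", name, v, ok)
	}
}
```

The trade-off is that a generic map needs type assertions for every field, which is part of why a fixed Caddy-specific struct is simpler when only one format has to be supported.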

arp242 avatar Mar 31 '24 03:03 arp242

I only know of Traefik but have not used it.

DavidVentura avatar Mar 31 '24 09:03 DavidVentura

Okay, it'll be fine to just add a Caddy-specific method. I don't really feel like coming up with and implementing a generic approach, especially when it's unclear that it's needed. Looking at goaccess, it doesn't support JSON at all, and when people ask for it, it's mostly for Caddy. A more generic approach can always be added later if it's needed – just accepting the "caddy" format doesn't prevent that.

arp242 avatar Apr 08 '24 18:04 arp242

I merged your other PR – if you rebase on master I'll have a detailed look at this one.

arp242 avatar Apr 08 '24 18:04 arp242

The test file only has a single request – it should have a few requests, to test that reading more than one request works.

I also wonder if/how goatcounter import -follow will work with Caddy? If Caddy outputs one JSON object per line then that wouldn't be a problem, but if it outputs human-readable JSON per your test example then getting that to work would require some additional hackery.

arp242 avatar Apr 08 '24 18:04 arp242

Caddy does indeed print one request per line, so that wouldn't be necessary. I've been using this branch with my Caddy instance since last week 😁

I've rebased from master and will add a test in a bit

DavidVentura avatar Apr 08 '24 18:04 DavidVentura

Right so – would be good to have logfiles as Caddy outputs them. Manually twiddling compacted JSON is a bit of a pain, but otherwise you're not really testing the real-world scenario.

arp242 avatar Apr 08 '24 19:04 arp242

My use case is also caddy. What can unblock this PR?

maruel avatar Sep 30 '24 13:09 maruel

I think it's basically fine; IIRC I just wanted to write a bit of docs, but never really got around to it. Basically, I just forgot 😅 I'll try and find some time for things this week or so.

arp242 avatar Sep 30 '24 14:09 arp242

I forgot about this PR 😬 I have some time this week, let me know if there's anything you'd like me to do here

DavidVentura avatar Sep 30 '24 14:09 DavidVentura

The only open comment is: https://github.com/arp242/goatcounter/pull/730#discussion_r1556269320

I do remember now: what I wanted to do (but didn't) was to run Caddy and play around with possible options and see what does and doesn't work. I never really used Caddy and don't really have a good insight on how "complete" this PR is and what options Caddy does or doesn't have.

arp242 avatar Sep 30 '24 14:09 arp242

I am having some problems with this PR:

  1. Importing the same log file repeatedly will cause repeated views after some hours (waiting hours/overnight is necessary for the issue to manifest). I assume this is due to the pageview/visit calculation, but it's not really useful for me, as I run Goatcounter on a separate server
  2. "Sizes" remains "100% Unknown": is this supposed to be parsed from user-agent?
  3. "Locations" is "99.99% Unknown", but there are some results; is this supposed to be parsed from user-agent?

DavidVentura avatar Jan 01 '25 12:01 DavidVentura

"Sizes" remains "100% Unknown": is this supposed to be parsed from user-agent?

This can only be fetched from JS; it's just not sent.

"Locations" is "99.99% Unknown", but there are some results; is this supposed to be parsed from user-agent?

It's from the IP address. I'm not sure why it's mostly "unknown"; are you sure the user's IP is used and not some proxy's?
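
One quick way to check this is to see whether the IPs in the log are loopback or private addresses, which would suggest a proxy's address is being recorded rather than the visitor's. A small diagnostic sketch in Go follows; the `looksInternal` helper is hypothetical and is not part of goatcounter.

```go
package main

import (
	"fmt"
	"net/netip"
)

// looksInternal reports whether ip is a loopback, private, or link-local
// address. Such addresses cannot be geolocated and usually belong to a
// proxy or the local network rather than to a visitor.
func looksInternal(ip string) (bool, error) {
	addr, err := netip.ParseAddr(ip)
	if err != nil {
		return false, err
	}
	return addr.IsLoopback() || addr.IsPrivate() || addr.IsLinkLocalUnicast(), nil
}

func main() {
	// Sample addresses; 203.0.113.7 is from a documentation range and
	// stands in for a public visitor IP.
	for _, ip := range []string{"127.0.0.1", "10.0.0.5", "203.0.113.7"} {
		internal, err := looksInternal(ip)
		if err != nil {
			fmt.Println(ip, "error:", err)
			continue
		}
		fmt.Printf("%s internal=%v\n", ip, internal)
	}
}
```

If nearly every logged address comes back as internal, the "Unknown" locations are most likely caused by the proxy setup rather than by the import itself.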

arp242 avatar Mar 19 '25 04:03 arp242

How do you run a Caddy static file server with JSON logging? This is also what I ran into last year when I wanted to test some stuff, and I can't get it to work today either.

Allegedly this should be something like:

localhost:2019 {
    file_server browse
    log access-log {
        format json
    }
}

And then just caddy run.

I tried many different variants of that, examples, etc., but nothing seems to work: the file server doesn't work and the log isn't in JSON.

arp242 avatar Jun 02 '25 17:06 arp242

Okay, I solved it: listening on localhost:2019 doesn't work as that's the admin port; it just silently fails. D'oh. Just using another port works. Fuck me, that took me forever to figure out 🙃

arp242 avatar Jun 08 '25 16:06 arp242

Alright, fixed up a few small things and seems good now. Sorry it took such a long time to get it merged.

Note I renamed some of the -datetime fields to be more consistent with some other fields someone added last week (these now also work with the regexp parser, rather than just the Caddy one).

arp242 avatar Jun 08 '25 18:06 arp242