feat: migrate logging `log` -> `tracing`
Lets split this up into two PRs
- one changing the interface, tests and dependencys
- one adding additional configuration options to martin AND documenting them
Alternatively, we can keep this PR open and I am going to get back to this sometime on the weekend
Resolves #836
we should continue using the RUST_LOG env var for all the configuration, with the possible addition of CLI and Config params
- [x]
RUST_LOG - [x] custom env variables
MARTIN_LOG_FORMAT,MARTIN_LOG_LEVEL, ... - [x] deduplicating code
- [x] CLI
- [x] Config
- [x] Writing docs (🦖)
So I ran the performance benchmarks.. Seems like there is a supprising performance difference..
Will investigate where this is coming from.. I have some theories, such as my terminal not being the best at printing formatted output and such => will test them and report back and update the table below
| experiment | |||
|---|---|---|---|
baseline/log + with terminal output via ssh |
|||
log + no terminal output |
|||
tracing + with terminal output via ssh |
|||
tracing + no terminal output |
|||
tracing+ with terminal output via ssh + MARTIN_LOG_FORMAT=json |
|||
tracing+ with terminal output via ssh + NO request logging via actix-tracing, but via actix-middleware + MARTIN_LOG_FORMAT=json |
|||
tracing+ with terminal output via ssh + request logging via actix-tracing, but NOT via actix-middleware + MARTIN_LOG_FORMAT=json |
|||
tracing + with terminal output via ssh + request logging via actix-tracing, but NOT via actix-middleware + MARTIN_LOG_FORMAT=json + interest cache in tracing_log |
|||
tracing + no terminal output + request logging via actix-tracing, but NOT via actix-middleware + interest cache in tracing_log |
Okay.. So I got performance up there again. Was a good call to do benchmarks 😅
I think there are severe performance issues with my ssh-terminal and rendering special characters => I am going to discount the initial and only consider MARTIN_LOG_FORMAT=json
I don't think on a server, this is reasonably a problem.
The rqs went down on some of these, but 170k -> 130k is likely not something that users will see in reality. Will look further into what is causing the slowdown there
Okay.. so apparently this gets logged..
https://docs.rs/tokio-postgres/latest/src/tokio_postgres/connection.rs.html#338
This is because we ask postgres do do so..
https://github.com/maplibre/martin/blob/31fc566d586a773d43a8ebb54bd3a0af89cf3519/tests/fixtures/functions/function_zxy_query.sql#L7
So apparently some of our dependencys log via tracing (switch on the trace level to see it.. that is a lot). There is the expected change from not having so much semi-const stuff.. I think the ~relatively small change is fine..
=> I think performance is not a bottleneck anymore. Will investigate configurability next
WOW! This is a very thorough job you did there! Thank you!
Ideally, we should continue using the RUST_LOG env var for all the configuration, with the possible addition of CLI and Config params. If we need more values like output format (vs output filtering), that may go into separate params too.
When doing performance, stdout + term + ssh is almost always a bad choice - there might be locking and syscall issues affecting it. Best to redirect it to /dev/null to avoid additional uncertainty, or perhaps dump to a file (but that might also be less than stable).
No clue where CI and my local device is different. Will investigate further in the last part of the week.
@nyurik how would you want me to split this up?
I don't think this is currently a reviewable amount..
How about martin-observability-utils and the rest? (or strictly on the crate boundary)
Good question, I wish i knew the best way to proceed. Thinking out loud:
- if perf is similar, and all existing functionality is preserved, we can introduce some "log format" param into the ecosystem
- the monitoring utility is clearly a separate PR, perhaps
martin-monitor/martin-watch/martin-log/martin-*-- my guess is that this should be a purely CLI tool for any kind of debugging, perhaps with a nice UI with Ratatui? The only uncertainty is what happens when multiple martin instances are running?
What do you think?
- the monitoring utility is clearly a separate PR
I think there is some misunderstanding.
I am talking about the code pulled from the three crates to configure tracing.
There is no TUI involved and adding that would involve quite a bit more code than I think we should add for now (bigger issues and stuff ^^).
About naming:
The name of the crate was to enable tracing/metrics to be outsourced there too and the utils part is because the other -utils crate exists.
Thinking about it:
In the other crates in this repo, tracing/metrics are not relevant as they are CLI tools.
This is only relevant for martin => extracting this to martin-observability-utils is likely a bad choice..
Likely martin-tracing is a better choice, or should I stick with martin-tracing-utils?
Above I was talking about making this PR reviewable by splitting it up into smaller PRs. Currently this is a +900 -300 lines chonker of a PR (even though a large part of this is metadata and trivial changes..). There is still a lot of implementation.
I will split this up into:
- one PR laying the basic groundwork (
martin-tracing/martin-tracing-utils) => looking over if the API/naming I came up with is okay.. - one PR integrating it on a basic level (rip out env-logger, change request logging, add CLI/cofig file option)
- one PR doing the rest (rip out log, ..) <-- adding docs pushed to here as this is would be nice to release soon-ish after documenting
Alternative would be I will split this up into:
- one PR laying the basic groundwork (
martin-tracing/martin-tracing-utils) - one PR integrating this into
martin/martin-cp - one PR integrating this into
mbtiles
there were some merge-issues and I think it is simplest to just redo the work I have done in this PR, if we get https://github.com/maplibre/martin/pull/1747 merged.