git-bundle-server
Add logging & tracing to the application
Add tracing & logging options to give users more visibility into the operation of the bundle server. This information is useful for debugging, monitoring the system, etc. To keep things simple (and privacy-conscious), the only output options for logging will be "to a file" or "to stdout".
Specification
The two best options (AFAICT) seem to be OpenTelemetry tracing or trace2, output as structured JSON.
| Option | Pros | Cons |
|---|---|---|
| OTel | - Widely-used<br>- Existing SDK | - Spans only written when finished |
| trace2 | - Real-time output<br>- Simple API | - Lower adoption (only Git and Git Credential Manager?)<br>- Fields don't fit well with web server |
Ultimately, the goal for the bundle server would be to support both conventions; with that in mind, the logger interface should be general enough to eventually accommodate both.
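To make "general enough" a bit more concrete, here is a minimal sketch of what such an interface could look like. Everything in it (`TraceLogger`, `recordingLogger`, `syncBundles`) is a hypothetical name for illustration, not the project's actual API. The idea is that a region maps to trace2's `region_enter`/`region_leave` events, and could map to the start and end of a span in an OTel backend.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// TraceLogger is a hypothetical, backend-neutral interface (an assumption for
// illustration). A trace2 backend could emit region_enter/region_leave events;
// an OTel backend could start a span in Region and end it in the returned func.
type TraceLogger interface {
	// Region marks the start of a unit of work. The returned function marks
	// its end; the returned context is passed to nested work.
	Region(ctx context.Context, category, label string) (context.Context, func())
	// Error records a failure within the current context.
	Error(ctx context.Context, err error)
}

// recordingLogger is a toy backend that keeps event names in memory, in order.
type recordingLogger struct {
	events []string
}

func (l *recordingLogger) Region(ctx context.Context, category, label string) (context.Context, func()) {
	name := category + "/" + label
	l.events = append(l.events, "region_enter:"+name)
	return ctx, func() {
		l.events = append(l.events, "region_leave:"+name)
	}
}

func (l *recordingLogger) Error(ctx context.Context, err error) {
	l.events = append(l.events, "error:"+err.Error())
}

// syncBundles stands in for application code that only sees the interface.
func syncBundles(logger TraceLogger) {
	ctx, done := logger.Region(context.Background(), "bundle", "update")
	defer done()
	logger.Error(ctx, errors.New("fetch failed"))
}

func main() {
	rec := &recordingLogger{}
	syncBundles(rec)
	for _, e := range rec.events {
		fmt.Println(e)
	}
}
```

Because the caller only depends on `TraceLogger`, swapping the toy backend for a trace2 or OTel implementation later would not touch application code.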
Log exporting
Even with a specification, we still need to be able to export the log data to a file. There are a number of structured loggers in Go, but the performance benefits and configurability of zap make it the preferred choice for this initial implementation.
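For illustration, the "file or stdout" export step can be sketched as below. Note that this uses only the standard library's `encoding/json` rather than zap, purely to keep the example dependency-free; it is not the proposed implementation. The `traceEvent` struct holds a small subset of the fields in trace2's JSON event format, and the helper names (`openOutput`, `encodeEvent`, `newEvent`) and the convention that an empty path means stdout are assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"os"
	"time"
)

// traceEvent holds a small subset of the fields used by trace2's JSON event
// format. The struct itself is illustrative, not the project's type.
type traceEvent struct {
	Event  string `json:"event"`
	SID    string `json:"sid"`
	Thread string `json:"thread"`
	Time   string `json:"time"`
}

// timeLayout renders UTC timestamps with microsecond precision.
const timeLayout = "2006-01-02T15:04:05.000000Z"

// newEvent stamps an event with the given time, converted to UTC.
func newEvent(name, sid string, now time.Time) traceEvent {
	return traceEvent{Event: name, SID: sid, Thread: "main", Time: now.UTC().Format(timeLayout)}
}

// encodeEvent serializes one event as a single JSON object (no newline).
func encodeEvent(ev traceEvent) (string, error) {
	b, err := json.Marshal(ev)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

// openOutput returns stdout for an empty path (an assumed convention);
// otherwise it opens the file at path for appending, creating it if needed.
func openOutput(path string) (io.Writer, func() error, error) {
	if path == "" {
		return os.Stdout, func() error { return nil }, nil
	}
	f, err := os.OpenFile(path, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o600)
	if err != nil {
		return nil, nil, err
	}
	return f, f.Close, nil
}

func main() {
	out, closeOut, err := openOutput("") // "" selects stdout in this sketch
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	defer closeOut()

	line, err := encodeEvent(newEvent("start", "example-sid", time.Now()))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	fmt.Fprintln(out, line) // one JSON object per line
}
```

Writing one JSON object per line keeps the output easy to tail and to ingest, whichever logging library ends up producing it.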
Rejected
- Standard library: https://pkg.go.dev/log
- Google Logger: https://github.com/google/logger
Due to the lack of OTel logging support in Go, we're going with trace2 for the time being.
For future reference, JSON-ified OTel structures: https://opentelemetry.io/docs/reference/specification/protocol/file-exporter/#examples
From experience with a similar-ish product before, here are some likely scenarios that customers will want to cover with observability. I don't think we should do a lot of number crunching on our end. We should emit sufficient events and details to make something like DataDog or Azure Data Explorer useful. We don't have to tackle everything here. I lack a strong signal on relative priorities, so maybe we build the ones which are easiest?
- Usage stats
  - Fetches/clones by repo and by user
  - Locate spikes in workload, ideally attributable back to an identity
- Internal service health (web server, sync engine)
- Repo state (last bundle time, available bundles, last fetch from upstream)
Speaking with an early adopter corroborated that observability (and potentially auditing) are at least as important as debugging/troubleshooting. The very first thing they want to know is, "who cloned which repos when?" (Answering that question may be complex in the face of pluggable authentication. At a minimum we should have client IP address, and perhaps the auth plugins can be taught how to route additional details to us for logging.)
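As a rough sketch of the "client IP at a minimum" idea, the middleware below writes one JSON record per request with the time, client IP, method, and path. All names here (`accessRecord`, `clientIP`, `withAccessLog`) and the example route are hypothetical, and it assumes the request path is enough to identify the repo. The `User` field is left empty; it only marks where an auth plugin might supply an identity.

```go
package main

import (
	"encoding/json"
	"io"
	"net"
	"net/http"
	"net/http/httptest"
	"os"
	"sync"
	"time"
)

// accessRecord is a hypothetical shape for a "who fetched what, when" event.
type accessRecord struct {
	Time     string `json:"time"`
	ClientIP string `json:"client_ip"`
	Method   string `json:"method"`
	Path     string `json:"path"`
	User     string `json:"user,omitempty"` // placeholder for auth plugin data
}

// clientIP strips the port from a "host:port" remote address. If the address
// cannot be split, it is returned unchanged.
func clientIP(remoteAddr string) string {
	host, _, err := net.SplitHostPort(remoteAddr)
	if err != nil {
		return remoteAddr
	}
	return host
}

// withAccessLog wraps a handler and writes one JSON line per request to out.
func withAccessLog(out io.Writer, next http.Handler) http.Handler {
	var mu sync.Mutex // serialize writes from concurrent requests
	enc := json.NewEncoder(out)
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		rec := accessRecord{
			Time:     time.Now().UTC().Format(time.RFC3339),
			ClientIP: clientIP(r.RemoteAddr),
			Method:   r.Method,
			Path:     r.URL.Path,
		}
		mu.Lock()
		_ = enc.Encode(rec) // a logging failure should not fail the request
		mu.Unlock()
		next.ServeHTTP(w, r)
	})
}

func main() {
	bundles := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	handler := withAccessLog(os.Stdout, bundles)

	// Exercise the handler in-process instead of opening a port.
	// The route is made up for this example.
	req := httptest.NewRequest(http.MethodGet, "/example-org/example-repo", nil)
	handler.ServeHTTP(httptest.NewRecorder(), req)
}
```

Behind a reverse proxy, `RemoteAddr` would be the proxy's address, so a real deployment would also have to decide whether and when to trust forwarded-address headers.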