[RFC] Structured logging
Structured logging is the standard way of doing logging in big modern applications. That's why it would be helpful to provide more utilities in co-log for doing structured logging. For big applications, it's more useful to log JSON values that can later be analyzed by various search engines. For example, instead of textual:
[Debug] [03 May 2019 05:23:19.058 +00:00] [Main.example#34] [ThreadId 11] app: First message...
In structured logging, we'll have
{
"timestamp": "03 May 2019 05:23:19.058 +00:00",
"severity": "Debug",
"callStack": "Main.example#34",
"threadId": 11,
"message": "app: First message..."
}
To support this workflow, I propose the following plan:
- [ ] Introduce new module
Colog.Structuredinco-log - [ ] Add
aesonto dependencies - [ ] Implement all
LogActions that can printaeson.Valueand any data type withToJSONinstance to stdout, stderr, handle and file - [ ] Implement logging function like
logJsonthat will take JSON value as an argument - [ ] Implement machinery similar to
RichMsgthat allows extending JSON with arbitrary monadic actions and which says what JSON keys that will be assigned to - [ ] Write tutorial with an example
Pros
- Richer interface
- Support modern practices
- More flexible interface: better abstractions to support different workflows
Cons
- More dependencies
- More code to test and maintain
Open questions
- [ ]
Severitywill requireToJSONinstance. We should either move it to theco-logpackage or implement an orphan instance - [ ] Think on the convenient interface for logging. How do we want to log values in Haskell? Keep old interface (simple, no repetition, not flexible enough):
logJson Debug "Example text"
or support arbitrary values (flexible, but a lot of boilerplate)
logJson
[ "severity" .= Debug
, "body" .= "Example text"
]
@qnikst I remember, you had some ideas regarding structured logging. Will you have time to share your thoughts? :slightly_smiling_face:
This will be really useful for Cloudwatch. Additionally, can a richMessageAction without colours be provided?
Hi @ashutoshrishi, thanks for your feedback! I've created a separate issue about supporting logging without colouring. I think it should be solved independently from this one.
Let me add my few cents here. So I've implemented structured logging based on co-log for my project, but the solution is specific for my needs and I'm not sure if can be put into the public library. I'll briefly go through the implementation and main structures, and may it will give a rise for the better ideas.
There are two big structured log packages in the Haskell ecosystem di and Katip. I'll be referring this packages in my post. (I've used highly modified version of Katip but I had a lot of things I didn't like, that I wanted to change for me).
-
The basic structure. Katip is using semi-structured
Value, i.e. it hasAeson.Valuefor user data and some predefined fields in the message. I personally find most of those fields redundant (environment (prod, env), host, pid, etc.) all of those things can be added by the external logging system that will dump your logs to the logs-aggregator.Aeson.Valueis highly inefficient, especially if you re-dump it over and over again. It's possible to get few times speedup by manually rewritingKatipserialisation usingAeson.Encoding. On the other hand usingAeson.Valuegives additional flexibility and allows introducingVerbosity(how much of the structure do you put in the logs). I personally do not care about that in the server application I always dump everything (and all the log levels). So just usingAeson.Valueis actually a worst case that you should not be worse than.diintroduces it's own format and messagedf1. Using any format except for json seems unreasonable unless you want to reimplement all the log analysis stack. Butdf1on it's own is a nice stack-like structure where you can pushsegments(namespaces in Katip terminology) and tags. It didn't fit my case asDf1.Valueis basicallyTextand I wanted to store json objects.
So I ended up with implementing my own message type:
data Structured = Segment T.Text | Attr T.Text Encoding
data Message = Message
{ attributes :: Seq Structured
, thread_id :: ThreadId
, severity :: Severity
}
This message type contained only things I care about (even no time!), allowed to add additional data, is Encoding friendly and quite efficient. There are few problems with that structure though:
- It's not extensible, I doubt everyone will like my choice of the common fields
- It may not be the most efficient (I expect just changing Seq to [], will be me benefits, but I don't have benchmarks to prove that)
- It may be not effective to support attributes shadowing.
On the other hand this structure may be improved to reuse previous message dumps.
-
Dumping data, I've introduced logAction that is based on
ByteString:Data.ByteString.Builder.hPutBuilder $ Aeson.fromEncoding ...so I do not mess with additional logger logic and just store data. Such action may make sense, but in co-log framework it's better to split convention builder to ByteString and further processing. -
Rest of the file I can interface that made my package totally compatible with previous Katip implementation. So I had to change 0 lines in 50k lines project. So in interface I had:
data LoggerEnv = LoggerEnv
{ action :: LogAction IO Message <- again types are not generalised
, context :: Seq Structured
}
addContext :: PushItem -> LoggerEnv -> LoggerEnv
addNamespace :: T.Text -> LoggerEnv -> LoggerEnv
sl :: ToJSON a => a -> PushItem -- katip compatibility
newtype PushItem = PushItem (Seq Structured -> Seq Structured)
And methods for pushing new segments and context to the logger env, that works well with Handle patter, also local-like methods works well, but they form stack and that may not be the best possible way. This allowed me to have nice, fast but wired to my package implementation.
I'll be interested in generalising and contributing my solution, but I don't see how to make it usable for the people. And as it was very short I'm not sure what to describe, so any questions about clarification are welcome.
With regards to concrete actions that can be done in oder to provide general support for the structured logging, I'd do the following:
Introduce new package co-log-structured with minimal set of dependencies, basically co-log-core, bytestring, aeson. That will be the core for structured logging. It may have (all names are drafts, I'm bad at naming):
a. ToWire (basically a a -> Encoding function), this function is intended to tell how to store a message in the structured format without depending on the concrete message type. Some helpers that allow generating ToWrite for types that have ToJSON for free.
b. Few efficient LogActions that could efficiently work with ToWire, I can suggest logHandle :: ToWrite a => Handle -> LogAction m a, logBuilder :: ToWire a => LogAction m Builder -> LogAction m a. That will allow build a logging pipeline.
c. Message type that is efficient for structured logging like Message above (I'm not sure maybe it can be generalised, or maybe it worth just keeping wrapper over Seq Structured so it can be embedded in user type).
d. Some API for pushing segments and attributes to the message.
In this form package will provide basic tools for structured logging and simple way to run structured logging for people who need nothing advanced.
After having such a message it's possible to use it in co-log by giving RichMessage ToWire instance.
@qnikst Thanks for sharing your experience and ideas with regards to structured logging. It is handy for us to design a flexible and extensible interface. It'll take some time to polish the implementation and resolve some remaining design decisions, but I have a clearer picture now of how to implement structured logging in co-log.
@chshersh, what do you think about OpenTelemetry? I guess logging and tracing subjects solving one problem - propagating trace identificators across microservices responsible for fulfilling requests. A typical application consists of dozen services actively interacting between themselves. So to do efficient incident investigation a logger aggregator should have all log messages labelled with request ids. Does co-log support tracing?
I noticed that co-log relays heavily on advanced typing and effects and it wouldn't be a problem to transfer trace id from WAI HTTP request interceptor to REST client interceptor through the effect state, thought I would like to hear your opinion about GHC feature which would make possible a magic hack to do the same with bare IO or lazy evaluation.
@yaitskov That's an interesting idea. Logging and tracing are indeed similar in some sense. co-log is very flexible due to underlying concepts in its core, and supporting tracing shouldn't be a problem :ok_hand:
I haven't used OpenTelemetry before but integrating co-log with OpenTelemetry sounds like a fun project to work at :slightly_smiling_face: And if GHC can support the smoother implementations of tracing features in Haskell without losing much performance in general, that sounds very useful :+1:
how much effort is needed to get an initial version of it? I have to add a structured logger to one of my org's services, and I'd like to use co-log for it instead of katip, and it would be good to have something to start from and that could be improved over several iterations later. So far I only need a logger that can spit out json strings to stdout, nothing fancy.
@avanov The amount of required effort is not that big if you want something simple :slightly_smiling_face: The only thing we need to finish this is more time. Unfortunately, free time and mental health are very limited resources. We still need to go to our jobs, and we are also pretty exhausted during this year Hacktoberfest... So I wouldn't expect to have a lot of time to work on this in the next month.
@avanov for what it worth I've managed to open source an implementation I've described above. It still need some polishing, documentation and introducing examples, and possible interface generalisation. So I hope to put in on hackage within this week.
I've used co-log-json name in order not to take over co-log-structured name or something like that. So it can be used by more "official" solution. Some time ago @chshersh posted on twitter ideas about API for structured logging with focus on simple API and good error messages. Such API may be more principal and nicer to use, and I hope to see it someday on package. Some points:
- my package is not so extensible and sometimes conflict with the core package, for example you can't use
logDebugand other functions form co-log directly - there are some strong choices that is not very easy to change, e.g. time is not added to messages, message structure is fixed (though it can be changed)
- it provides a fast-path for logging, but it's not totally safe unless you use it with a safe wrapper (e.g. co-log-concurrent).
- library provides a compatibility with a subset of Katip library, that happened because I had to migrate from Katip to my package and I didn't want to refactor 50k+ lines database. So API may be a suboptimal
Hi @qnikst, thanks for that! The repo is quite informative in itself and I appreciate the docstrings. I think I can take it as a PoC example and extend it further.
@avanov, I've released a package https://hackage.haskell.org/package/co-log-json. Feel free to use and contribute to :)