co-log [RFC] Structured logging

Structured logging is the standard way of doing logging in big modern applications. That's why it would be helpful to provide more utilities in co-log for doing structured logging. For big applications, it's more useful to log JSON values that can later be analyzed by various search engines. For example, instead of textual:

[Debug]   [03 May 2019 05:23:19.058 +00:00] [Main.example#34] [ThreadId 11] app: First message...

In structured logging, we'll have

{
  "timestamp": "03 May 2019 05:23:19.058 +00:00",
  "severity": "Debug",
  "callStack": "Main.example#34",
  "threadId": 11,
  "message": "app: First message..."
}

To support this workflow, I propose the following plan:

[ ] Introduce new module Colog.Structured in co-log
[ ] Add aeson to dependencies
[ ] Implement all LogActions that can print aeson.Value and any data type with ToJSON instance to stdout, stderr, handle and file
[ ] Implement logging function like logJson that will take JSON value as an argument
[ ] Implement machinery similar to RichMsg that allows extending JSON with arbitrary monadic actions and which says what JSON keys that will be assigned to
[ ] Write tutorial with an example

Pros

Richer interface
Support modern practices
More flexible interface: better abstractions to support different workflows

Cons

More dependencies
More code to test and maintain

Open questions

[ ] Severity will require ToJSON instance. We should either move it to the co-log package or implement an orphan instance
[ ] Think on the convenient interface for logging. How do we want to log values in Haskell? Keep old interface (simple, no repetition, not flexible enough):

logJson Debug "Example text"

or support arbitrary values (flexible, but a lot of boilerplate)

logJson
    [ "severity" .= Debug
    , "body" .= "Example text"
    ]

Feb 09 '20 13:02 chshersh

@qnikst I remember, you had some ideas regarding structured logging. Will you have time to share your thoughts? :slightly_smiling_face:

Feb 09 '20 13:02 chshersh

This will be really useful for Cloudwatch. Additionally, can a richMessageAction without colours be provided?

Feb 21 '20 02:02 ashutoshrishi

Hi @ashutoshrishi, thanks for your feedback! I've created a separate issue about supporting logging without colouring. I think it should be solved independently from this one.

Feb 21 '20 09:02 chshersh

Let me add my few cents here. So I've implemented structured logging based on co-log for my project, but the solution is specific for my needs and I'm not sure if can be put into the public library. I'll briefly go through the implementation and main structures, and may it will give a rise for the better ideas.

There are two big structured log packages in the Haskell ecosystem di and Katip. I'll be referring this packages in my post. (I've used highly modified version of Katip but I had a lot of things I didn't like, that I wanted to change for me).

The basic structure. Katip is using semi-structured Value, i.e. it has Aeson.Value for user data and some predefined fields in the message. I personally find most of those fields redundant (environment (prod, env), host, pid, etc.) all of those things can be added by the external logging system that will dump your logs to the logs-aggregator. Aeson.Value is highly inefficient, especially if you re-dump it over and over again. It's possible to get few times speedup by manually rewriting Katip serialisation using Aeson.Encoding. On the other hand using Aeson.Value gives additional flexibility and allows introducing Verbosity (how much of the structure do you put in the logs). I personally do not care about that in the server application I always dump everything (and all the log levels). So just using Aeson.Value is actually a worst case that you should not be worse than. di introduces it's own format and message df1. Using any format except for json seems unreasonable unless you want to reimplement all the log analysis stack. But df1 on it's own is a nice stack-like structure where you can push segments (namespaces in Katip terminology) and tags. It didn't fit my case as Df1.Value is basically Text and I wanted to store json objects.

So I ended up with implementing my own message type:

data Structured = Segment  T.Text | Attr T.Text Encoding

data Message = Message
  { attributes :: Seq Structured
  , thread_id :: ThreadId
  , severity :: Severity
  }

This message type contained only things I care about (even no time!), allowed to add additional data, is Encoding friendly and quite efficient. There are few problems with that structure though:

It's not extensible, I doubt everyone will like my choice of the common fields
It may not be the most efficient (I expect just changing Seq to [], will be me benefits, but I don't have benchmarks to prove that)
It may be not effective to support attributes shadowing.

On the other hand this structure may be improved to reuse previous message dumps.

Dumping data, I've introduced logAction that is based on ByteString:Data.ByteString.Builder.hPutBuilder $ Aeson.fromEncoding ... so I do not mess with additional logger logic and just store data. Such action may make sense, but in co-log framework it's better to split convention builder to ByteString and further processing.
Rest of the file I can interface that made my package totally compatible with previous Katip implementation. So I had to change 0 lines in 50k lines project. So in interface I had:

data LoggerEnv = LoggerEnv
  { action :: LogAction IO Message <- again types are not generalised
  , context :: Seq Structured
  }

addContext :: PushItem -> LoggerEnv -> LoggerEnv
addNamespace :: T.Text -> LoggerEnv -> LoggerEnv

sl :: ToJSON a => a -> PushItem -- katip compatibility
newtype PushItem = PushItem (Seq Structured -> Seq Structured)

And methods for pushing new segments and context to the logger env, that works well with Handle patter, also local-like methods works well, but they form stack and that may not be the best possible way. This allowed me to have nice, fast but wired to my package implementation.

I'll be interested in generalising and contributing my solution, but I don't see how to make it usable for the people. And as it was very short I'm not sure what to describe, so any questions about clarification are welcome.

Apr 07 '20 00:04 qnikst

With regards to concrete actions that can be done in oder to provide general support for the structured logging, I'd do the following:

Introduce new package co-log-structured with minimal set of dependencies, basically co-log-core, bytestring, aeson. That will be the core for structured logging. It may have (all names are drafts, I'm bad at naming): a. ToWire (basically a a -> Encoding function), this function is intended to tell how to store a message in the structured format without depending on the concrete message type. Some helpers that allow generating ToWrite for types that have ToJSON for free. b. Few efficient LogActions that could efficiently work with ToWire, I can suggest logHandle :: ToWrite a => Handle -> LogAction m a, logBuilder :: ToWire a => LogAction m Builder -> LogAction m a. That will allow build a logging pipeline. c. Message type that is efficient for structured logging like Message above (I'm not sure maybe it can be generalised, or maybe it worth just keeping wrapper over Seq Structured so it can be embedded in user type). d. Some API for pushing segments and attributes to the message.

In this form package will provide basic tools for structured logging and simple way to run structured logging for people who need nothing advanced.

After having such a message it's possible to use it in co-log by giving RichMessage ToWire instance.

Apr 07 '20 19:04 qnikst

@qnikst Thanks for sharing your experience and ideas with regards to structured logging. It is handy for us to design a flexible and extensible interface. It'll take some time to polish the implementation and resolve some remaining design decisions, but I have a clearer picture now of how to implement structured logging in co-log.

Apr 07 '20 20:04 chshersh

@chshersh, what do you think about OpenTelemetry? I guess logging and tracing subjects solving one problem - propagating trace identificators across microservices responsible for fulfilling requests. A typical application consists of dozen services actively interacting between themselves. So to do efficient incident investigation a logger aggregator should have all log messages labelled with request ids. Does co-log support tracing?

I noticed that co-log relays heavily on advanced typing and effects and it wouldn't be a problem to transfer trace id from WAI HTTP request interceptor to REST client interceptor through the effect state, thought I would like to hear your opinion about GHC feature which would make possible a magic hack to do the same with bare IO or lazy evaluation.

May 30 '20 20:05 yaitskov

@yaitskov That's an interesting idea. Logging and tracing are indeed similar in some sense. co-log is very flexible due to underlying concepts in its core, and supporting tracing shouldn't be a problem :ok_hand:

I haven't used OpenTelemetry before but integrating co-log with OpenTelemetry sounds like a fun project to work at :slightly_smiling_face: And if GHC can support the smoother implementations of tracing features in Haskell without losing much performance in general, that sounds very useful :+1:

Jun 02 '20 16:06 chshersh

how much effort is needed to get an initial version of it? I have to add a structured logger to one of my org's services, and I'd like to use co-log for it instead of katip, and it would be good to have something to start from and that could be improved over several iterations later. So far I only need a logger that can spit out json strings to stdout, nothing fancy.

Oct 06 '20 14:10 avanov

@avanov The amount of required effort is not that big if you want something simple :slightly_smiling_face: The only thing we need to finish this is more time. Unfortunately, free time and mental health are very limited resources. We still need to go to our jobs, and we are also pretty exhausted during this year Hacktoberfest... So I wouldn't expect to have a lot of time to work on this in the next month.

Oct 06 '20 22:10 chshersh

@avanov for what it worth I've managed to open source an implementation I've described above. It still need some polishing, documentation and introducing examples, and possible interface generalisation. So I hope to put in on hackage within this week.

I've used co-log-json name in order not to take over co-log-structured name or something like that. So it can be used by more "official" solution. Some time ago @chshersh posted on twitter ideas about API for structured logging with focus on simple API and good error messages. Such API may be more principal and nicer to use, and I hope to see it someday on package. Some points:

my package is not so extensible and sometimes conflict with the core package, for example you can't use logDebug and other functions form co-log directly
there are some strong choices that is not very easy to change, e.g. time is not added to messages, message structure is fixed (though it can be changed)
it provides a fast-path for logging, but it's not totally safe unless you use it with a safe wrapper (e.g. co-log-concurrent).
library provides a compatibility with a subset of Katip library, that happened because I had to migrate from Katip to my package and I didn't want to refactor 50k+ lines database. So API may be a suboptimal

Nov 04 '20 09:11 qnikst

Hi @qnikst, thanks for that! The repo is quite informative in itself and I appreciate the docstrings. I think I can take it as a PoC example and extend it further.

Nov 09 '20 16:11 avanov

@avanov, I've released a package https://hackage.haskell.org/package/co-log-json. Feel free to use and contribute to :)

Nov 11 '20 06:11 qnikst