RFC: structured logs
From the RFC summary:
This RFC proposes the addition of a new optional environment variable recognized by Yarn, allowing users to specify the path of one or more structured logs to which Yarn can pipe its JSON-formatted output. This allows automation systems such as continuous integration servers to monitor Yarn's structured output without affecting the usual human-readable unstructured output.
Sounds interesting; or maybe we could simply print all the JSON output on stderr, so that you'd have the choice of what to do with it (like piping it to a file in most cases). That would solve the main issue I have with this: what happens when multiple processes run concurrently?
That being said, note that the logs in the v2 will be much more normalized than they currently are. They currently look like this:
```
▸ [BR0001] The human readable text for an exception
```
So in this context maybe a very small parser would be more than enough for a diagnostic use case? The `BR0001` in this case would change depending on the actual reason for the exception.
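Such a parser really can be tiny. Here's a sketch; the two-letters-plus-four-digits code format is an assumption extrapolated from the `BR0001` example above:

```javascript
// Minimal sketch: extract the diagnostic code and message from a v2-style
// log line of the form "▸ [BR0001] The human readable text for an exception".
// The [A-Z]{2}\d{4} code shape is an assumption based on the example above.
const LOG_LINE = /^\u25b8 \[([A-Z]{2}\d{4})\] (.*)$/;

function parseLogLine(line) {
  const match = line.match(LOG_LINE);
  if (match === null) return null; // not a diagnostic line
  return {code: match[1], message: match[2]};
}
```

A CI system could then bucket failures by `code` without needing the full JSON stream.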
@arcanis Thanks for the feedback!
> what happens when multiple processes run concurrently?
I’m not seeing it yet — can you say more?
> I’m not seeing it yet — can you say more?
Imagine having two processes writing simultaneously into the same file. You need to have some kind of mutex, or the writes will be interlaced and will likely corrupt the file.
We already have this kind of problem with the cache, but I'm working to fix that so I would prefer not to add another reason for mutexing :p
The main challenge with using stderr, requiring flags, or parsing (for our use case) is that the CI system and the CI scripts need to be separately developed and owned. So configuring Yarn to produce data for the CI system should be more or less transparent to the developer of the CI script that calls Yarn.
(As a heuristic, imagine that I’m working for Travis-CI and want to be able to detect and monitor infrastructure errors without requiring customers to modify their `.travis.yml`.)
My concern with parsing output is that it’s hard to distinguish between output that came from Yarn itself vs output that came from user scripts (e.g. when users run their npm scripts via `yarn foo`).
> Imagine having two processes writing simultaneously into the same file.
Ah, I see. For example in my case, even if each CI execution gets its own path, there’s nothing preventing the user from calling Yarn in parallel processes within their CI script. Great point. I will mull this over :)
@arcanis One quick question: does Yarn ever spin up multiple parallel processes of its own, or are you only talking about people invoking Yarn itself in parallel?
Yarn doesn't (for now, #109 could potentially change that when multiple tasks run concurrently), but some build tools (like Buck / Bazel) do spawn multiple concurrent instances.
@arcanis Thanks again for engaging me on this! Thinking this through, here's where my thought process has been taking me:
- It seems plausible to me that atomicity of logging could be the user's responsibility. In situations where users want to spawn concurrent instances of Yarn, the usual approach would be to give each invocation its own log. They can do this manually (by passing distinct log paths to separate invocations) or automatically (with a wrapper Yarn executable that atomically chooses distinct logs). In my case I'm thinking we'd lean towards the wrapper approach.
- The feature of outputting structured logs to stderr sounds fine on its own merits, but it just doesn't work for my use case: it mixes the structured log data with the output of the command itself, which could be an npm script that generates arbitrary stderr output. So while that idea has value, it's really orthogonal to the use cases this RFC is trying to address.
- It probably makes sense to have some amount of atomicity guaranteed by Yarn—say, a guarantee that individual lines of output won't be interleaved—but I get the impression you're not enthusiastic about implementing that. FWIW, I'm happy to help with implementation, either personally or with one of my colleagues.
Thoughts?
Maybe it would be possible to ignore the mutex problem and assume that the log target would be made unique somehow (we could also provide some kind of pattern replacement, like `$$` becoming the pid, etc. - but that's a bit more complex and potentially unexpected).
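For what it's worth, that pattern replacement could be as small as this sketch (`expandLogPath` is a hypothetical helper, not an existing Yarn API):

```javascript
// Expand "$$" in a configured log path to the process id, so that
// "/tmp/yarn-$$.log" becomes "/tmp/yarn-12345.log" for pid 12345.
// expandLogPath is a hypothetical helper name, not an existing Yarn API.
function expandLogPath(template, pid = process.pid) {
  return template.split('$$').join(String(pid));
}

console.log(expandLogPath('/tmp/yarn-$$.log', 12345)); // /tmp/yarn-12345.log
```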
I plan to release the v2 trunk sometime next week, potentially Thursday or Friday - it contains a rework of the reporters, so what do you think of waiting until then to discuss the specifics of a possible implementation?
Sure! I'm not in a huge rush, just really grateful for your collaboration on this. :D Ping me on this thread whenever you're ready to pick it back up. :+1: