feat(output): implement JSON output
Implement --yaml switch for YAML format output, which allows using yq or nushell to interact with the result.
Actually, I was initially working on #1765, but could not get it into a satisfactory state. JSON does not allow trailing commas, which makes it hard to stream the results statelessly without a buffer. Given the potentially large number of results, storing all lines and printing them at the end would be memory-consuming.
On the other hand, the YAML format, while infamous for being complex to parse, is friendly to streaming output. There is no need for a serializer or extra dependencies; the simple and fast write! is all you need.
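To illustrate the streaming argument, here is a minimal sketch of what stateless YAML output with write!/writeln! could look like. The function names and the `path` / `size_bytes` fields are hypothetical and not taken from this PR; the sketch also assumes paths are valid UTF-8, which real file names need not be.

```rust
use std::io::{self, Write};

/// Quote a string as a YAML double-quoted scalar.
/// Escapes backslashes, double quotes, and control characters.
fn yaml_quote(s: &str) -> String {
    let mut out = String::from("\"");
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\t' => out.push_str("\\t"),
            c if c.is_control() => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Write one result as an item of a top-level YAML sequence.
/// Every item is self-contained, so no state is carried between calls.
fn write_yaml_entry<W: Write>(out: &mut W, path: &str, size_bytes: u64) -> io::Result<()> {
    writeln!(out, "- path: {}", yaml_quote(path))?;
    writeln!(out, "  size_bytes: {}", size_bytes)
}
```

Each call appends a complete list item, so the output is valid YAML no matter how many entries follow.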
There are tools like yq that work just like beloved jq, and nushell supports YAML as well, so I suggest this PR might be able to close #1765.
I think I would really prefer JSON output. There are more tools that would support that as input.
As for the streaming problem, I think there are a couple of ways to handle that:
- Write the comma at the beginning of each entry instead of the end
- Use newline delimited json (ndjson) instead of a json array. Tools like jq can still parse this.
- Just don't write the last item until we either have the next one, or have reached the end, so we know if we need a comma or not
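The first two options can be sketched roughly as follows. This is only an illustration under assumptions: `JsonArrayWriter`, `write_ndjson_entry`, and the single `path` field are made-up names rather than code from this PR, and the sketch takes `&str` paths, so it ignores non-UTF-8 file names.

```rust
use std::io::{self, Write};

/// Minimal JSON string escaping (quotes, backslashes, control characters).
fn escape_json(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Option 1: a JSON array where the comma is written *before* every
/// entry except the first. The only state needed is one boolean.
struct JsonArrayWriter<W: Write> {
    out: W,
    first: bool,
}

impl<W: Write> JsonArrayWriter<W> {
    fn new(mut out: W) -> io::Result<Self> {
        write!(out, "[")?;
        Ok(Self { out, first: true })
    }

    fn entry(&mut self, path: &str) -> io::Result<()> {
        if !self.first {
            write!(self.out, ",")?;
        }
        self.first = false;
        write!(self.out, "{{\"path\":\"{}\"}}", escape_json(path))
    }

    /// Close the array and hand the underlying writer back.
    fn finish(mut self) -> io::Result<W> {
        write!(self.out, "]")?;
        Ok(self.out)
    }
}

/// Option 2: newline-delimited JSON, one object per line, no state at all.
fn write_ndjson_entry<W: Write>(out: &mut W, path: &str) -> io::Result<()> {
    writeln!(out, "{{\"path\":\"{}\"}}", escape_json(path))
}
```

The array variant needs a matching `finish()` call to stay valid JSON, while the ndjson variant remains parseable even if the process is interrupted partway through.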
> I think I would really prefer JSON output. There are more tools that would support that as input.
Yeah, definitely, but the original functions in output.rs were stateless, so I had to refactor them to make them stateful.
The good news is that we gain a small performance improvement after the refactoring, maybe because fewer parameters are passed around (I guess)?
Just want to throw out a reference to libxo as my favourite way for command line programs to support structured output formats. Not sure there's something like that in the Rust ecosystem.
It would be useful if there was a distinction between fd --json invocations that do stat calls vs ones that don't. Because for large directories, the stat calls are what dominates runtime, and fd can run e.g. 10x faster without them.
Compare for example
strace -f --summary-only fd -I >/dev/null
with
strace -f --summary-only fd -I -S +0B >/dev/null
The second call is much slower because -S forces fd to call stat to find file sizes; without it, fd does not.
For JSON output, there's no reason to not output all the info we have - but we don't have the stat info by default.
For example, the parameter could take a value, such as --json=basic and --json=stat.
There isn't much we can output without a full stat call. Just the filename, the file type, and the inode number (on unix).
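As a rough illustration of what is available from the directory entry alone, here is a sketch using only the standard library. It is not how fd walks directories; `BasicEntry` and `list_basic` are hypothetical names. Note that `DirEntry::file_type` usually avoids an extra system call on Unix, but may fall back to one on file systems that do not report the type in the directory entry.

```rust
use std::fs;
use std::io;
use std::path::Path;

#[cfg(unix)]
use std::os::unix::fs::DirEntryExt;

/// The fields we can fill in without calling `metadata()` on the entry.
struct BasicEntry {
    name: String,
    kind: &'static str,
    ino: Option<u64>, // only available on Unix
}

fn list_basic(dir: &Path) -> io::Result<Vec<BasicEntry>> {
    let mut entries = Vec::new();
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        // file_type() does not follow symlinks.
        let ft = entry.file_type()?;
        let kind = if ft.is_dir() {
            "directory"
        } else if ft.is_symlink() {
            "symlink"
        } else if ft.is_file() {
            "file"
        } else {
            "other"
        };
        #[cfg(unix)]
        let ino = Some(entry.ino());
        #[cfg(not(unix))]
        let ino = None;
        entries.push(BasicEntry {
            name: entry.file_name().to_string_lossy().into_owned(),
            kind,
            ino,
        });
    }
    Ok(entries)
}
```

Anything beyond these three fields (size, mode, timestamps) would require `entry.metadata()`, which is the stat call being discussed.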
@sharkdp since this is a pretty significant change, I'd like to confirm you are ok with this change before merging it.
> I'd like to confirm you are ok with this change before merging it.
Thank you for asking. This looks like a great feature to have! And thank you for your contribution, @Dustin-Jiang!
While I'm here… the format that we introduce here is something that users will depend on, so it is worth investing some time to come up with a first version of this format that we can hopefully depend on for a long time:
- In particular, I would like us to consider using the same format as ripgrep for paths (see this comment by BurntSushi).
- Also, I would really like to see a unit being included in the size field (size_bytes).
- I was also wondering if "mode" should be a string? It's not supposed to be read as a decimal number, so serializing it as such feels wrong(?)
> I was also wondering if "mode" should be a string? It's not supposed to be read as a decimal number, so serializing it as such feels wrong
I'm kind of split on this. Decimal is a pretty bad format for human consumption. But if this JSON is then consumed programmatically, it is probably more convenient to have it as a number than a string.
Perhaps using the octal encoding as a string could be a good middle ground?
@Dustin-Jiang, if you want, I could help make those changes to this PR.
> > I was also wondering if "mode" should be a string? It's not supposed to be read as a decimal number, so serializing it as such feels wrong
> I'm kind of split on this. Decimal is a pretty bad format for human consumption. But if this JSON is then consumed programmatically, it is probably more convenient to have it as a number than a string.
Yes, exactly. The problem with human consumption is that, even if unlikely, it is technically possible for a mode to be ----rwxrwx, and the current format would serialize this as "mode": 77, which is confusing/ambiguous. "mode": "077" would be pretty clear.
I agree that we should probably optimize for programmatic consumption, though. But even then, the string seems more practical to me? Because I will probably want to split the number by owner/group/others, and that is much easier if it's already in this string notation. Converting a single digit from a character to an integer should be an operation that is easily available in every programming language.
It actually gets trickier if we consider sticky/setgid/setuid, where the mode can be something like 4777. That suggests that we should always serialize it as a four-character string, possibly with a leading zero?
> Perhaps using the octal encoding as a string could be a good middle ground?
How is that different from what I am proposing? Something like "0644" or "4777" would be the octal encoding as a (fixed length) string, right? Or would you suggest "0o0644" / "0o4777"?
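For reference, the fixed-length variant could be produced and consumed roughly like this. It is a sketch only: `format_mode` and `split_mode` are hypothetical helpers, and masking with 0o7777 assumes we only want the permission and setuid/setgid/sticky bits, not the file-type bits.

```rust
/// Format the permission bits (including setuid/setgid/sticky)
/// as a fixed four-digit octal string, e.g. "0644" or "4777".
fn format_mode(mode: u32) -> String {
    // Mask off the file-type bits so the result always fits in four digits.
    format!("{:04o}", mode & 0o7777)
}

/// Split the string form back into [special, owner, group, others] digits.
/// Returns None unless the input is exactly four octal digits.
fn split_mode(s: &str) -> Option<[u32; 4]> {
    let mut digits = [0u32; 4];
    let mut chars = s.chars();
    for d in digits.iter_mut() {
        *d = chars.next()?.to_digit(8)?;
    }
    if chars.next().is_some() {
        return None;
    }
    Some(digits)
}
```

With this, a consumer gets the owner/group/others split by indexing into the string, which is the convenience argued for above.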
> How is that different from what I am proposing?
Sorry, I meant as opposed to something like "rwxr-x---", which would probably be the most user-friendly but not very computer-friendly in most cases.