
Feature request: provide structured output

Open dbohdan opened this issue 9 years ago • 13 comments

Parsing ls is an ages-old problem that exa could help solve by providing an option for producing structured key-value output. By "structured key-value output" I mean one or more of JSON (pretty much the default nowadays), YAML, XML (boo!), TOML (yay!) or Tcl dictionaries (sample).

dbohdan avatar Feb 22 '15 12:02 dbohdan

Do you have a use case for why you'd want this sort of output? To write another front-end for exa, perhaps? Parsing ls is annoying, but exa isn't guaranteed to be installed, so you couldn't use it for scripts. There might be something I'm missing here.

ogham avatar Feb 24 '15 12:02 ogham

The overall use case I have in mind is avoiding the typical problems of handling files with odd filenames from the shell (is it find -print0 or find -0, again?) and other complex text parsing. The best way to accomplish this that I can think of is a suite of replacements for the standard POSIX tools that accept and produce structured data in a common serialization format. Until something like that is reasonably complete, the usefulness of this feature will admittedly be limited. The upside of structured data is that you don't necessarily need a find that is distinct from ls (except to terminate a search early for performance reasons), and for JSON there already exists a replacement for sed and awk in jq.

Selecting groups of files from directory trees with jq is one thing that could be immediately useful. I have not tried it extensively but it looks like an interesting alternative to doing the same with find.

Frankly, I hadn't thought of the front end idea.

exa isn't guaranteed to be installed, so you couldn't use it for scripts

In many scenarios you could deploy exa with your scripts. Can it be statically linked?

dbohdan avatar Feb 24 '15 13:02 dbohdan

is it find -print0 or find -0, again?

Ah yes, this case. I've been bitten by filenames with spaces in before, and know about the 0 fix, but never actually remember how to do it.

I designed exa to be user-facing, hence the colours, styles, and alignment, rather than a tool designed to have its output piped somewhere else. The fields it produces are available through other tools, though it would be a lot trickier to write a script that combines more than one of them. Are you thinking about doing, for example, "Find me all files that are modified by Git, that have been created in the past month" by using exa's Git and Date columns then piping that to jq?

Can it be statically linked?

Potentially - I'm not sure what exa's installation story is going to be, but Rust does allow static linking.

ogham avatar Feb 24 '15 13:02 ogham

Ah yes, this case. I've been bitten by filenames with spaces in before, and know about the 0 fix, but never actually remember how to do it.

Spaces are bad enough but it gets way worse with newlines, leading dashes and other characters, and more so if you want to avoid the GNU extensions.
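As an aside, this is exactly the problem a structured listing sidesteps: when each entry is a discrete record, a newline or leading dash inside a filename is just data. A small Python sketch using the standard library (not exa, which has no such interface), assuming a filesystem that permits such names:

```python
import os
import tempfile

# A filename containing a newline and a leading dash -- hostile to
# naive shell parsing, but harmless to a programmatic listing.
with tempfile.TemporaryDirectory() as d:
    hostile = "-evil\nname.txt"
    open(os.path.join(d, hostile), "w").close()

    # os.scandir yields one entry per file, so the newline in the
    # name never gets confused with a record separator.
    names = [entry.name for entry in os.scandir(d)]
    print(names == [hostile])  # -> True
```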

Are you thinking about doing, for example, "Find me all files that are modified by Git, that have been created in the past month" by using exa's Git and Date columns then piping that to jq?

Pretty much.
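As a sketch of that exact query, assuming a hypothetical structured listing with invented "git" and "modified" fields (again, exa has no such output), the filtering step might look like:

```python
import json
from datetime import datetime

# Hypothetical records such as a structured exa output might emit;
# the "git" and "modified" field names are assumptions.
records = [
    {"name": "src/main.rs", "git": "M", "modified": "2022-09-12T01:18:00"},
    {"name": "README.md",   "git": "-", "modified": "2022-09-12T01:18:00"},
    {"name": "old.rs",      "git": "M", "modified": "2021-01-01T00:00:00"},
]

cutoff = datetime(2022, 9, 1)  # "the past month" relative to a fixed date
recent_git_modified = [
    r["name"] for r in records
    if r["git"] == "M" and datetime.fromisoformat(r["modified"]) >= cutoff
]
print(recent_git_modified)  # -> ['src/main.rs']
```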

dbohdan avatar Feb 24 '15 14:02 dbohdan

I designed exa to be user-facing, hence the colours, styles, and alignment, rather than a tool designed to have its output piped somewhere else.

It's an old issue, but I agree with the sentiment that having a cross-platform way to get structured directory output would be great. Even if it wasn't the original goal of the project, I believe adding it to exa makes more sense than reaching for another tool just for this.

demurgos avatar Aug 04 '17 08:08 demurgos

The serialized output should also include the colours of the entries. This is useful, for example, when writing Elvish shell completions (using the &style option of edit:complex-candidate): the completions should have the same colouring of filenames as when listed with exa, but they require additional filtering that the built-in filename completion can't do (e.g. only Git untracked/tracked/new/etc. files, or character/block devices, or files modified in the last week).

notramo avatar Jan 06 '18 14:01 notramo

You can use fselect for this; it can output to JSON, CSV, or HTML. https://github.com/jhspetersson/fselect#:~:text=Format%20output:

fselect size, path from /home/user limit 5 into json
fselect size, path from /home/user limit 5 into csv
fselect size, path from /home/user limit 5 into html

Dialga avatar Feb 25 '21 00:02 Dialga

Such "modern" tool must have -print0 and -printf keys like venerable find does.

Alukardd avatar Aug 21 '21 11:08 Alukardd

I think every program that prints tabular data (like ls, ps, etc.) should have a --json option to print JSON-encoded lines. Parsing tabular data requires loading the entire table/output into RAM (to detect spaces and column boundaries), which rather defeats the point of having pipes (Unix pipes are streams, not tables), while a JSON-encoded line (one row of the table) can be parsed independently of the other lines/rows. This has two benefits:

  1. Parsing, filtering/processing, and then printing a very large table does not require loading it into RAM.
  2. When the output is not printed all at once, for example with something like find (or any long-running program), we don't have to wait for the entire output/table before starting to process it. Again, each JSON-encoded line can be parsed and processed independently (unlike a tabular row).

CSV can do the same thing, but JSON is easier to work with (it can be piped through tools like jq) and has more types (such as numbers and booleans).
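A minimal sketch of the streaming point, with a generator standing in for a long-running producer (the record shape is invented for illustration):

```python
import json

def rows():
    # Simulates a long-running producer (e.g. a find-like tool)
    # emitting one JSON object per line; the consumer never needs
    # to see the whole "table" at once.
    yield '{"name": "a.txt", "size": 10}'
    yield '{"name": "b.txt", "size": 99999}'
    yield '{"name": "c.txt", "size": 3}'

big = []
for line in rows():
    rec = json.loads(line)  # one record at a time, constant memory
    if rec["size"] > 100:
        big.append(rec["name"])
print(big)  # -> ['b.txt']
```

Contrast this with column-aligned text, where you cannot even locate the column boundaries until you have buffered enough rows to inspect the alignment.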

ilius avatar Sep 11 '22 21:09 ilius

Even after loading the entire tabular output into RAM, we may not be able to recognize the columns correctly. An ISO date-time is the best example I have found:

exa $ exa -l --color=never --time-style=long-iso
.rw-r--r-- 3.7k ilius 2022-09-12 01:18 build.rs
.rw-r--r--  10k ilius 2022-09-12 01:18 Cargo.lock
.rw-r--r-- 1.9k ilius 2022-09-12 01:18 Cargo.toml
drwxr-xr-x    - ilius 2022-09-12 01:18 completions
drwxr-xr-x    - ilius 2022-09-12 01:18 devtools
.rw-r--r-- 2.6k ilius 2022-09-12 01:18 Justfile
.rw-r--r-- 1.1k ilius 2022-09-12 01:18 LICENCE
drwxr-xr-x    - ilius 2022-09-12 01:18 man
.rw-r--r--  11k ilius 2022-09-12 01:18 README.md
.rw-r--r--   31 ilius 2022-09-12 01:18 rust-toolchain.toml
.rw-r--r-- 455k ilius 2022-09-12 01:18 screenshots.png
drwxr-xr-x    - ilius 2022-09-12 01:18 snap
drwxr-xr-x    - ilius 2022-09-12 01:18 src
.rw-r--r-- 6.1k ilius 2022-09-12 01:18 Vagrantfile
drwxr-xr-x    - ilius 2022-09-12 01:18 xtests

exa $ exa -l --color=never --time-style=long-iso | table-to-json
[".rw-r--r--", "1.9k", "ilius", "2022-09-12", "01:18", "Cargo.toml"]
["drwxr-xr-x", "-", "ilius", "2022-09-12", "01:18", "completions"]
["drwxr-xr-x", "-", "ilius", "2022-09-12", "01:18", "devtools"]
[".rw-r--r--", "2.6k", "ilius", "2022-09-12", "01:18", "Justfile"]
[".rw-r--r--", "1.1k", "ilius", "2022-09-12", "01:18", "LICENCE"]
["drwxr-xr-x", "-", "ilius", "2022-09-12", "01:18", "man"]
[".rw-r--r--", "11k", "ilius", "2022-09-12", "01:18", "README.md"]
[".rw-r--r--", "31", "ilius", "2022-09-12", "01:18", "rust-toolchain.toml"]
[".rw-r--r--", "455k", "ilius", "2022-09-12", "01:18", "screenshots.png"]
["drwxr-xr-x", "-", "ilius", "2022-09-12", "01:18", "snap"]
["drwxr-xr-x", "-", "ilius", "2022-09-12", "01:18", "src"]
[".rw-r--r--", "6.1k", "ilius", "2022-09-12", "01:18", "Vagrantfile"]
["drwxr-xr-x", "-", "ilius", "2022-09-12", "01:18", "xtests"]

Now if I want to sort by datetime, I have to combine the date and time columns.

Fixing these problems and parsing tabular data correctly is too complicated to repeat in every program that needs to read it. We can avoid all this pain by printing JSON in the first place.
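One concrete upside of a single timestamp field: if each record carried one ISO 8601 string instead of separate date and time columns, plain string sorting would already be chronological. A sketch with invented field names:

```python
# ISO 8601 timestamps sort chronologically under plain string
# comparison, so no date/time column stitching is needed.
# The record shape here is illustrative, not exa's output.
records = [
    {"name": "b", "modified": "2022-09-12 01:18"},
    {"name": "a", "modified": "2021-03-04 23:59"},
    {"name": "c", "modified": "2022-01-01 00:00"},
]
by_time = sorted(records, key=lambda r: r["modified"])
print([r["name"] for r in by_time])  # -> ['a', 'c', 'b']
```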

ilius avatar Sep 11 '22 21:09 ilius

Oh, and if the table happens to have only one row, with cells that contain spaces, it's almost impossible to parse without resorting to regexes or to specific assumptions about the length and type of each field...
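A quick illustration of that ambiguity, using a made-up row in exa's long-listing style:

```python
# One row of column-aligned output whose filename contains spaces.
row = ".rw-r--r-- 1.1k ilius 2022-09-12 01:18 My Important Notes.txt"

parts = row.split()
# Naive whitespace splitting yields 8 tokens instead of 6 columns,
# and with a single row there is no alignment from neighbouring
# rows to recover the filename boundary.
print(len(parts))  # -> 8
print(parts[5])    # -> 'My'  (the filename has been torn apart)
```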

ilius avatar Sep 11 '22 21:09 ilius

It's better to use Nushell now; you can convert ls output into multiple formats, e.g. ls | to json or ls | to csv.

: help to
Translate structured data to a format

Usage:
  > to

Subcommands:
  to csv - Convert table into .csv text
  to csv - Saves dataframe to csv file
  to html - Convert table into simple HTML
  to json - Converts table data into JSON text.
  to md - Convert table into simple Markdown
  to nuon - Converts table data into Nuon (Nushell Object Notation) text.
  to parquet - Saves dataframe to parquet file
  to text - Converts data into simple text.
  to toml - Convert table into .toml text
  to tsv - Convert table into .tsv text
  to url - Convert table into url-encoded text
  to xml - Convert table into .xml text
  to yaml - Convert table into .yaml/.yml text

Flags:
  -h, --help - Display this help message

Dialga avatar Sep 12 '22 07:09 Dialga

I'm quite familiar with Nushell. I played with it for quite a while and have many merged commits in it (and many more in my personal fork). The reason I abandoned it is that it's not very Unix-friendly; it's mostly Windows-oriented. It's also slow to start up when you want to use it from within another shell like bash or zsh. And its pipes are not streams like Unix pipes; they are more like Windows PowerShell pipes.

ilius avatar Sep 15 '22 17:09 ilius