Adding `sort_keys` filter as a flexible alternative to `--sort-keys`

Open 01mf02 opened this issue 1 year ago • 10 comments

Hi!

In my alternative jq implementation jaq, I would like to implement the functionality of jq's --sort-keys command-line option, but without creating a command-line option. In particular, in the corresponding issue, I thought about introducing a new filter sort_keys:

def sort_keys: walk(if . >= {} then reduce (keys[] as $k | { ($k): .[$k] }) as $o ({}; . + $o) end);

This filter would provide the same functionality as today's `--sort-keys`: wherever `jq --sort-keys 'f'` is currently used, `jq 'f | sort_keys'` could be used instead. It would also allow sorting only specific objects by keys, whereas `--sort-keys` applies unconditionally to all output.
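
A minimal sketch of the equivalence described above, added for illustration and not part of the original post. It assumes a `jq` binary (1.6 or later, for `walk`) on the `PATH`, and it inlines the definition proposed in this issue rather than assuming a builtin `sort_keys` exists. The `else .` branch is spelled out so the sketch also runs on jq releases before 1.7; the shell variable names are hypothetical.

```shell
# Definition from this issue, inlined as a string (assumption: no builtin sort_keys).
SORT_KEYS_DEF='def sort_keys: walk(if . >= {} then reduce (keys[] as $k | { ($k): .[$k] }) as $o ({}; . + $o) else . end);'

input='{"b":{"d":1,"c":2},"a":{"z":1,"y":2}}'

# Today's approach: the flag sorts every object in the output.
with_flag=$(printf '%s' "$input" | jq -c --sort-keys '.')

# Proposed approach: the filter applied to the whole value.
with_filter=$(printf '%s' "$input" | jq -c "$SORT_KEYS_DEF sort_keys")

# Selective use: only the object under .a is sorted; the top-level order is kept.
only_a=$(printf '%s' "$input" | jq -c "$SORT_KEYS_DEF .a |= sort_keys")

printf '%s\n' "$with_flag" "$with_filter" "$only_a"
# {"a":{"y":2,"z":1},"b":{"c":2,"d":1}}
# {"a":{"y":2,"z":1},"b":{"c":2,"d":1}}
# {"b":{"d":1,"c":2},"a":{"y":2,"z":1}}
```

The third line is the case the flag cannot express: one sub-object is sorted while the rest of the output keeps its original key order.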

To keep compatibility between jaq and jq, I would like to synchronise my actions with you. So I'd like to ask the wider jq community: What do you think about this? How would you feel about including sort_keys into a future jq release? If your response is positive, I would be happy to make a PR that includes sort_keys into builtin.jq.

01mf02 avatar Jan 19 '24 11:01 01mf02

Think it could be a nice addition. By "allow for sorting only specific objects by keys" you mean to be able to recursively sort part of an object e.g. `.a |= sort_keys`?

Some thoughts and questions:

  • Could `sort_by_keys` possibly be an alternative name?
  • `. => {}` is a fancy `type == "object"`?
  • Is there some performance reason to split the input object into small one-key objects and then merge them, instead of doing something like `def sort_keys: walk(if . >= {} then . as $o | reduce keys[] as $k ({}; .[$k] = $o[$k]) end);`?
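
The two formulations compared in the last bullet can be checked against each other. This is a rough sketch added for illustration, assuming a `jq` binary (1.6 or later) on the `PATH`. It only shows that both definitions agree on one sample input and says nothing about their relative performance. The variable names are hypothetical, and `else .` is added so the sketch also runs on jq releases before 1.7.

```shell
# Variant from the issue description: build one-key objects and merge them.
MERGE_DEF='def sort_keys: walk(if . >= {} then reduce (keys[] as $k | { ($k): .[$k] }) as $o ({}; . + $o) else . end);'

# Variant suggested in this comment: assign each key into an accumulator.
ASSIGN_DEF='def sort_keys: walk(if . >= {} then . as $o | reduce keys[] as $k ({}; .[$k] = $o[$k]) else . end);'

input='{"b":1,"a":{"d":[{"f":1,"e":2}],"c":3}}'

out_merge=$(printf '%s' "$input" | jq -c "$MERGE_DEF sort_keys")
out_assign=$(printf '%s' "$input" | jq -c "$ASSIGN_DEF sort_keys")

printf '%s\n' "$out_merge" "$out_assign"
# {"a":{"c":3,"d":[{"e":2,"f":1}]},"b":1}
# {"a":{"c":3,"d":[{"e":2,"f":1}]},"b":1}
```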

wader avatar Jan 19 '24 14:01 wader

> Think it could be a nice addition. By "allow for sorting only specific objects by keys" you mean to be able to recursively sort part of an object e.g. `.a |= sort_keys`?

I like this too.

> Some thoughts and questions:
>
> * Could `sort_by_keys` possibly be an alternative name?

Since the existing command-line option is already --sort-keys, naming the new builtin something close to that seems best, IMO.

> * `. => {}` is fancy `type == "object"`?

Huh. In jq the greater-than-or-equal operator is >=, not =>. Comparisons of values of different types return the type of one minus the type of the other, with types expressed numerically. The only input value that would cause . >= {} to be true is {}.

> * Is there some performance reason to split the input object into small one-key-objects and then merge them instead of doing something like `def sort_keys: walk(if . >= {} then . as $o | reduce keys[] as $k ({}; .[$k] = $o[$k]) end);`?

I would definitely implement this as a C-coded builtin in jq to optimize this. And probably just rewrite each object's insertion order by re-writing all the next fields of all the buckets to match sorted key order.

nicowilliams avatar Jan 19 '24 17:01 nicowilliams

I should add that I wish `keys` had been a special function (like `empty`) that streams the object's keys or array's indices rather than outputting an array of keys. We should probably add a `streamkeys` or `keyss` or some such built-in that does just that. EDIT: Or maybe special syntax for this, like `.[!]` (since Bash uses `!` in `${!var[@]}` to refer to keys instead of values).

nicowilliams avatar Jan 19 '24 23:01 nicowilliams

Ah, so here `. => {}` (or more likely, `. >= {}`) really means `. != {}`, and assumes `.` is an object.

nicowilliams avatar Jan 19 '24 23:01 nicowilliams

Yes, sorry for my shitty typing, I meant `. >= {}`

wader avatar Jan 20 '24 00:01 wader
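
For readers who want to check how `. >= {}` behaves (this note is not part of any comment above): the jq manual documents a total order across types (null, false, true, numbers, strings, arrays, objects), with objects compared first by their sorted sets of keys. A minimal sketch, assuming a `jq` binary on the `PATH`, that evaluates the comparison on one sample value of each kind; the variable name is hypothetical.

```shell
# Evaluate `. >= {}` for a null, a boolean, a number, a string,
# an array, the empty object, and a non-empty object.
ge_empty=$(jq -c -n '[null, false, 0, "a", [], {}, {"a":1}] | map(. >= {})')

printf '%s\n' "$ge_empty"
# [false,false,false,false,false,true,true]
```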

sort does not sort recursively, map does not map recursively, and neither should sort_keys operate recursively.

Furthermore, a non-recursive sort_keys (*) is perfectly useful in itself, and the recursive version of sort_keys can easily be implemented using the non-recursive version, so adding the non-recursive version should be more than sufficient.

(*) This def has the semantics I have in mind:

    def sort_keys: to_entries | sort | from_entries;

pkoppstein avatar Jan 20 '24 03:01 pkoppstein

> Since the existing command-line option is already --sort-keys, naming the new builtin something close to that seems best, IMO.

Mm, agreed, that makes sense. Also, if we ever want a `_by` variant, `sort_keys_by(f)` makes more sense than `sort_by_keys_by` 😬

> I would definitely implement this as a C-coded builtin in jq to optimize this. And probably just rewrite each object's insertion order by re-writing all the next fields of all the buckets to match sorted key order.

Also makes sense 👍

wader avatar Jan 20 '24 08:01 wader

> sort does not sort recursively, map does not map recursively, and neither should sort_keys operate recursively.

That's a good point. But wouldn't it be confusing if `sort_keys` were non-recursive while `--sort-keys` is recursive? Hmm

wader avatar Jan 20 '24 10:01 wader

> > sort does not sort recursively, map does not map recursively, and neither should sort_keys operate recursively.
>
> That's a good point. But wouldn't it be confusing if `sort_keys` were non-recursive while `--sort-keys` is recursive? Hmm

Yes, I think so, but a non-recursive version is needed too.

nicowilliams avatar Jan 20 '24 23:01 nicowilliams

> a non-recursive version is needed too.

Here's a thought: define sort_keys non-recursively but in a way that makes it trivial to use recursively, e.g. by walk(sort_keys).

An appropriate def would be:

def sort_keys:
  if type == "object" then to_entries | sort | from_entries else . end;
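
A minimal sketch, added for illustration, of how this non-recursive definition composes with `walk`. It assumes a `jq` binary (1.6 or later) on the `PATH` and inlines the def above, since `sort_keys` is only a proposal here; the shell variable names are hypothetical.

```shell
# Non-recursive definition from this comment, inlined as a string.
SHALLOW_DEF='def sort_keys: if type == "object" then to_entries | sort | from_entries else . end;'

input='{"b":{"d":1,"c":2},"a":[{"z":1,"y":2}]}'

# Applied directly: only the top-level object is reordered.
shallow=$(printf '%s' "$input" | jq -c "$SHALLOW_DEF sort_keys")

# Applied through walk: every nested object is reordered as well.
deep=$(printf '%s' "$input" | jq -c "$SHALLOW_DEF walk(sort_keys)")

printf '%s\n' "$shallow" "$deep"
# {"a":[{"z":1,"y":2}],"b":{"d":1,"c":2}}
# {"a":[{"y":2,"z":1}],"b":{"c":2,"d":1}}
```

Because the def passes non-objects through unchanged, `walk(sort_keys)` does not fail on the arrays and scalars it visits.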

Regarding the tension between having the command-line option --sort-keys be recursive but the builtin sort_keys be non-recursive -- if this is indeed going to be a significant obstacle, then how about deprecating the long form --sort-keys in favor of an alternative long form name for the -S option?

pkoppstein avatar Jan 22 '24 10:01 pkoppstein