jq
jq copied to clipboard
Filter object to specific keys
Hello! Let's imagine the following scenario, where given an object like:
{
"a": 1,
"b": 2,
"c": 3,
"d": 4
}
We wish to extract only a set of keys, for example only a
and c
, giving us the following output:
{
"a": 1,
"c": 3
}
There are three immediate solutions I see to this:
- Explicitly extract these set of keys, e.g.:
{ a: .a, c: .c }
This works but scales less well. In my scenario, there are quite a few of such keys which were quite long and nested inside sub-objects, looking something like:
{ path: { to: { object: { a: .path.to.object.a, c: .path.to.object.c, e: .path.to.object.e, fooblorg: .path.to.object.fooblorg } } } }
Which isn't the nicest thing in the world.
-
select
on the object entries, e.g.:
to_entries | [.[] | select(.key | in(["a", "c"]))] | from_entries
A very interesting alternative, scaling to sub-objects and larger sets of keys. However the reader with the jq interpreter in his mind will immediately shout, just like jq does:
jq: error (at <stdin>:5): Cannot check whether array has a string key
So, we have to supply a "truth map" to in
, like so:
to_entries | [.[] | select(.key | in({ a: 1, c: 1 }))] | from_entries
A bit less nice, but still legible. It would however be great to have an operator which searches for a needle in a haystack, the array equivalent for the internal jv_object_has
function.
- Filter an object given a specific keys
There doesn't seem to be such a builtin in jq, which in all fairness makes sense: There's no such builtin in javascript either. In js it would be implemented just like solution (2), i.e.
Object.fromEntries(Object.entries(...).filter(...))
Having said that, many toolkits provide such a method:
-
lodash.pick
-
underscore.pick
-
Rambda.pick
- uuhh...probably others, similarly named
pick
?
Anyway, I've rambled long enough. I guess this entire blob could be condensed into
tl;dr
- It could be useful to have a "value contained in array" filter
- It could be useful to have a "pick the following keys from an object" filter
What do you think? Thank you for your time.
Interesting issue. Firstly I wrote a simple implementation using reduce.
def pick($paths):
. as $root | reduce $paths[] as $path ({}; . + { ($path): $root[$path] });
% echo '{"a":1,"b":2,"c":3,"d":4}' | jq 'def pick($paths): . as $root | reduce $paths[] as $path ({}; . + { ($path): $root[$path] }); pick(["a", "c"])'
{
"a": 1,
"c": 3
}
However this does not allow to pick a value from nested object.
> _.pick({'a':{'x': 1, 'y':2},'b':2, 'c': 3}, [['a', 'x'], 'b'])
{a: {x: 1}, b: 2}
Here's a version with nested keys support ([.]|flatten(1)
is dirty, I think there is better solution).
def pick($paths):
. as $root | reduce ($paths[]|[.]|flatten(1)) as $path ({}; . + setpath($path; $root|getpath($path)));
% echo '{"a":{"x":1,"y":2},"b":2,"c":3,"d":4}' | jq 'def pick($paths): . as $root | reduce ($paths[]|[.]|flatten(1)) as $path ({}; . + setpath($path; $root|getpath($path))); pick([["a","x"], "b"])'
{
"a": {
"x": 1
},
"b": 2
}
Renaming problem is that we should skip non-existing paths or not.
There are at least as many approaches to querying as there are ways of specifying the paths of interest. One generic approach is to use an arbitrary JSON object as a query template:
# Use object obj as a query template.
# Only the all-key paths in obj are considered.
def projection(obj):
def allpaths: # include paths to null
def conditional_recurse(f): def r: ., (select(.!=null) | f | r); r;
path(conditional_recurse(.[]?)) | select(length > 0);
def relevantpaths:
obj
| allpaths
| . as $p
| select( obj | getpath($p) | type | (. != "array" and . != "object"))
| select( all(.[]; type == "string"));
. as $in
| reduce relevantpaths as $p ({}; setpath($p; $in|getpath($p)));
Example:
def grahame: { "author": { "surname": "Grahame", "firstname":"Kenneth" }, "born": 1859 };
grahame | projection( { "author": { "surname": null }, "died": 1932 } )
Result:
{
"author": {
"surname": "Grahame"
},
"died": null
}
Considering picking a few paths from a large JSON (e.x. pick only id and name from GitHub API v3 object), I think it's better not to iterate though all the paths.
@itchyny - If your comment is about def projection,
it is misdirected, as allpaths
is applied only to the template.
- Explicitly extract these set of keys, e.g.: { a: .a, c: .c }
You can use shortcut syntax here – just {a, c}
. But it won't work for nested structures.
@pkoppstein for template object construction in your example there is also a shortcut syntax for object construction with null
values:
{author: {surname}, died}
Funny how both of these cases are the same syntax but different semantics depending on what is .
If your comment is about def projection, it is misdirected, as allpaths is applied only to the template.
Ooops, sorry about that.
@odnoletkov - Yes, but one has to be careful as {author: {surname}, died}
is NOT the same as
{author: {surname: null}, died: null}
. To understand the issue, consider:
def id(x): x;
{author: {surname}, died: {x: 0}} | id({author: {surname}, died})
Maybe we can implement haspath
builtin as well?
@itchyny - With the above def of allpaths, haspath can be defined as:
def haspath($p): first(allpaths == $p // empty) // false;
@pkoppstein Maybe haspath($p)
does not need to list all the paths.
seq 10000 | jq --slurp '
def allpaths: # include paths to null
def conditional_recurse(f): def r: ., (select(.!=null) | f | r); r;
path(conditional_recurse(.[]?)) | select(length > 0);
def haspath($p): first((allpaths|debug) == $p // empty) // false;
haspath([1000])'
@itchyny wrote:
Maybe haspath($p) does not need to list all the paths.
It doesn't. That's what first/1
is for.
I suggest following implementation.
def haspath($path):
($path|length) == 0 or
(type | . == "object" or . == "array") and
has($path[0]) and (.[$path[0]]|haspath($path[1:]));
def pick($paths):
. as $root |
reduce ($paths[]|[.]|flatten(1)) as $path
({}; if $root|haspath($path) then . + setpath($path; $root|getpath($path)) else . end);
@itchyny - Your haspath
uses a form of recursion which jq does not support efficiently. Here's an efficient implementation that uses the same insight:
def haspath($path):
def h:
. as [$json, $p]
| (($p|length)==0) or
($json | (has($p[0]) and ( [getpath([$p[0]]), $p[1:] ] | h)));
[., $path] | h;
@itchyny - Please read up on TCO on the jq wiki. You could start at https://github.com/stedolan/jq/wiki/Advanced-Topics#tail-call-optimization
Please read up on TCO on the jq wiki. You could start at https://github.com/stedolan/jq/wiki/Advanced-Topics#tail-call-optimization
Thanks, mate.
@itchyny @pkoppstein I was looking for a solution for this problem for a while and found your code very helpful. Thank you! However, I have a use case which doesn't seem to work with the above implementation: If the input object contains an array.
For example:
{
"a": 1,
"b": 2,
"c": [
{
"d": 1,
"e": 2
},
{
"d": 2,
"e": 3
}
]
}
and we wish to extract only .a
and .c[].d
, I couldn't an input to pass to pick
that would work.
You could select .c[0].d
with pick([["c", 0, "d"]])
, but you can't pass a wildcard in the form of .
or *
or something like that instead of 0
. I went down the route of enumerating all the possible indices and tried pick(path(.c[].d))
but got jq: error (at test.json:14): Cannot check whether object has a number key
.
Perhaps you have a suggestion for solving this use case?
Sorry in advance if I'm missing a really obvious way of solving this - I'm a jq
newbie.
@alon-k you mean something like this?
$ echo '{"a":1,"b":2,"c":[{"d":1,"e":2},{"d":2,"e":3}]}' | \
jq 'def pick(p): . as $v | reduce path(p) as $p ({}; setpath($p; $v | getpath($p))); pick(.a, .c[].d)'
{
"a": 1,
"c": [
{
"d": 1
},
{
"d": 2
}
]
}
But note that with this paths that don't exist in the input will be set to null
in the output object. To fix that one probably need to use the haspath
thing talked about above.
@wader Yes, exactly. Thank you!
For anyone interested, here is my final code:
def haspath($path):
def h:
. as [$json, $p]
| (($p|length)==0) or
($json | (has($p[0]) and ( [getpath([$p[0]]), $p[1:] ] | h)));
[., $path] | h;
def pick(paths):
. as $root |
reduce path(paths) as $path
({}; if$root|haspath($path) then . + setpath($path; $root | getpath($path)) else . end);
Result:
$ jq "$(cat better_query.jq)"' pick(.a, .c[].f, .c[].e)' test.json
{
"a": 1,
"c": [
{
"e": 2
},
{
"e": 3
}
]
}
👍 Nice, thanks for sharing
@alon-k -f
might be interesting if you want to read a script from a file
The following works using native jq
to_entries | [.[] | select( .key as $key | any(["a", "b"][] == $key) ))] | from_entries
echo '{"a": 1, "b": 2, "c": 3}' | jq '. | to_entries | [.[] | select( .key as $key | any(["a", "c"][] == $key) )] | from_entries'
{
"a": 1,
"c": 3
}
@LaurentGoderre
note that ["foo","bar"][]
is basically just ("foo", "bar")
for constants.
You can rewrite that as just:
jq 'del(.[keys[] | select(IN("a", "b", "z") | not)])' <<< '{"a": 1, "b": 2, "c": 3}'
{
"a": 1,
"b": 2
}
or
jq 'with_entries(select(.key | IN("a", "b", "z")))' <<< '{"a": 1, "b": 2, "c": 3}'
{
"a": 1,
"b": 2
}
In jq 1.7 with the new pick/1
builtin you can also conveniently use something like
jq 'pick(("a", "b", "z") as $k | select(has($k))[$k])' <<< '{"a": 1, "b": 2, "c": 3}'
{
"a": 1,
"b": 2
}
Could this issue be closed then since there are many documented ways to achieve this?
Well, those "solutions" that were just mentioned would only work for one level.
Looks resolved by pick/1
in jq 1.7.