jq
Randomly select n items from array
Just like limit(n; expr) and nth(n; expr), it would be great to have an equivalent rand(n; expr) that lets us easily return n randomly chosen items from an array.
This would be extremely useful for picking out a randomly selected server:port combination from a JSON-based key/value store or service discovery engine like etcd.
As an example, suppose the input was:
{ "action": "get", "node": { "key": "/nginx-80", "dir": true, "nodes": [ { "key": "/nginx-80/core-01:another-nginx:80", "value": "172.17.8.101:32781", "modifiedIndex": 5802, "createdIndex": 5802 }, { "key": "/nginx-80/core-01:nginx:80", "value": "172.17.8.101:32777", "modifiedIndex": 4316, "createdIndex": 4316 } ], "modifiedIndex": 3749, "createdIndex": 3749 } }
And the jq program was: . | [rand(1;.node.nodes[]?.value)][0]?
Then the response could either be: "172.17.8.101:32781" or "172.17.8.101:32777"
A builtin that outputs random numbers would be nice, yes.
It would provide some interesting behaviors, though. Since it's not idempotent, if we re-evaluate it later, you won't get the same response.
I feel like we've discussed it before, though.
On Wed, Dec 9, 2015 at 4:31 PM Nico Williams [email protected] wrote:
A builtin that outputs random numbers would be nice, yes.
— Reply to this email directly or view it on GitHub https://github.com/stedolan/jq/issues/1038#issuecomment-163397678.
We've already crossed the non-idempotent/non-referentially transparent line with inputs.
The cfunction descriptors should document whether each function is deterministic or not, and whether it's worth applying it at compile-time to constant inputs.
This is true! I suppose some documentation on that point might be sufficient for it, then. Something about "if you need the same value multiple times, use random as $randvar", so that it doesn't imply we've managed your state for you at all.
Then again, we might not need it at all. That's the standard behavior for random functions most places anyways.
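As a hedged illustration of the "bind it once with as" advice: the thread is discussing a random builtin that does not exist yet, so this sketch uses the existing input builtin as a stand-in, since each evaluation of input also yields a different value. The variable names are only for illustration.

```shell
# Unbound: every reference to `input` reads a fresh value.
unbound=$(printf '1\n2\n' | jq -n -c '[input, input]')

# Bound once: both references reuse the value that was read.
bound=$(printf '1\n2\n' | jq -n -c 'input as $v | [$v, $v]')

echo "$unbound"   # [1,2]
echo "$bound"     # [1,1]
```

A random builtin would presumably behave like the first form unless its result is bound with as.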
This ER is essentially a duplicate of #677
It's trivial to build a random builtin on Unix where /dev/urandom or various recent random APIs are available. On Win32 we can use CryptAcquireContext() and CryptGenRandom() where available.
But what about where we can find no suitable method of getting entropy? The builtin would have to raise an error.
@nicowilliams wrote:
... The builtin would have to raise an error.
It could fall back on a simplistic generator, even a jq-coded one (see #677).
@pkoppstein That's asking for trouble.
I'd be OK with a badrandom builtin that produces possibly lousy "random" number sequences.
I like the approach random as $randvar mentioned by @wtlangford. We can always pass the result of that operation to nth().
Well, the goal there was "if you really need a number that was once random, but you're hanging on to it for a while", then do it random as $randvar. Otherwise, if you just need a random number once or something, then just calling random would do. It's not a surprising behavior, but jq functions can get called more times than you might otherwise expect, depending on how your program's written.
My exercise: shuffling an array, the classic solution. Given the code at the end, you can shuffle an array using:
[range(10)] | shuffle
That's all, folks!
I'm not strong at maths... sorry if I made a mistake.
# advance random state
def rand:
  (((214013 * .Seed) + 2531011) % 2147483648) as $Seed |  # mod 2^31
  ($Seed / 65536 | floor) as $Bits |                      # >> 16
  { $Bits, $Seed }                                        # 2^15 bits, 2^31 seed
;

# make random state
def randomize($seed):
  { Seed: $seed } | rand | rand
;
def randomize: randomize(now|floor);

# generate stream of random 2^15 values
def rands($state):
  $state | recurse(rand) | .Bits
;
def rands: rands(randomize);

# random integer [0..n)
def random($n; $state):
  $state | rand as $next |
  [ ($next.Bits % $n), $next ]
;
def random($n): random($n; .);

# randomize array contents
def shuffle($state):
  def swap($i; $j):
    if $i == $j
    then .
    else .[$i] as $t | .[$i] = .[$j] | .[$j] = $t
    end;
  . as $array |
  length as $len |
  [limit($len; rands($state))] as $r |
  reduce range($len-1; -1; -1) as $i
    ($array; swap($i; $r[$i] % (1+$i)))
;
def shuffle: shuffle(randomize);
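To sanity-check the arithmetic in rand, here is a small sketch that repeats the same linear congruential step with shell integer arithmetic. lcg_step is a hypothetical helper written for this check, not part of the code above, and it assumes the shell uses 64-bit integers.

```shell
# One LCG step, mirroring `rand` above: prints "<bits> <seed>".
lcg_step() {
    seed=$(( (214013 * $1 + 2531011) % 2147483648 ))   # mod 2^31
    echo "$(( seed >> 16 )) $seed"                     # top 15 bits, new seed
}

first=$(lcg_step 1)                  # one step from Seed 1
second=$(lcg_step "${first#* }")     # next step from the new seed

echo "${first% *} ${second% *}"      # 41 18467
```

Because randomize applies rand twice, rands(randomize(1)) should start at the second of these values rather than the first.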
One work-around would be to supply the randomness in an argument. With the example data above this trick works:
jq --arg i $(($RANDOM % 2)) '.node.nodes[$i|tonumber].value' foo.json
Expanding on that solution:
$ cat input.json
[
"first", "second", "third", "fourth", "fifth"
]
$ jq --arg i $(($RANDOM % $(jq length input.json))) '.[$i|tonumber]' input.json
"third"
So you don't have to hardcode the number of elements :+1:
Here's a solution using shuf from GNU coreutils:
echo '[1, 2, 3, 4]' | jq -c '.[]' | shuf | jq -s '.'
Output:
[
4,
1,
3,
2
]
You can use shuf -n to take as many random elements as you want:
echo '[1, 2, 3, 4]' | jq -c '.[]' | shuf -n <n> | jq -s '.'
Although I admit there might be more sophisticated use cases for a native random function, here's an approach that uses shuf from GNU coreutils and shell pipes to extract random elements of an array.
Given an input JSON array in INPUT_FILE and the sample size in N_POP:
## Provide INPUT_FILE and N_POP; samples without
## replacement
# Function
sample_json_array () {
    N_POP=${1:-10}            # default sample size when no
                              # argument is given
    PIPE=${2:-/tmp/mypipe}
    mkfifo "${PIPE}"
    tee "${PIPE}" \
    | jq "$(cat "${PIPE}" \
        | jq 'length - 1' \
        | xargs seq 0 \
        | shuf \
        | head -n "${N_POP}" \
        | sed 's#.*#.[&],#g;' \
        | tr -d '\n' \
        | sed 's#,$##; s,.*,[&],' \
        )"                    # indices run 0..length-1, since jq arrays are zero-based
    rm "${PIPE}"
}
# Usage
cat ${INPUT_FILE} | sample_json_array ${N_POP}
I actually came up with a solution that externalizes the randomness:
jq -re '.node.nodes[]?.value' foo.json | shuf -n 1
But let's say you needed several fields of a given entry: you would have to externalize the whole entry and then reconstitute it if you need further jq processing:
jq -cre '.node.nodes[]? ' foo.json | shuf -n 1 | jq .
{
"key": "/nginx-80/core-01:nginx:80",
"value": "172.17.8.101:32777",
"modifiedIndex": 4316,
"createdIndex": 4316
}
jq -cre '.node.nodes[]? ' foo.json | shuf -n 1 | jq .
{
"key": "/nginx-80/core-01:another-nginx:80",
"value": "172.17.8.101:32781",
"modifiedIndex": 5802,
"createdIndex": 5802
}
With -c, pretty-printing is turned off and each JSON object is emitted on a single line, which is the line-oriented input that shuf expects. This makes the assumption that there are no newlines in your JSON values.
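A quick check of that newline assumption, as a sketch with a made-up one-field document: as far as I can tell, -c keeps newlines inside strings escaped, so the caveat would mainly bite when strings are printed raw with -r, as in the .value one-liner earlier.

```shell
# A string value containing an escaped newline (hypothetical sample data).
doc='{"a":"x\ny"}'

# -c keeps the newline escaped, so the object occupies one line.
compact_lines=$(( $(printf '%s\n' "$doc" | jq -c . | wc -l) ))

# -r on the string prints a real newline, so it spans two lines.
raw_lines=$(( $(printf '%s\n' "$doc" | jq -r .a | wc -l) ))

echo "$compact_lines $raw_lines"   # 1 2
```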
Adding another solution here, a one-liner that worked for me (wanting to sample from a large json array/file):
$ filename=large_file.json; \
length=$(jq length $filename); \
for i in {1..10}; do \
jq --arg i $(($RANDOM % $length)) '.[$i|tonumber]' $filename; \
done
Takes a few seconds (it's traversing the JSON file each iteration) but does the job.
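One possible way to avoid re-reading the file on every iteration is to draw all the indices in the shell first and hand them to a single jq invocation. This is only a sketch: rand_index is a hypothetical helper, the inline array stands in for the large file, and the sampling is with replacement. It also combines two $RANDOM draws, since a single $RANDOM only reaches 32767 and so cannot address every element of a larger array.

```shell
# Hypothetical helper: random index in [0, $1).
rand_index() {
    # Two 15-bit $RANDOM draws give a 30-bit value; a small modulo bias remains.
    echo $(( ((RANDOM << 15) | RANDOM) % $1 ))
}

json='["first", "second", "third", "fourth", "fifth"]'
length=$(printf '%s' "$json" | jq length)

# Build a JSON array of 3 random indices, e.g. [4,0,2].
idx="["
for n in 1 2 3; do
    idx="$idx$(rand_index "$length"),"
done
idx="${idx%,}]"

# One jq invocation returns all sampled elements.
printf '%s' "$json" | jq -c --argjson idx "$idx" '[.[$idx[]]]'
```

Using --argjson passes the indices as numbers, so the tonumber conversion is not needed.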