Randomly select n items from array

Open fewebahr opened this issue 9 years ago • 22 comments

Just like limit(n; expr) and nth(n; expr), it would be great to have an equivalent rand(n; expr) that lets us easily return any number of randomly chosen items from an array.

This would be extremely useful for picking out a randomly selected server:port combination from a JSON-based key/value store or service discovery engine like etcd.

As an example, suppose the input was:

{
  "action": "get",
  "node": {
    "key": "/nginx-80",
    "dir": true,
    "nodes": [
      {
        "key": "/nginx-80/core-01:another-nginx:80",
        "value": "172.17.8.101:32781",
        "modifiedIndex": 5802,
        "createdIndex": 5802
      },
      {
        "key": "/nginx-80/core-01:nginx:80",
        "value": "172.17.8.101:32777",
        "modifiedIndex": 4316,
        "createdIndex": 4316
      }
    ],
    "modifiedIndex": 3749,
    "createdIndex": 3749
  }
}

And the jq program was: . | [rand(1;.node.nodes[]?.value)][0]?

Then the response could either be: "172.17.8.101:32781" or "172.17.8.101:32777"

fewebahr avatar Dec 09 '15 15:12 fewebahr

A builtin that outputs random numbers would be nice, yes.

nicowilliams avatar Dec 09 '15 21:12 nicowilliams

It would provide some interesting behaviors, though. Since it's not idempotent, if we re-evaluate it later, you won't get the same response.

I feel like we've discussed it before, though.

wtlangford avatar Dec 09 '15 21:12 wtlangford

We've already crossed the non-idempotent/non-referentially transparent line with inputs.

nicowilliams avatar Dec 09 '15 21:12 nicowilliams

The cfunction descriptors should document whether each function is deterministic or not, and whether it's worth applying it at compile-time to constant inputs.

nicowilliams avatar Dec 09 '15 21:12 nicowilliams

This is true! I suppose some documentation on that point might be sufficient for it, then. Something about "if you need the same value multiple times, use random as $randvar", so that it doesn't imply we've managed your state for you at all.

Then again, we might not need it at all. That's the standard behavior for random functions most places anyways.

wtlangford avatar Dec 09 '15 21:12 wtlangford

This ER is essentially a duplicate of #677

pkoppstein avatar Dec 09 '15 22:12 pkoppstein

It's trivial to build a random builtin on Unix where /dev/urandom or various recent random APIs are available. On Win32 we can use CryptAcquireContext() and CryptGenRandom() where available.

But what about where we can find no suitable method of getting entropy? The builtin would have to raise an error.
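Until such a builtin exists, the entropy can be gathered outside jq and passed in as an argument. A minimal sketch, assuming a Unix system with /dev/urandom and the POSIX od utility; random_u32 and pick_one are hypothetical helper names, not part of jq:

```shell
# random_u32: print one unsigned 32-bit integer read from /dev/urandom.
random_u32() {
  od -An -N4 -tu4 /dev/urandom | tr -d ' \n'
}

# pick_one: print one randomly chosen element of the JSON array on stdin.
# Assumes a non-empty array (the modulo fails on length 0).
# --argjson passes the number in as a JSON value, so no tonumber is needed.
pick_one() {
  jq -c --argjson r "$(random_u32)" '.[$r % length]'
}

echo '["a","b","c"]' | pick_one
```

Taking a 32-bit value modulo a small length leaves a slight bias toward low indices, which is usually negligible for picking a server from a short list.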

nicowilliams avatar Dec 09 '15 23:12 nicowilliams

@nicowilliams wrote:

... The builtin would have to raise an error.

It could fall back on a simplistic generator, even a jq-coded one (see #677).

pkoppstein avatar Dec 09 '15 23:12 pkoppstein

@pkoppstein That's asking for trouble.

nicowilliams avatar Dec 09 '15 23:12 nicowilliams

I'd be OK with a badrandom builtin that produces possibly lousy "random" number sequences.

nicowilliams avatar Dec 09 '15 23:12 nicowilliams

I like the approach random as $randvar mentioned by @wtlangford. We can always pass the result of that operation to nth().

fewebahr avatar Dec 11 '15 16:12 fewebahr

Well, the goal there was: if you really need a number that was once random but you're hanging on to it for a while, bind it with random as $randvar. Otherwise, if you just need a random number once, just calling random would do. It's not surprising behavior, but jq functions can get called more times than you might otherwise expect, depending on how your program's written.
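The difference can already be seen with input, which, like the proposed random, yields a different value each time it is evaluated. A small sketch, with input standing in for the hypothetical random builtin:

```shell
# Evaluated twice: each `input` consumes a fresh value.
printf '1\n2\n' | jq -nc '[input, input]'          # prints [1,2]

# Bound once with `as`: the same value is reused wherever $r appears.
printf '1\n2\n' | jq -nc 'input as $r | [$r, $r]'  # prints [1,1]
```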

wtlangford avatar Dec 11 '15 16:12 wtlangford

My exercise: shuffling an array with the classical solution (a Fisher-Yates shuffle). Given the code at the end, you can shuffle an array using:

 [range(10)] | shuffle

That's all, folks!

I'm not strong at maths... sorry if I made a mistake.

# advance random state
def rand:
    (((214013 * .Seed) + 2531011) % 2147483648) as $Seed | # mod 2^31
    ($Seed / 65536 | floor) as $Bits |  # >> 16
    { $Bits, $Seed }    # Bits < 2^15, Seed < 2^31
;

# make random state
def randomize($seed):
    { Seed: $seed } | rand | rand
;
def randomize: randomize(now|floor);

# generate a stream of random values in [0, 2^15)
def rands($state):
    $state | recurse(rand) | .Bits
;
def rands: rands(randomize);

# random integer [0..n)
def random($n; $state):
    $state | rand as $next |
    [ ($next.Bits % $n), $next ]
;
def random($n): random($n; .);

# randomize array contents
def shuffle($state):
    def swap($i; $j):
        if $i == $j
        then .
        else .[$i] as $t | .[$i] = .[$j] | .[$j] = $t
        end;

    . as $array |
    length as $len |
    [limit($len; rands($state))] as $r |
    reduce range($len-1; -1; -1) as $i
        ($array; swap($i; $r[$i] % (1+$i)))
;
def shuffle: shuffle(randomize);
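
For the original request (n items rather than a full shuffle), the same kind of generator can drive a partial shuffle that stops after n swaps. A minimal sketch, assuming the caller supplies an integer seed; sample_n and lcg_next are hypothetical names, and the generator is as weak as the one above, so it is not suitable where the quality of the randomness matters:

```shell
# sample_n N SEED: print N items picked without replacement from the JSON
# array on stdin. The same SEED always yields the same sample.
sample_n() {
  jq -c --argjson n "$1" --argjson seed "$2" '
    # One step of the linear congruential generator: returns the next seed.
    def lcg_next: (214013 * . + 2531011) % 2147483648;

    # Partial Fisher-Yates: after $k swaps the first $k slots hold the sample.
    def sample($n; $seed):
      (if $n < length then $n else length end) as $k
      | length as $len
      | reduce range(0; $k) as $i (
          {a: ., s: $seed};
          (.s | lcg_next) as $s
          # pick $j uniformly-ish from [$i, $len) using the high bits
          | ($i + (($s / 65536 | floor) % ($len - $i))) as $j
          | .a[$i] as $t
          | .a[$i] = .a[$j]
          | .a[$j] = $t
          | .s = $s
        )
      | .a[:$k];

    sample($n; $seed)
  '
}

echo '[1,2,3,4,5]' | sample_n 2 "$(date +%s)"
```

Seeding from the clock, as in the last line, gives a different sample each second; passing a fixed seed makes the result reproducible.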

fadado avatar Oct 19 '16 17:10 fadado

One work-around would be to supply the randomness in an argument. With the example data above this trick works:

jq --arg i $(($RANDOM % 2)) '.node.nodes[$i|tonumber].value' foo.json

dobbs avatar Feb 14 '20 21:02 dobbs

Expanding on that solution:

$ cat input.json
[
    "first", "second", "third", "fourth", "fifth"
]
$ jq --arg i $(($RANDOM % $(jq length input.json))) '.[$i|tonumber]' input.json 
"third"

So you don't have to hardcode the number of elements :+1:
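
The length can also be taken inside the jq program, so the file is only read once. A sketch, where pick_with is a hypothetical helper and the random number is whatever the caller passes in:

```shell
# pick_with R: print element (R mod length) of the JSON array on stdin.
# R is any non-negative integer from the caller, e.g. "$RANDOM" in bash.
# Assumes a non-empty array.
pick_with() {
  jq --argjson r "$1" '.[$r % length]'
}

echo '["first","second","third","fourth","fifth"]' | pick_with 7   # prints "third"
```

In bash this would be used as pick_with "$RANDOM" < input.json.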

MrYakobo avatar Feb 06 '22 17:02 MrYakobo

Here's a solution using shuf from GNU coreutils:

echo '[1, 2, 3, 4]' | jq -c '.[]' | shuf | jq -s '.'

Output:

[
  4,
  1,
  3,
  2
]

mheiber avatar Dec 13 '22 17:12 mheiber

You can use shuf to take the number of random elements you want

echo '[1, 2, 3, 4]' | jq -c '.[]' | shuf -n <n> | jq -s '.'

jakob1379 avatar Jan 10 '23 10:01 jakob1379

Although I admit there might be more sophisticated use cases for a native random function, here's one using GNU coreutils shuf and shell pipes to extract random elements of an array.

Given an input JSON array in INPUT_FILE and the sample size as N_POP:

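## Provide INPUT_FILE, N_POP; sample without
## replacement

# Function
sample_json_array () {
  N_POP=${1:-10} # standard shell parameter expansion:
                 # default to 10 when $1 is unset
  PIPE=${2:-/tmp/mypipe}

  mkfifo "${PIPE}"

  # Caveat: jq cannot start reading until the $(...) below has finished,
  # so input larger than the OS pipe buffer may stall here.
  tee "${PIPE}"                        \
    | jq "$(cat "${PIPE}"              \
             | jq 'range(length)'      \
             | shuf                    \
             | head -n "${N_POP}"      \
             | sed 's#.*#.[&],#g;'     \
             | tr -d '\n'              \
             | sed 's#,$##; s,.*,[&],' \
        )"
  # range(length) yields the 0-based indices 0..length-1; the sed/tr
  # steps turn the chosen ones into a filter such as [.[3],.[0]]

  rm "${PIPE}"
}

# Usage
cat "${INPUT_FILE}" | sample_json_array "${N_POP}"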
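
When the input is a regular file rather than a stream, the named pipe can be avoided by reading the file twice: once to pick indices, once to extract them. A sketch under that assumption; sample_json_file is a hypothetical name and GNU shuf is still required:

```shell
# sample_json_file N FILE: print N elements of the JSON array in FILE,
# chosen without replacement, as a single JSON array.
sample_json_file() {
  # Pick N distinct 0-based indices and collect them into a JSON array.
  idx=$(jq 'range(length)' "$2" | shuf -n "$1" | jq -sc .)
  # Look the chosen indices up in one pass over the file.
  jq -c --argjson idx "$idx" '[.[$idx[]]]' "$2"
}
```

It would be called as sample_json_file "${N_POP}" "${INPUT_FILE}".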

bvraghav avatar Mar 09 '23 14:03 bvraghav

I actually came up with a solution that externalizes the randomness:

jq -re '.node.nodes[]?.value' foo.json | shuf -n 1

But let's say you needed several elements of a given entry: you would have to externalize them, and then reconstitute them if you need further jq processing:

 jq -cre '.node.nodes[]? ' foo.json | shuf -n 1 | jq .
{
  "key": "/nginx-80/core-01:nginx:80",
  "value": "172.17.8.101:32777",
  "modifiedIndex": 4316,
  "createdIndex": 4316
}
 jq -cre '.node.nodes[]? ' foo.json | shuf -n 1 | jq .
{
  "key": "/nginx-80/core-01:another-nginx:80",
  "value": "172.17.8.101:32781",
  "modifiedIndex": 5802,
  "createdIndex": 5802
}

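With -c, the pretty-printing is dropped and each JSON object comes out on a single line, which is what shuf needs. This assumes there are no raw newlines in the output; jq escapes newlines inside strings as \n, so that only becomes a concern when -r prints bare strings, as in the first command.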

chb0github avatar Sep 28 '23 21:09 chb0github

Adding another solution here, a one-liner that worked for me (wanting to sample from a large json array/file):

$   filename=large_file.json; \
    length=$(jq length $filename); \
    for i in {1..10}; do \
        jq --arg i $(($RANDOM % $length)) '.[$i|tonumber]' $filename; \
    done

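Takes a few seconds (it traverses the JSON file on each iteration) but does the job. Note that bash's $RANDOM tops out at 32767, so in a longer array the elements beyond that index are never picked.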
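
The repeated traversal can be avoided by generating all the random numbers first and handing them to a single jq invocation. A sketch, assuming /dev/urandom and the POSIX od utility are available; draw is a hypothetical name, and like the loop above it samples with replacement:

```shell
# draw N: print N elements of the JSON array on stdin, one per line,
# sampled with replacement in a single jq pass. Assumes a non-empty array.
draw() {
  # N unsigned 32-bit integers from /dev/urandom, collected into a JSON array.
  rs=$(od -An -N"$(($1 * 4))" -tu4 /dev/urandom | jq -sc .)
  # $rs[] emits each number in turn, so one element is printed per number.
  jq -c --argjson rs "$rs" '.[$rs[] % length]'
}
```

It would be called as draw 10 < large_file.json; the 32-bit values also cover arrays far longer than $RANDOM can index.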

astockwell avatar Oct 26 '23 14:10 astockwell