jq icon indicating copy to clipboard operation
jq copied to clipboard

refactor: move bsearch function to C-code

Open eloycoto opened this issue 2 years ago • 6 comments
trafficstars

This commit fixes issue #527 and move the bsearch function to a native C-code.

The performance is a bit better:

Testing script:

clear
if [[ `uname` == Darwin ]]; then
    MAX_MEMORY_UNITS=KB
else
    MAX_MEMORY_UNITS=MB
fi

export TIMEFMT='%J   %U  user %S system %P cpu %*E total'$'\n'\
'avg shared (code):         %X KB'$'\n'\
'avg unshared (data/stack): %D KB'$'\n'\
'total (sum):               %K KB'$'\n'\
'max memory:                %M '$MAX_MEMORY_UNITS''$'\n'\
'page faults from disk:     %F'$'\n'\
'other page faults:         %R'

echo "JQ code bsearch"
time /usr/bin/jq -n '[range(30000000)] | bsearch(3000)'

echo "C code bsearch"
time ./jq -n '[range(30000000)] | bsearch(3000)'

Results:

JQ code bsearch
3000
/usr/bin/jq -n '[range(30000000)] | bsearch(3000)'   8.63s  user 0.77s system 98% cpu 9.542 total
avg shared (code):         0 KB
avg unshared (data/stack): 0 KB
total (sum):               0 KB
max memory:                823 MB
page faults from disk:     1
other page faults:         432828
C code bsearch
3000
./jq -n '[range(30000000)] | bsearch(3000)'   8.44s  user 0.74s system 99% cpu 9.249 total
avg shared (code):         0 KB
avg unshared (data/stack): 0 KB
total (sum):               0 KB
max memory:                824 MB
page faults from disk:     0
other page faults:         432766

The results may be better if we can use jvp_array_read, and there is no need to copy/free the input array in each iteration. I guess that is like that for API pourposes when the libjq is in use with multiple threads in place.


Closes #527

eloycoto avatar Nov 15 '23 15:11 eloycoto

@emanuele6 I think this is ready

eloycoto avatar Nov 21 '23 10:11 eloycoto

The bsearch filter should support searching from an array of any kind.

 $ jq -n '[{x:range(3),y:range(3)}] | debug(. == sort) | bsearch({x:1,y:1})'
["DEBUG:",true]
4

itchyny avatar Dec 04 '23 14:12 itchyny

@emanuele6 @itchyny I've just refactored this, and merging should be okay.

eloycoto avatar Feb 05 '24 08:02 eloycoto

I found a regression. On updating tests see my review comment above.

 $ jq -n '[1,2] | bsearch(2)'          
1
 $ ./jq -n '[1,2] | bsearch(2)'
-3

itchyny avatar Feb 07 '24 11:02 itchyny

I think we can reduce special cases, simplify the logic.

static jv f_bsearch(jq_state *jq, jv input, jv target) {
  if (jv_get_kind(input) != JV_KIND_ARRAY) {
    return type_error(input, "cannot be searched from");
  }
  int start = 0;
  int end = jv_array_length(jv_copy(input));
  jv answer = jv_invalid();
  while (start < end) {
    int mid = (start + end) / 2;
    int result = jv_cmp(jv_copy(target), jv_array_get(jv_copy(input), mid));
    if (result == 0) {
      answer = jv_number(mid);
      break;
    } else if (result < 0) {
      end = mid;
    } else {
      start = mid + 1;
    }
  }
  if (!jv_is_valid(answer)) {
    answer = jv_number(-1 - start);
  }
  jv_free(input);
  jv_free(target);
  return answer;
}

itchyny avatar Feb 12 '24 01:02 itchyny

I just remembered that (start + end) / 2 for binary search is not recommended because it can overflow. This can be avoided using start + (end - start) / 2 instead. See https://en.wikipedia.org/wiki/Binary_search_algorithm#Implementation_issues

emanuele6 avatar Feb 20 '24 00:02 emanuele6