jq
jq copied to clipboard
refactor: move bsearch function to C-code
This commit fixes issue #527 and move the bsearch function to a native C-code.
The performance is a bit better:
Testing script:
clear
if [[ `uname` == Darwin ]]; then
MAX_MEMORY_UNITS=KB
else
MAX_MEMORY_UNITS=MB
fi
export TIMEFMT='%J %U user %S system %P cpu %*E total'$'\n'\
'avg shared (code): %X KB'$'\n'\
'avg unshared (data/stack): %D KB'$'\n'\
'total (sum): %K KB'$'\n'\
'max memory: %M '$MAX_MEMORY_UNITS''$'\n'\
'page faults from disk: %F'$'\n'\
'other page faults: %R'
echo "JQ code bsearch"
time /usr/bin/jq -n '[range(30000000)] | bsearch(3000)'
echo "C code bsearch"
time ./jq -n '[range(30000000)] | bsearch(3000)'
Results:
JQ code bsearch
3000
/usr/bin/jq -n '[range(30000000)] | bsearch(3000)' 8.63s user 0.77s system 98% cpu 9.542 total
avg shared (code): 0 KB
avg unshared (data/stack): 0 KB
total (sum): 0 KB
max memory: 823 MB
page faults from disk: 1
other page faults: 432828
C code bsearch
3000
./jq -n '[range(30000000)] | bsearch(3000)' 8.44s user 0.74s system 99% cpu 9.249 total
avg shared (code): 0 KB
avg unshared (data/stack): 0 KB
total (sum): 0 KB
max memory: 824 MB
page faults from disk: 0
other page faults: 432766
The results may be better if we can use jvp_array_read, and there is no need to copy/free the input array in each iteration. I guess that is like that for API pourposes when the libjq is in use with multiple threads in place.
Closes #527
@emanuele6 I think this is ready
The bsearch filter should support searching from an array of any kind.
$ jq -n '[{x:range(3),y:range(3)}] | debug(. == sort) | bsearch({x:1,y:1})'
["DEBUG:",true]
4
@emanuele6 @itchyny I've just refactored this, and merging should be okay.
I found a regression. On updating tests see my review comment above.
$ jq -n '[1,2] | bsearch(2)'
1
$ ./jq -n '[1,2] | bsearch(2)'
-3
I think we can reduce special cases, simplify the logic.
static jv f_bsearch(jq_state *jq, jv input, jv target) {
if (jv_get_kind(input) != JV_KIND_ARRAY) {
return type_error(input, "cannot be searched from");
}
int start = 0;
int end = jv_array_length(jv_copy(input));
jv answer = jv_invalid();
while (start < end) {
int mid = (start + end) / 2;
int result = jv_cmp(jv_copy(target), jv_array_get(jv_copy(input), mid));
if (result == 0) {
answer = jv_number(mid);
break;
} else if (result < 0) {
end = mid;
} else {
start = mid + 1;
}
}
if (!jv_is_valid(answer)) {
answer = jv_number(-1 - start);
}
jv_free(input);
jv_free(target);
return answer;
}
I just remembered that (start + end) / 2 for binary search is not recommended because it can overflow.
This can be avoided using start + (end - start) / 2 instead.
See https://en.wikipedia.org/wiki/Binary_search_algorithm#Implementation_issues