jv: Add some support for 64 bit ints in a very conservative way

Open dequis opened this issue 9 years ago • 12 comments

This adds an extra int64_t field to jv, which is only written when parsing number literals (and only if they are integers within range), and only read when printing.

Any operations done over those numbers will downgrade them to a double with 53 bit precision. This is intentional for the scope of this patch:

$ echo 111111111111111111 | jq -c '[., 1*.]'
[111111111111111111,111111111111111100]

For ints between 53 and 64 bits, this matches the behavior of awk, as suggested in the bug tracker by tischwa:

$ echo 111111111111111111 | awk '{print $1, 1*$1}'
111111111111111111 111111111111111104

Fixes issue #369 (at least partially)

I tried to add tests but the stuff in jq.test seems to be parsed by jq itself, so they end up being comparisons between the double values, not the int64 ones.

This is an itch I've needed to scratch for a long time. Using jq as a pretty printer for JSON, I almost started getting used to big numbers getting mangled. Not anymore!

dequis avatar Oct 03 '16 03:10 dequis
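A minimal C sketch of the scheme described in the opening post, not the patch itself: the exact integer is stored only at parse time, read only at print time, and dropped by any arithmetic. The type `num_sketch` and the functions `num_parse`, `num_mul` and `num_format` are hypothetical names for illustration; the real change lives in jq's `jv` type and parser.

```c
#include <errno.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for a numeric jv; not the real struct from jv.h. */
typedef struct {
    double  number;   /* always valid; this is what arithmetic uses */
    int64_t int_val;  /* exact value, meaningful only when has_int is set */
    bool    has_int;
} num_sketch;

/* Parse a literal. The exact integer is kept only for plain integers
 * that are in range (long long is 64 bits on common platforms). */
static num_sketch num_parse(const char *text) {
    num_sketch n;
    char *end;
    n.number = strtod(text, NULL);
    errno = 0;
    long long v = strtoll(text, &end, 10);
    /* Reject "1.5" and "1e3" (trailing characters) and out-of-range values. */
    n.has_int = (end != text && *end == '\0' && errno != ERANGE);
    n.int_val = n.has_int ? (int64_t)v : 0;
    return n;
}

/* Any operation works on the double and drops the exact integer. */
static num_sketch num_mul(num_sketch a, num_sketch b) {
    num_sketch r = { a.number * b.number, 0, false };
    return r;
}

/* Print the exact integer if it survived, otherwise the double. */
static int num_format(num_sketch n, char *buf, size_t len) {
    if (n.has_int)
        return snprintf(buf, len, "%" PRId64, n.int_val);
    return snprintf(buf, len, "%.17g", n.number);
}
```

Near 1.1e17 adjacent doubles are 16 apart, so 111111111111111111 rounds to 111111111111111104 as soon as it passes through `num_mul`. The jq output 111111111111111100 and the awk output 111111111111111104 both denote that same double; only the decimal formatting differs.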

Coverage increased (+0.4%) to 85.776% when pulling cf11ac03f44c8ba9f7948f8e50d05756b3a377d2 on dequis:64bit into 0b8218515eabf1a967eba0dbcc7a0e5ae031a509 on stedolan:master.

coveralls avatar Oct 03 '16 03:10 coveralls

Hi. Thanks for this contribution. I left a comment on jv.h. Let us know what you think.

nicowilliams avatar Jan 23 '17 19:01 nicowilliams

omg you're alive

dequis avatar Jan 23 '17 20:01 dequis

heheh, yeah, I'm alive. Sorry for the absence :( I've been heads down in other things.

nicowilliams avatar Jan 23 '17 20:01 nicowilliams

I'm working on an alternative version of this where a numeric jv can only be a double, an int64_t, or a uint64_t. The parser will use strtoumax() or similar when a numeric value has no exponent and no decimal, and will produce a 64-bit integer, signed or unsigned as necessary, if the integer is small enough to fit in 64 bits. The dumper will, of course, trivially dump 64-bit ints. There will be new jv functions to go with all of this.

My main fear is that this will slow things down. I may add macros for disabling this feature.

nicowilliams avatar Jan 28 '17 19:01 nicowilliams
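The alternative outlined above could look roughly like the following C sketch, in which a number has exactly one representation at a time. The names `num_kind`, `num_tagged` and `num_tagged_parse` are hypothetical, and preferring `int64_t` whenever the value fits is an assumption about what "signed or unsigned as necessary" means.

```c
#include <errno.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef enum { NUM_DOUBLE, NUM_INT64, NUM_UINT64 } num_kind;

/* Hypothetical tagged number: only one member of v is valid, chosen by kind. */
typedef struct {
    num_kind kind;
    union { double d; int64_t i; uint64_t u; } v;
} num_tagged;

static num_tagged num_tagged_parse(const char *text) {
    num_tagged n;
    /* No '.', 'e' or 'E': the literal has no decimal part and no exponent. */
    if (strpbrk(text, ".eE") == NULL) {
        int negative = (text[0] == '-');
        const char *digits = negative ? text + 1 : text;
        char *end;
        errno = 0;
        uintmax_t mag = strtoumax(digits, &end, 10);
        int ok = digits[0] >= '0' && digits[0] <= '9' &&
                 *end == '\0' && errno != ERANGE;
        if (ok && !negative && mag <= UINT64_MAX) {
            if (mag <= (uintmax_t)INT64_MAX) {
                n.kind = NUM_INT64;          /* fits signed: prefer int64_t */
                n.v.i = (int64_t)mag;
            } else {
                n.kind = NUM_UINT64;         /* needs the full unsigned range */
                n.v.u = (uint64_t)mag;
            }
            return n;
        }
        if (ok && negative && mag <= (uintmax_t)INT64_MAX + 1) {
            n.kind = NUM_INT64;
            /* INT64_MIN has no positive counterpart, so handle it separately. */
            n.v.i = (mag == (uintmax_t)INT64_MAX + 1) ? INT64_MIN
                                                      : -(int64_t)mag;
            return n;
        }
    }
    /* Everything else, including integers too large for 64 bits. */
    n.kind = NUM_DOUBLE;
    n.v.d = strtod(text, NULL);
    return n;
}
```

The extra branch on every numeric literal is the kind of cost the slowdown concern refers to; whether it is measurable would need benchmarking.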

Why is it not acceptable to increase sizeof(jv) anyway? Is there some sort of API/ABI guarantee for other projects?

dequis avatar Jan 28 '17 20:01 dequis

There's no ABI constraint on the size of a jv. But jvs are rather compact -- as compact as they can be, really, and this is useful because we pass them on the C and jq stacks a lot. There are some issues with jvs though, mostly the size of the size and offset fields, which are way too small and have bad overflow semantics, but even for fixing those I'd be reluctant to change the size of a jv.

nicowilliams avatar Jan 28 '17 21:01 nicowilliams
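To make the size argument concrete, here is a hedged sketch comparing three layouts. The field names (`size`, `offset`, `u`) follow the discussion, but the structs are simplified assumptions for illustration and are not copied from jv.h.

```c
#include <stdint.h>

struct refcnt_sketch;  /* opaque stand-in for a refcounted payload */

/* Compact layout: a small header plus one union. */
typedef struct {
    unsigned char  kind_flags;
    unsigned char  pad_;
    unsigned short offset;
    int            size;
    union { struct refcnt_sketch *ptr; double number; } u;
} jv_compact;

/* Extra integer field next to the union: every value grows,
 * including strings, arrays and objects that never use it. */
typedef struct {
    unsigned char  kind_flags;
    unsigned char  pad_;
    unsigned short offset;
    int            size;
    union { struct refcnt_sketch *ptr; double number; } u;
    int64_t        int_val;
} jv_extra_field;

/* Integer stored inside the union: it shares storage with the
 * double, so the struct stays the same size as the compact layout. */
typedef struct {
    unsigned char  kind_flags;
    unsigned char  pad_;
    unsigned short offset;
    int            size;
    union { struct refcnt_sketch *ptr; double number; int64_t int_val; } u;
} jv_int_in_union;
```

On common 64-bit ABIs one would expect `sizeof` to report 16, 24 and 16 bytes respectively, though the exact figures depend on the platform. The union variant also means a number carries only one representation at a time.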

Another thing is that having multiple number representations at once in any one jv seems like asking for trouble? Better just have one.

nicowilliams avatar Jan 28 '17 21:01 nicowilliams

> Another thing is that having multiple number representations at once in any one jv seems like asking for trouble? Better just have one.

I like bitwise operations on integers a lot, but I also liked XSLT in the past, and XSLT has only one kind of "number": why not let jq do the same?

Speaking of XSLT, how about implementing translate?

JJOR

fadado avatar Jan 28 '17 22:01 fadado

BTW, your submission is awesome. I'm running with it and modifying it to make the int value a part of the u union in the jv, but please don't feel bad about that.

nicowilliams avatar Jan 29 '17 01:01 nicowilliams

@fadado jq will only have one kind of number. The idea here is that when the input is an integer that fits in an int64_t, that should be the internal representation, and when printed it should come out the way a printf function would print it: with no exponent and no decimal. (Hmm, though we need to be careful not to pick up any internationalization, e.g., thousands separators!)

nicowilliams avatar Jan 29 '17 01:01 nicowilliams
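A small sketch of the printing side, assuming a hypothetical helper `format_int64`. In C, an integer conversion such as `PRId64` never produces a decimal point or exponent, and locale-specific digit grouping is applied only when the POSIX `'` flag is given, so leaving that flag out should keep thousands separators away.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: write v into buf as plain decimal digits.
 * Returns the number of characters written, as snprintf does. */
static int format_int64(int64_t v, char *buf, size_t len) {
    /* No ' flag in the format string, so no locale digit grouping. */
    return snprintf(buf, len, "%" PRId64, v);
}
```

A 32-byte buffer is enough for any `int64_t`: the longest value, `INT64_MIN`, needs 20 characters plus the terminating NUL.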

@dequis Check out #1327. What do you think?

nicowilliams avatar Jan 29 '17 01:01 nicowilliams