witnet-rust icon indicating copy to clipboard operation
witnet-rust copied to clipboard

Operator parseJSONMap fails on JSON objects containing long integer values

Open guidiaz opened this issue 1 year ago • 10 comments

For instance: {"data": 122637770461000000000}

guidiaz avatar Nov 17 '22 10:11 guidiaz

Alright, so this issue seems to be because of CBOR limitations. When executing an operator like StringParseJsonMap, the data is converted first from JSON to CBOR, and then from CBOR to RadonType:

JSON -> CBOR -> RadonType

A RadonInteger has a limit of -2^127 to 2^127 - 1, but CBOR has a lower limit for the integer size, from -2^64 to 2^64 - 1. So the conversion results in an error:

Encode { from: "cbor::value::Value", to: "RadonMap" }

That issue can be fixed by skipping the intermediate CBOR conversion, but then a bigger problem arises: RadonTypes are serialized as CBOR when used in the network protocol. Therefore, as soon as a RadonInteger reaches 2^64, it will be an error to serialize it. For example, this is an error that is seen in the node when a data request uses the IntegerMultiply operator to convert an input of [1] into an integer greater than 2^64 (1 * 1000000000000000 * 100000):

Couldn't decode tally value from bytes: Failed to encode RadonInteger into Vec<u8>

(the error says tally value but this is the result of the aggregation phase)

In that case the data request resolves with 0 commits, because it is not possible to serialize the result. Sheikah doesn't detect that error when using "try data request", so I guess the toolkit will not notice this error as well.

So possible solutions for this general integer serialization issue:

  • reduce the range of valid RadonIntegers to the same as valid CBOR integers
  • serialize big integers as CBOR floats
  • serialize big integers using some CBOR extension

I believe all of them are breaking changes.

tmpolaczyk avatar Nov 18 '22 09:11 tmpolaczyk

On StringParseJsonArray, StringParseJsonMap and StringParseXmlMap, I would suggest to:

  • serialize big integers (>64 bits) to RadonFloats, so "absolute", "greaterThan", "lessThan", "modulo", "multiply", "negate" and "power" math operators could still be used.

guidiaz avatar Nov 18 '22 09:11 guidiaz

In that case the data request resolves with 0 commits, because it is not possible to serialize the result. Sheikah doesn't detect that error when using "try data request", so I guess the toolkit will not notice this error as well.

@tmpolaczyk Shouldn't the node commit something like an "aggregation overflow error" instead?

As a matter of fact, there's a specific enum value for this situation defined in the Witnet.sol library.

guidiaz avatar Nov 18 '22 09:11 guidiaz

@tmpolaczyk Shouldn't the node commit something like an "aggregation overflow error" instead?

As a matter of fact, there's a specific enum value for this situation defined in the Witnet.sol library.

Yeah, it should, but this error has never happened before so the current behavior is to not commit anything. Basically here instead of returning an err we should return an ok with serialization of a dedicated radon error like RadError::Encode:

https://github.com/witnet/witnet-rust/blob/2fa9a2fdae4501967d739d9884aaaef1882b02da/node/src/actors/chain_manager/mining.rs#L592

tmpolaczyk avatar Nov 18 '22 10:11 tmpolaczyk

Would it make sense to rephrase the log error message to "Couldn't encode value from < whatever > into < RadonType >: {}" ?

guidiaz avatar Nov 18 '22 10:11 guidiaz

Would it make sense to rephrase the log error message to "Couldn't encode value from < whatever > to < RadonType >: {}" ?

Yes, that log is wrong.

tmpolaczyk avatar Nov 18 '22 10:11 tmpolaczyk

Opened #2312 to track the issue of nodes not commiting anything in case of encode error.

tmpolaczyk avatar Nov 21 '22 14:11 tmpolaczyk

The CBOR RFC seems to indicate that big integers can be supported by representing them as bytes or strings:

https://www.rfc-editor.org/rfc/rfc8949.html#name-bignums

But not sure if this solution works well for our case, because:

  • Depending on the context, the bytes should be interpreted as ether bytes or as a big integer. So radon operators such as ArrayReduce(AverageMean) would need to implicitly convert any bytes to integer, and maybe in some cases that is not desired.
  • Negative integers are supposed to be represented by strings, but not all integers are valid utf8 strings. The two CBOR libraries that we use (cbor and serde_cbor) do not allow non-utf8 strings. So we would need to use a different CBOR library that allows using non-utf8 strings.

tmpolaczyk avatar Nov 24 '22 12:11 tmpolaczyk

* Negative integers are supposed to be represented by strings, but not all integers are valid utf8 strings. The two CBOR libraries that we use (cbor and serde_cbor) do not allow non-utf8 strings. So we would need to use a different CBOR library that allows using non-utf8 strings.

Now I'm curious of which integer numbers cannot be formatted as strings :thinking:

aesedepece avatar Nov 28 '22 09:11 aesedepece

Now I'm curious of which integer numbers cannot be formatted as strings thinking

I guess most of them, for example -100000000000000000000 is encoded as

C3                       # tag(3)
   49                    # bytes(9)
      056BC75E2D630FFFFF # "\u0005k\xC7^-c\u000F\xFF\xFF"

And "\u0005k\xC7^-c\u000F\xFF\xFF" is not valid UTF8, so we wouldn't be able to parse it with the current libraries.

tmpolaczyk avatar Nov 29 '22 13:11 tmpolaczyk