objecthash
Language-independent hashing mechanism for floats and integers (Proposal)
Depending on the programming language used to implement ObjectHash, both the behavior and the resulting hash can differ considerably. One big issue I see is the distinction between a float and an integer in the case of integer-valued floats.
An example (taken from the test cases) is:
(1) ["foo", {"bar":["baz", null, 1, 1.5, 0.0001, 1000, 2, -23.1234, 2]}]
-and-
(2) ["foo", {"bar":["baz", null, 1.0, 1.5, 0.0001, 1000.0, 2.0, -23.1234, 2.0]}]
In Python the results are:
(1) 726e7ae9e3fadf8a2228bf33e505a63df8db1638fa4f21429673d387dbd1c52a
-and-
(2) 783a423b094307bcb28d005bc2f026ff44204442ef3513585e7e73b66e3c2213
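The two results differ because a parser that preserves number types sends 1 and 1.0 down different hashing paths. A minimal Python sketch of that effect, assuming a simplified type-tagged hash (the `tagged_hash` and `hash_number` helpers, the tags, and the float encoding are illustrative assumptions, not ObjectHash's exact normalization):

```python
import hashlib
import json


def tagged_hash(tag: bytes, payload: bytes) -> str:
    """Hash a payload together with a one-byte type tag (illustrative only)."""
    return hashlib.sha256(tag + payload).hexdigest()


def hash_number(value) -> str:
    """Hash ints and floats under different tags, as a type-aware hasher would."""
    if isinstance(value, int):
        return tagged_hash(b"i", str(value).encode())
    # Simplified float encoding; a real implementation normalizes the float.
    return tagged_hash(b"f", repr(value).encode())


# Python's JSON parser keeps the distinction between 1 and 1.0.
as_int = json.loads("1")
as_float = json.loads("1.0")

print(type(as_int).__name__, type(as_float).__name__)  # int float
print(hash_number(as_int) == hash_number(as_float))    # False
```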
The Go implementation introduced a CommonJSON object using the Go marshalling function to address this issue:
json.Marshal(o)
I would like to suggest a different solution which is language independent by following the JSON Schema proposal in:
http://json-schema.org/draft-04/json-schema-core.html#rfc.section.5.5:
It is acknowledged by this specification that some programming languages, and their associated parsers, use different internal representations for floating point numbers and integers, while others do not.
As a consequence, for interoperability reasons, JSON values used in the context of JSON Schema, whether that JSON be a JSON Schema or an instance, SHOULD ensure that mathematical integers be represented as integers as defined by this specification.
In my opinion this can be simply achieved by adding a case differentiation:
case Type.Float:
{
    if ((float)val % 1.0 == 0.0)
    {
        HashInt((int)val);
    }
    else
    {
        HashFloat((float)val);
    }
    break;
}
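For comparison, the same case distinction can be sketched in Python. This is only an illustration of the proposal under stated assumptions: `hash_int`, `hash_float`, and `hash_number` are hypothetical helpers with simplified encodings, not the functions of any existing ObjectHash implementation.

```python
import hashlib


def hash_int(value: int) -> str:
    """Hypothetical integer hash: tag 'i' plus the decimal representation."""
    return hashlib.sha256(b"i" + str(value).encode()).hexdigest()


def hash_float(value: float) -> str:
    """Hypothetical float hash: tag 'f' plus a simplified float encoding."""
    return hashlib.sha256(b"f" + repr(value).encode()).hexdigest()


def hash_number(value) -> str:
    """Hash integer-valued floats as integers, all other floats as floats."""
    if isinstance(value, float):
        # is_integer() is False for inf and nan, so those stay on the float path.
        if value.is_integer():
            return hash_int(int(value))
        return hash_float(value)
    return hash_int(value)


print(hash_number(1.0) == hash_number(1))  # True
print(hash_number(1.5) == hash_number(1))  # False
```

With this rule, 1000.0 and 1000 produce the same hash, which is the behavior the proposal asks for.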
It is open to discussion whether it is useful to exclude zero from that case distinction by adding:
(float)val % 1.0 == 0.0 && (float)val != 0.0
In my opinion it would be really great for the ObjectHash project to have a common understanding of this issue and for all implementations to follow the recommendation.
@weigandf That's already addressed in the README
Regarding your actual proposal, there are 3 major issues:
- it's backwards incompatible, i.e. the hash of some objects will change, and it requires changing existing implementations;
- it's not implementable in constant time, even when the schema and layout of the object are known;
- x mod 1 == 0 is a poor test of integerness, as it is subject to floating-point rounding effects that may be platform dependent, i.e. some platforms round down subnormal numbers to 0, or may use a different bitwidth for their float type (f32 vs. f64, ...).
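The bit-width point can be made concrete with a small Python sketch. It assumes a hypothetical helper `to_f32` that rounds a Python float (an f64) to the nearest f32 value via the standard `struct` module; the sample value is chosen only for illustration.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (f64) to the nearest f32 value and widen it back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


x = 1.00000001

# As an f64 the value keeps its fractional part, so the test says "not an integer".
print((x % 1.0) == 0.0)          # False

# As an f32 the fractional part is rounded away, so the same test says "integer".
print((to_f32(x) % 1.0) == 0.0)  # True
```

The same input would therefore be hashed as a float on one platform and as an integer on another.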
Hello @KellerFuchs, first thanks for your answer.
Please consider my comments:
- Compatibility is a big issue and that is why I wrote this issue. Unfortunately if you have a look at the other implementations of ObjectHash (like Java, Go, Python, ...) you see that there is no consistent implementation for integer-valued floats. (see the Python example from the issue description)
- I agree with you that testing for an integer with (float)val % 1.0 == 0.0 is a poor test, even if it was meant as (float)val % 1.0 < ɛ. The problem of defining ɛ still depends on language-specific implementations and on the float type (as you said). So I agree with you that this is not a good solution either, because it is too difficult.
- I agree with the README and your comment that it would be better to introduce a function to generate a common JSON before hashing it (as done in the Go reference implementation). It would just be great to have a clear definition of this function (there is currently none, or I am not able to find it). Using the Go json.Marshal(o) looks like a black box to me, which is quite hard to implement in other languages.
Currently I use this function:
case Type.Integer:
{
    if (Settings.COMMON_JSONIFY)
    {
        HashFloat((float)value);
    }
    else
    {
        HashInt((int)value);
    }
    break;
}
I am not sure whether this is enough to fully cover the json.Marshal(o) function of Go for the use case of integer-valued floats.
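One way to pin down such a function is to treat every JSON number as a float before hashing. This mirrors how Go's encoding/json decodes numbers into float64 when unmarshalling into an untyped interface value. The Python sketch below assumes that reading of "common JSON"; `common_jsonify` is a hypothetical name, and the sketch covers only the number handling, not every detail of a marshal/unmarshal round trip.

```python
import json


def common_jsonify(value):
    """Recursively convert every JSON number to a float; leave other types alone."""
    # bool is a subclass of int in Python, so it must be checked before numbers.
    if value is None or isinstance(value, (bool, str)):
        return value
    if isinstance(value, (int, float)):
        # Note: float() raises OverflowError for integers beyond the f64 range.
        return float(value)
    if isinstance(value, list):
        return [common_jsonify(item) for item in value]
    if isinstance(value, dict):
        return {key: common_jsonify(item) for key, item in value.items()}
    raise TypeError(f"not a JSON value: {type(value).__name__}")


doc1 = ["foo", {"bar": ["baz", None, 1, 1.5, 0.0001, 1000, 2, -23.1234, 2]}]
doc2 = ["foo", {"bar": ["baz", None, 1.0, 1.5, 0.0001, 1000.0, 2.0, -23.1234, 2.0]}]

# Before normalization the serialized forms differ ("1" vs "1.0").
print(json.dumps(doc1) == json.dumps(doc2))                                  # False

# After normalization both documents serialize identically.
print(json.dumps(common_jsonify(doc1)) == json.dumps(common_jsonify(doc2)))  # True
```

Under this definition, hashing the normalized value gives the same result for both example documents, at the cost of losing the integer type entirely.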