java-client

Weaviate int needs to be represented as java Long instead of Double

Open • samos123 opened this issue 3 years ago • 4 comments

In Weaviate, the int data type is represented as int64; however, the Java client converts it to a java.lang.Double, an 8-byte floating-point type whose 53-bit significand cannot represent every int64 value exactly.

I suspect that if we stored a value of 9223372036854775807 or -9223372036854775807 in a Weaviate property of data type int, the Java client would not be able to return the correct value.
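The suspicion can be checked without Weaviate at all. A double holds integers exactly only up to 2^53 in magnitude, so any int64 value beyond that may be rounded when stored in a Double. The minimal sketch below (the class and method names are illustrative, not part of the client) tests whether a long survives conversion to double. It compares exact decimal values with BigDecimal rather than casting back to long, because casting a double of 2^63 or more to long saturates to Long.MAX_VALUE and would hide the error.

```java
import java.math.BigDecimal;

public class DoublePrecisionLimit {
    // A double has a 53-bit significand: every long with |value| <= 2^53
    // converts exactly, while larger values may be rounded.
    static final long EXACT_LIMIT = 1L << 53; // 9007199254740992

    // True if converting the long to double loses no information.
    static boolean exactAsDouble(long value) {
        return new BigDecimal((double) value).compareTo(BigDecimal.valueOf(value)) == 0;
    }

    public static void main(String[] args) {
        System.out.println(exactAsDouble(EXACT_LIMIT));           // true
        System.out.println(exactAsDouble(EXACT_LIMIT + 1));       // false
        System.out.println(exactAsDouble(9223372036854775807L));  // false
        System.out.println(exactAsDouble(-9223372036854775807L)); // false
    }
}
```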

samos123, Nov 21 '22 20:11

It is indeed an issue: an int stored as 9223372036854775807 is currently returned as the double 9.223372036854776E18, which loses precision, because written out in full the number becomes 9223372036854776000 instead of 9223372036854775807.
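To see exactly what a Double-only unmarshaller ends up holding, the sketch below parses the JSON literal with Double.parseDouble (assumed here to mimic what the client does; the class name is illustrative). The 9223372036854776000 figure is the shortest round-tripping decimal rendering of that double; the value it actually stores is 2^63 = 9223372036854775808, one more than the stored Long.MAX_VALUE.

```java
import java.math.BigDecimal;

public class LongMaxAsDouble {
    // Parse a JSON number literal the way a Double-only unmarshaller would.
    static double parseAsDouble(String jsonNumber) {
        return Double.parseDouble(jsonNumber);
    }

    public static void main(String[] args) {
        String stored = "9223372036854775807"; // Long.MAX_VALUE as it appears in JSON
        double d = parseAsDouble(stored);
        System.out.println(d);                 // 9.223372036854776E18

        // Exact value held by the double: 2^63
        BigDecimal exact = new BigDecimal(d);
        System.out.println(exact);             // 9223372036854775808

        // Difference from the value that was stored
        System.out.println(exact.subtract(new BigDecimal(stored))); // 1
    }
}
```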

I created an example demonstrating the issue here: https://github.com/samos123/weaviate-java-example/commit/e8506e04b804f6811906badffddc13007a748fc7

It also seems that the Article wordCount is not stored correctly when using the Java client with a long value:

```
curl http://localhost:8080/v1/objects | jq ".objects[0]"
{
  "class": "Article",
  "creationTimeUnix": 1669064501700,
  "id": "43e43a84-1370-4b21-8796-545c7d222382",
  "lastUpdateTimeUnix": 1669064501700,
  "properties": {
    "content": "Sam",
    "title": "Sam",
    "wordCount": 9223372036854776000
  },
  "vectorWeights": null
}
```

samos123, Nov 21 '22 20:11

I'm starting to wonder whether the Weaviate docs are wrong about it using int64:

```
curl -X POST -H 'Content-Type: application/json' -d '{
      "class": "Article",
      "properties": {
          "title": "Large int64",
          "wordCount": 92233720368547758079
      }
  }' http://localhost:8080/v1/objects
{"error":[{"message":"invalid object: invalid integer property 'wordCount' on class 'Article': the JSON number '92233720368547758079' could not be converted to an int"}]}
```
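For reference, the literal in this request has 20 digits (it is Long.MAX_VALUE with an extra 9 appended), so it lies outside the int64 range of -9223372036854775808 to 9223372036854775807 and any int64 parser would reject it. The same holds for the rounded 9223372036854776000 that Weaviate returned for wordCount earlier. A quick range check in Java, with an illustrative helper name, relies on Long.parseLong throwing NumberFormatException for out-of-range input:

```java
public class Int64Range {
    // Long.parseLong accepts exactly the int64 range and throws otherwise.
    static boolean fitsInInt64(String literal) {
        try {
            Long.parseLong(literal);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(fitsInInt64("9223372036854775807"));  // true  (Long.MAX_VALUE)
        System.out.println(fitsInInt64("92233720368547758079")); // false (value from the request above)
        System.out.println(fitsInInt64("9223372036854776000"));  // false (rounded wordCount returned earlier)
    }
}
```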

samos123, Nov 21 '22 22:11

Does the Weaviate int data type actually support 64-bit numbers? Support for int64 feels flaky, so maybe the Java client should only support int32.

Relevant issue: https://github.com/semi-technologies/weaviate/issues/1563

The docs do mention this: "(*) Although Weaviate supports int64, GraphQL currently only supports int32, and does not support int64. This means that currently integer data fields in Weaviate with integer values larger than int32, will not be returned using GraphQL queries. We are working on solving this issue. As current workaround is to use a string instead."

However, I'm not using GraphQL to store the object; I'm using the REST API.
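The GraphQL specification defines its built-in Int scalar as a signed 32-bit integer, which is why values outside the int32 range are affected on the GraphQL path. The sketch below shows that range check and the string workaround the docs suggest; all names are hypothetical helpers for illustration, not Weaviate or client APIs.

```java
public class GraphQLIntRange {
    // GraphQL's built-in Int scalar is a signed 32-bit integer.
    static boolean fitsGraphQLInt(long value) {
        return value >= Integer.MIN_VALUE && value <= Integer.MAX_VALUE;
    }

    // String workaround: a decimal string round-trips any long exactly.
    static String toStringProperty(long value) {
        return Long.toString(value);
    }

    static long fromStringProperty(String value) {
        return Long.parseLong(value);
    }

    public static void main(String[] args) {
        System.out.println(fitsGraphQLInt(2_147_483_647L)); // true
        System.out.println(fitsGraphQLInt(2_147_483_648L)); // false
        long big = Long.MAX_VALUE;
        System.out.println(fromStringProperty(toStringProperty(big)) == big); // true
    }
}
```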

samos123, Nov 22 '22 03:11

Thanks @samos123 for reporting the issue. You are right: Double is not suitable for holding int64 numbers like 9223372036854775807 (it is rounded to 9223372036854776000), and Long would be a better fit in that case. The unmarshaller used in the client is not aware of the property type, so Double is used for every number-like value as the most versatile option. This works well for smaller numbers, and therefore for the majority of use cases, but fails for big ones like the one in your example. We need to figure out how this can be fixed.
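Two possible directions can be sketched with plain Java, starting from the raw JSON number literal. Everything here is hypothetical and does not reflect the client's actual code: option 1 consults the property's Weaviate data type from the schema, and option 2 needs no schema but keeps any integral literal that fits in int64 as a Long. If the client's unmarshaller is Gson-based (an assumption, not stated in this thread), Gson 2.8.9 and later offer ToNumberPolicy.LONG_OR_DOUBLE via GsonBuilder.setObjectToNumberStrategy, which behaves much like option 2.

```java
import java.util.Map;

public class NumberUnmarshalling {
    // Type-unaware conversion, as described above: every JSON number becomes a Double.
    static Object asDouble(String literal) {
        return Double.valueOf(literal);
    }

    // Option 1 (hypothetical): use the property's Weaviate data type from the schema.
    static Object bySchemaType(String literal, String dataType) {
        if ("int".equals(dataType)) {
            return Long.valueOf(literal); // NumberFormatException if outside int64
        }
        return Double.valueOf(literal);
    }

    // Option 2 (hypothetical): no schema; integral literals that fit in int64 become Long.
    static Object longOrDouble(String literal) {
        boolean integral = literal.indexOf('.') < 0
                && literal.indexOf('e') < 0
                && literal.indexOf('E') < 0;
        if (integral) {
            try {
                return Long.valueOf(literal);
            } catch (NumberFormatException e) {
                // Outside the int64 range: fall back to Double below.
            }
        }
        return Double.valueOf(literal);
    }

    public static void main(String[] args) {
        // Hypothetical schema: wordCount is int, score is number.
        Map<String, String> schema = Map.of("wordCount", "int", "score", "number");
        String wordCount = "9223372036854775807";
        System.out.println(asDouble(wordCount));                              // 9.223372036854776E18
        System.out.println(bySchemaType(wordCount, schema.get("wordCount"))); // 9223372036854775807
        System.out.println(longOrDouble(wordCount));                          // 9223372036854775807
        System.out.println(longOrDouble("3").getClass().getSimpleName());     // Long
    }
}
```

The trade-off: option 2 also returns a Long for an integral value stored in a number property (such as 3), so callers cannot rely on one Java type per property, while option 1 keeps types consistent but requires the client to know the schema when unmarshalling.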

As for the relevant issue: Weaviate itself is also affected by the rounding problem, which should be fixed independently: https://github.com/semi-technologies/weaviate/issues/1563#issuecomment-1326851613

aliszka, Nov 24 '22 22:11