datasketches-cpp

Theta sketch output doesn't match between Java and CPP

[Open] nmahadevuni opened this issue 6 months ago • 5 comments

I ran a simple Theta sketch computation in both Java and C++, as shown below.

Java:

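import org.apache.datasketches.theta.CompactSketch;
import org.apache.datasketches.theta.Union;

Union union = Union.builder().buildUnion();
union.update(1);
union.update(2);
CompactSketch compactSketch = union.getResult();
// Serialize the result already obtained instead of calling getResult() again
byte[] bytes = compactSketch.toByteArray();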

Java output:

02 03 03 00 00 1a cc 93 02 00 00 00 00 00 80 3f
15 f9 7d cb bd 86 a1 05 c3 97 fc 12 81 70 9d 1e

C++:

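#include <theta_sketch.hpp>

auto update_sketch = datasketches::update_theta_sketch::builder().build();
update_sketch.update(1);
update_sketch.update(2);
// compact() yields an ordered compact sketch; serialize() returns its bytes
auto bytes = update_sketch.compact().serialize();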
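To print the serialized bytes in the same hex layout used in the outputs here, a small formatter is enough. This is a minimal sketch: `to_hex` is a hypothetical helper, not part of datasketches-cpp, and it assumes `serialize()` returns a `std::vector<uint8_t>` (the default-allocator case).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper, not part of datasketches-cpp: renders bytes as
// lowercase two-digit hex, space-separated, `per_row` bytes per line.
std::string to_hex(const std::vector<std::uint8_t>& bytes, std::size_t per_row = 16) {
  assert(per_row > 0);  // per_row is used as a modulus below
  std::string out;
  char buf[3];  // two hex digits plus the terminating NUL
  for (std::size_t i = 0; i < bytes.size(); ++i) {
    std::snprintf(buf, sizeof(buf), "%02x", static_cast<unsigned>(bytes[i]));
    out += buf;
    if (i + 1 == bytes.size()) break;  // no separator after the last byte
    out += ((i + 1) % per_row == 0) ? '\n' : ' ';
  }
  return out;
}
```

For example, `std::printf("%s\n", to_hex(bytes).c_str());` prints 16 bytes per line, matching the layout of the C++ output.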

C++ output:

02 03 03 00 00 1a cc 93 02 00 00 00 00 00 00 00
15 f9 7d cb bd 86 a1 05 c3 97 fc 12 81 70 9d 1e
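One way to pin down where the two dumps diverge is to compare them byte by byte. A minimal C++ sketch, using a hypothetical helper `differing_offsets` (not part of datasketches-cpp) and the two dumps from this issue copied into vectors:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: returns the zero-based offsets where two serialized
// sketches differ. Offsets past the end of the shorter buffer also count.
std::vector<std::size_t> differing_offsets(const std::vector<std::uint8_t>& a,
                                           const std::vector<std::uint8_t>& b) {
  std::vector<std::size_t> diffs;
  const std::size_t n = a.size() > b.size() ? a.size() : b.size();
  for (std::size_t i = 0; i < n; ++i) {
    if (i >= a.size() || i >= b.size() || a[i] != b[i]) diffs.push_back(i);
  }
  return diffs;
}

// The two 32-byte dumps from this issue, copied verbatim.
const std::vector<std::uint8_t> kJavaBytes = {
    0x02, 0x03, 0x03, 0x00, 0x00, 0x1a, 0xcc, 0x93,
    0x02, 0x00, 0x00, 0x00, 0x00, 0x00, 0x80, 0x3f,
    0x15, 0xf9, 0x7d, 0xcb, 0xbd, 0x86, 0xa1, 0x05,
    0xc3, 0x97, 0xfc, 0x12, 0x81, 0x70, 0x9d, 0x1e};
const std::vector<std::uint8_t> kCppBytes = {
    0x02, 0x03, 0x03, 0x00, 0x00, 0x1a, 0xcc, 0x93,
    0x02, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x15, 0xf9, 0x7d, 0xcb, 0xbd, 0x86, 0xa1, 0x05,
    0xc3, 0x97, 0xfc, 0x12, 0x81, 0x70, 0x9d, 0x1e};
```

Applied to the two dumps, `differing_offsets(kJavaBytes, kCppBytes)` returns offsets 14 and 15 only, and both vectors hold 32 bytes.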

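The outputs do not match: the bytes at offsets 14 and 15 (counting from 0) are "80 3f" in the Java output but "00 00" in the C++ output. Both outputs are 32 bytes long, so no bytes are missing; only those two differ.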
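The two differing bytes fall inside the 16-byte preamble. The sketch below decodes that preamble, assuming the serial-version-3 compact layout described in the `PreambleUtil` javadoc of datasketches-java, where bytes 12 to 15 of a two-long preamble hold the sampling probability p as a little-endian float. `ThetaPreamble`, `read_le`, and `decode_preamble` are hypothetical names, and the code assumes a little-endian host.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Fields of a two-long (16-byte) compact theta preamble. Offsets are an
// assumption based on datasketches-java's PreambleUtil, not a spec.
struct ThetaPreamble {
  std::uint8_t preamble_longs;  // byte 0
  std::uint8_t serial_version;  // byte 1
  std::uint8_t family_id;       // byte 2 (3 = compact)
  std::uint8_t flags;           // byte 5
  std::uint16_t seed_hash;      // bytes 6-7
  std::uint32_t num_entries;    // bytes 8-11, retained entries
  float p;                      // bytes 12-15, sampling probability (assumed)
};

// Copies sizeof(T) bytes starting at `offset` into a T. Assumes a
// little-endian host, matching the little-endian serialized format.
template <typename T>
T read_le(const std::uint8_t* data, std::size_t offset) {
  T value;
  std::memcpy(&value, data + offset, sizeof(T));
  return value;
}

ThetaPreamble decode_preamble(const std::array<std::uint8_t, 16>& b) {
  assert(b[0] == 2);  // the layout above assumes a two-long preamble
  ThetaPreamble pre{};
  pre.preamble_longs = b[0];
  pre.serial_version = b[1];
  pre.family_id = b[2];
  pre.flags = b[5];
  pre.seed_hash = read_le<std::uint16_t>(b.data(), 6);
  pre.num_entries = read_le<std::uint32_t>(b.data(), 8);
  pre.p = read_le<float>(b.data(), 12);
  return pre;
}

// First 16 bytes of each output in this issue.
const std::array<std::uint8_t, 16> kJavaPreamble = {
    0x02, 0x03, 0x03, 0x00, 0x00, 0x1a, 0xcc, 0x93,
    0x02, 0x00, 0x00, 0x00, 0x00, 0x00, 0x80, 0x3f};
const std::array<std::uint8_t, 16> kCppPreamble = {
    0x02, 0x03, 0x03, 0x00, 0x00, 0x1a, 0xcc, 0x93,
    0x02, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
```

Under that assumption, the Java preamble decodes to p = 1.0 (the float 1.0 is 0x3F800000, stored little-endian as "00 00 80 3f") while the C++ preamble decodes to p = 0.0. The entry count (2), flags, and seed hash agree, as do the two 8-byte hashes that follow the preamble.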

Can anyone explain why the outputs differ?

nmahadevuni, Jul 01 '25 17:07