keyvi icon indicating copy to clipboard operation
keyvi copied to clipboard

IntDictionaryCompiler should throw error for negative values

Open ylogx opened this issue 4 years ago • 4 comments

The interface of the IntDictionaryCompiler.Add allows all long values:

Add(self, libcpp_utf8_string, long_int)

But doesn't work with negative integers:

In [5]: compiler = IntDictionaryCompiler()
In [5]: compiler.Add('a', 1)
In [6]: compiler.Add('b', -11)
In [7]: compiler.Compile(); compiler.WriteToFile(path)

In [8]: dictionary = Dictionary(path)

In [9]: dictionary.get('a').GetValue()
Out[9]: 1

In [10]: dictionary.get('b').GetValue()
Out[10]: 18446744073709551605

Instead the compiler should've thrown an error on seeing negative values.

ylogx avatar Jun 26 '20 03:06 ylogx

@ylogx thanks for reporting the issue, I've labeled this as a bug and hopefully we can get this handled soon.

narekgharibyan avatar Jun 28 '20 09:06 narekgharibyan

I had a look at this: I think the problem starts with the name, its a valid assumption that "int" is signed. This is not the case, it has always been uint64 and for backwards compatibility reasons I would not change the implementation to support signed int. If we want signed int, we should create a new value store implementation.

I suggest:

  • rename IntValuestore to UInt64ValueStore, provide deprecated aliases for full backwards compatibility to the old IntValueStore, IntDictionaryCompiler, etc.
  • (optional) create a Int64ValueStore for signed ints (<- @ylogx do you have a use case for this?)

@narekgharibyan WDYT?

hendrikmuhs avatar Nov 07 '20 13:11 hendrikmuhs

@hendrikmuhs I really liked the idea of renaming IntValuestore -> UInt64ValueStore, same goes for IntDictionaryCompiler -> UInt64DictionaryCompiler.

Regarding backward compatibility and aliases: I think we can be fine by not providing them and just changing next release version to 0.5.0. Also we can drop support for python 3.5 (it reached end of life in September) and add 3.9 instead.

narekgharibyan avatar Nov 07 '20 14:11 narekgharibyan

I did have a use case for storing ids that could be negative for one facet and positive for another facet of the dataset. After understanding the implementation, I was forced to use the String Dictionary/Store, it'll be good to have a Int64Valuestore. Huge +1 for renaming the ValueStore and Dictionary to UInt64.*.

ylogx avatar Feb 09 '21 22:02 ylogx