keyvi
keyvi copied to clipboard
IntDictionaryCompiler should throw error for negative values
The interface of the IntDictionaryCompiler.Add
allows all long
values:
Add(self, libcpp_utf8_string, long_int)
But doesn't work with negative integers:
In [5]: compiler = IntDictionaryCompiler()
In [5]: compiler.Add('a', 1)
In [6]: compiler.Add('b', -11)
In [7]: compiler.Compile(); compiler.WriteToFile(path)
In [8]: dictionary = Dictionary(path)
In [9]: dictionary.get('a').GetValue()
Out[9]: 1
In [10]: dictionary.get('b').GetValue()
Out[10]: 18446744073709551605
Instead the compiler should've thrown an error on seeing negative values.
@ylogx thanks for reporting the issue, I've labeled this as a bug
and hopefully we can get this handled soon.
I had a look at this: I think the problem starts with the name, its a valid assumption that "int" is signed. This is not the case, it has always been uint64
and for backwards compatibility reasons I would not change the implementation to support signed int. If we want signed int, we should create a new value store implementation.
I suggest:
- rename
IntValuestore
toUInt64ValueStore
, provide deprecated aliases for full backwards compatibility to the old IntValueStore, IntDictionaryCompiler, etc. - (optional) create a Int64ValueStore for signed ints (<- @ylogx do you have a use case for this?)
@narekgharibyan WDYT?
@hendrikmuhs I really liked the idea of renaming IntValuestore
-> UInt64ValueStore
, same goes for IntDictionaryCompiler
-> UInt64DictionaryCompiler
.
Regarding backward compatibility and aliases: I think we can be fine by not providing them and just changing next release version to 0.5.0
. Also we can drop support for python 3.5
(it reached end of life in September) and add 3.9
instead.
I did have a use case for storing ids that could be negative for one facet and positive for another facet of the dataset. After understanding the implementation, I was forced to use the String Dictionary/Store, it'll be good to have a Int64Valuestore
. Huge +1 for renaming the ValueStore and Dictionary to UInt64.*
.