sqlitedict
Added support for (and tests of) integer, float, tuple, and frozenset keys
closes #73
Thanks for the PR. I have 2 main questions:
- How does this change affect performance (memory usage, speed)?
- What about other types (ones not mentioned here)?
Performance should be only trivially affected. For string/bytes keys (the only types that currently work), the only overhead is a few conditionals. For the new key types there is the slight overhead of JSON encoding, but that is incurred only for those types and is likely too small to measure compared to other parts of the code.
As for other types, I do not have a good answer. With a few exceptions, any hashable Python object can be a key, but each type would need to be handled manually. One option would be to pickle the object instead and compare the keys that way, but I think this introduces even more overhead. Also, should the user wish to inspect the SQLite file elsewhere, the keys would be unreadable (in my own use of sqlitedict, I use a modified JSON encoder with a pickle fallback so the dict remains human-readable). ~~Still, this may be a valid approach.~~ Update: a quick test shows this pickle-based approach may falter. While the resulting objects for, say, (u'a',u'b') on Python 2 and 3 pass equality checks, their pickled forms, even after base64 encoding, are not equal. None of this changes whether this approach is good for the types mentioned above (I still think it is), but it rules out reliably using pickles as keys. /update
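The Python 2 vs 3 comparison can't be reproduced inside one interpreter, but the underlying issue (equal objects need not pickle to equal bytes) shows up within Python 3 alone. The sketch below is an illustration, not the test mentioned in this thread: it pickles the same tuple with two protocols, and also pickles three keys that a dict treats as one.

```python
import pickle
from base64 import b64encode

key = (u"a", u"b")

# Same object, two pickle protocols (both readable by Python 3).
p2 = pickle.dumps(key, protocol=2)
p4 = pickle.dumps(key, protocol=4)

# Equal after a round trip ...
same_object = pickle.loads(p2) == pickle.loads(p4)
# ... but the bytes, and their base64 text, differ.
same_bytes = p2 == p4
same_b64 = b64encode(p2) == b64encode(p4)

# A dict treats 1, 1.0 and True as one key; their pickles differ.
d = {1: "x"}
d[1.0] = "y"
d[True] = "z"
pickled_forms = {pickle.dumps(k) for k in (1, 1.0, True)}

print(same_object, same_bytes, same_b64)  # True False False
print(len(d), len(pickled_forms))         # 1 3
```

Any scheme that compares pickled bytes as keys inherits both problems: it depends on which protocol (and interpreter) wrote the key, and it does not follow dict equality.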
I will address a few more issues in the discussion of #73
Ping @Jwink3101 are you able to finish this PR?
This might be part of a more significant issue: a user treating a SqliteDict as a normal dictionary will expect it to behave like a normal Python dictionary, meaning any hashable object can be used as a key and is not mutated when it is added to the dict.
Currently, keys are mutated:
```
In [1]: from sqlitedict import SqliteDict
In [2]: db = SqliteDict("/nobackup/tmp/db1.sqlite")
In [3]: key, value = 1, 1
In [4]: db[key] = value
In [5]: list(db.keys())
Out[5]: ['1']
In [6]: list(db.values())
Out[6]: [1]
```
This seems to occur because the default encode_key is the identity, and SQLite turns the key into a string. Why isn't the pickle/base64-based encode_key/decode_key pair the default?
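The key mutation can be reproduced with the standard-library `sqlite3` module alone. The table layout below (a `TEXT` key column plus a `BLOB` value) is an assumption for illustration, not copied from sqlitedict; with it, SQLite's type affinity converts the integer `1` to the text `'1'` on insert. The `pickle_encode_key`/`pickle_decode_key` helpers are hypothetical stand-ins for a pickle+base64 scheme, and with them the integer survives the round trip.

```python
import pickle
import sqlite3
from base64 import b64decode, b64encode


def pickle_encode_key(key):
    # Hypothetical helper: pickle the key, then base64 it into ASCII text.
    return b64encode(pickle.dumps(key)).decode("ascii")


def pickle_decode_key(text):
    # Hypothetical inverse of pickle_encode_key.
    return pickle.loads(b64decode(text))


conn = sqlite3.connect(":memory:")
# Assumed layout: a TEXT key column gives stored keys text affinity.
conn.execute("CREATE TABLE plain (key TEXT PRIMARY KEY, value BLOB)")
conn.execute("CREATE TABLE encoded (key TEXT PRIMARY KEY, value BLOB)")

# Identity encoding: the int 1 is stored (and read back) as the string '1'.
conn.execute("INSERT INTO plain VALUES (?, ?)", (1, b"v"))
(plain_key,) = conn.execute("SELECT key FROM plain").fetchone()

# Pickle+base64 encoding: the original int comes back after decoding.
conn.execute("INSERT INTO encoded VALUES (?, ?)", (pickle_encode_key(1), b"v"))
(stored,) = conn.execute("SELECT key FROM encoded").fetchone()
encoded_key = pickle_decode_key(stored)

print(repr(plain_key), repr(encoded_key))  # '1' 1
conn.close()
```

The cost of the second approach is that the stored key is opaque base64 text, and, per the update earlier in this thread, pickled forms are not a reliable cross-version key.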
The problem with using Python's hash or pickle is that neither is stable across platforms or Python versions. It's a tough problem!
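One concrete reason Python's `hash` makes a poor persistent key: for `str` it is salted per process unless `PYTHONHASHSEED` is fixed. The sketch below runs fresh interpreters with different seeds; the helper name `hash_in_fresh_process` and the string `'sqlitedict'` are arbitrary choices for illustration.

```python
import os
import subprocess
import sys


def hash_in_fresh_process(text, seed):
    """Return hash(text) as computed by a new interpreter with PYTHONHASHSEED=seed."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({text!r}))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


h1 = hash_in_fresh_process("sqlitedict", 1)
h2 = hash_in_fresh_process("sqlitedict", 2)
h1_again = hash_in_fresh_process("sqlitedict", 1)

print(h1 == h2)        # almost certainly False: different salt
print(h1 == h1_again)  # True: same seed, same interpreter
```

So a key derived from `hash` could differ between two runs of the same program, before platform or version differences even come into play.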
My code is one option, but another is simply educating users.