chest icon indicating copy to clipboard operation
chest copied to clipboard

Using hash value as a key will create a temporary key that does not keep it's value

Open grimnight opened this issue 11 years ago • 7 comments

Using original keys hash value as a key will create a new key, but only until flush(). After flush() the key will still exist in the .keys file, but not in the filesystem and it will return the value of the original key.

❯❯❯ python
Python 2.7.8 (default, Aug 24 2014, 21:26:19)
[GCC 4.2.1 Compatible Apple LLVM 5.1 (clang-503.0.40)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from chest.core import Chest
>>> import json
>>> c = Chest(path='path', dump=json.dump, load=json.load)
>>> c['1/2'] = '1/2'
>>> c['1/2']
'1/2'
>>> str(abs(hash('1/2')))
'163512108431620421'
>>> c[163512108431620421] = 163512108431620421
>>> c[163512108431620421]
163512108431620421
>>> c['1/2']
'1/2'
>>> c.keys()
[163512108431620421, '1/2']
>>> c.flush()
>>> c = Chest(path='path', dump=json.dump, load=json.load)
>>> c.keys()
[u'1/2', 163512108431620421]
>>> c['1/2']
u'1/2'
>>> c[163512108431620421]
u'1/2'
>>> exit()
❯❯❯ ls -la path
total 16
drwxr-xr-x  6 grimnight staff  204 Jou  1 09:46 .
drwxr-xr-x 67 grimnight staff 2278 Jou  1 09:43 ..
-rw-r--r--  1 grimnight staff   37 Jou  1 09:46 .keys
-rw-r--r--  1 grimnight staff    5 Jou  1 09:46 163512108431620421
❯❯❯ cat path/.keys
[163512108431620421, "1/2"]

As an alternative to str(abs(hash(''))) consider quote_plus or something similar. After seeing 19 and multi_key_dict I thought of c[('some', 1.0, 'key')] or c['some', 1.0, 'key'] becoming path/some/1.0/key, assuming that is possible or even a good idea.

grimnight avatar Dec 03 '14 14:12 grimnight

Using str and some quoting mechanism sound like a decent alternative to using the hash. Ideally one would be able to prove that the function was one-to-one. If you're interested in doing this work one only needs to change the key_to_filename function.

Also, thanks for pointing me to shove, I wasn't aware of it before. Perhaps chest is unnecessary.

mrocklin avatar Dec 03 '14 15:12 mrocklin

Would hashlib.sha256('key').hexdigest() be a good alternative? The reason I liked chest was it's simplicity compared to shove. Chest has a simple structure that can be used from any programming language (assuming JSON). The possibility of using it from the commandline with cli interface or using jq and echo key | sha256sum.

Would cli interface be a good idea?

What did you think of the multi key option?

grimnight avatar Dec 03 '14 16:12 grimnight

Using sha256 seems like a decent choice. If we find a collision we probably have other things to worry about :)

A command line interface sounds great.

Multi-key is an interesting idea. My gut feeling is that it's nice to have a system that doesn't do anything fancy and then build up fancy things on top of that. I wonder if there is a clean way to subclass or encapsulate Chest to easily achieve multi-key in a separate object.

mrocklin avatar Dec 03 '14 16:12 mrocklin

How about making key_to_filename pluggable? The only requirement would be that it returns a valid filename (or string). That way it would be possible to test alternative behaviors quickly. Having key_to_filename pluggable might require that .keys file becomes dictionary instead of a list.

That being said, should directories be expressly denied, or is there a way to make them work. Should multi-key be another class or could it be pluggable too (see tinydb and how it supports extensible behavior).

grimnight avatar Dec 03 '14 16:12 grimnight

Ah, pushing this problem down to making key_to_filename pluggable sounds like a good idea. Actually, this function already is pluggable

class Chest(object):
    def __init__(self, data=None, path=None, available_memory=None,
                 dump=partial(pickle.dump, protocol=pickle.HIGHEST_PROTOCOL),
                 load=pickle.load,
                 key_to_filename=key_to_filename,
                 mode='b'):

Don't see why directories should be denied. The key_to_filename function might have to be responsible for the side effect of making mid-level directories though (unless we can find some nicer version of os.mkdir.)

What isn't currently pluggable is __getitem__ which you would presumably need for multi-key like functionality.

Why would .keys need to become a dictionary? Although I suppose it does depend on the key_to_filename function. Maybe that should be stored using a specific serialization scheme (like pickle).

mrocklin avatar Dec 03 '14 16:12 mrocklin

I hadn't noticed that key_to_filename is already pluggable, my bad.

A subclass of Chest might be the way to go with __getitem__ for multi-key access if that is a possibility. If os.mkdir is used in key_to_filename to create the directories how do you remove them, should there be some kind of list of directories still in use?

My usecase for chest includes cli (or shell) use and quickly checking .keys would be helpful, but I can simply set json in my scripts and use grep or jq. A cli interface would be nice too.

In the past the closest things I have seen to what I have been looking for were tik and cob, they are far from ideal, tik required compiling a heavy dependency and it was a binary store and cob could end up with an empty json file.

grimnight avatar Dec 03 '14 17:12 grimnight

Oh, I forgot about key_to_filename myself, so no worries.

Multi-key probably also requires changes to things like __contains__ and some thought on what .keys() and .values() should return. I was thinking about how we could make all of these things pluggable for a while but then decided that I was probably just reimplementing inheritance.

Setting the .keys file to be a dictionary seems fine to me. Perhaps even preferable. Are you able/willing to make this change? It'd also be good to get someone else poking around in the code.

A CLI wrapper sounds like a nice idea. I suspect it'd have to be paired with a particular serialization scheme.

In general I threw up chest as a sort of proof of concept of a component to something larger that I might want to build in a month. I'm not likely to do significant innovation here for a couple weeks or so. I'd love to see someone else hop on though.

mrocklin avatar Dec 03 '14 17:12 mrocklin