sqlitedict icon indicating copy to clipboard operation
sqlitedict copied to clipboard

Feature Proposal: Multiple simultaneous dictionaries with one db file

Open wkatsak opened this issue 8 years ago • 7 comments

Hello,

I use sqlitedict quite extensively in my research code, and overall find it a wonderful piece of software. Recently however, I discovered an issue that I had to create an interesting workaround for, and was thinking that this might be useful to others as well, if made into an official feature.

Use case: I often use sqlitedict to store results from experiments (time series or just simple attributes). My structure might look like this:

A single dbfile with:

  • Attribute table for each experiment, with attribute keys underneath
  • Timeseries table for each experiment/timeseries attribute, with timestamp keys inside.
  • Metadata table to list experiments, and some mappings to attributes.

This works great with a small number of experiments, but when this grows to thousands, it is hard to work with the data because it is necessary to instantiate a sqlitedict for each dbfile/table combo.

Now, for my proposal:

I realized that it is possible to just switch out the .tablename attribute for an already open sqlitedict, and have it work just fine (the connection is already open, etc just need to change the table for the queries). This cut a recent processing time from ~5000s to 250s.

It would be cool to have an alternate interface that works like this:

db = sqlitedict.SqliteDictDb(dbfilename)
dictionary_foo = db.get_dict("foo")
dictionary_bar = db.get_dict("bar")

The dictionary interfaces themselves could be a lightweight wrapper that maintains a reference to the db object. These could share the same thread, connection, etc. and allow for much faster access to a variety of tables.

If this seems reasonable, I am more than happy to implement it and submit a pull request. If not, thats ok too, I'll just keep it in my personal library. I just figured I would ask in case there is interest, and if there is any preference for how it is done.

Thank you! -Bill Katsak

wkatsak avatar Jul 12 '16 13:07 wkatsak

It can be tricky due to file locking in sqlite3 https://www.sqlite.org/lockingv3.html

ownport avatar Jul 12 '16 20:07 ownport

I am imagining that all of these "sub-dicts" would serialize queries through the same parent/thread. Wouldn't that prevent any potential issue?

wkatsak avatar Jul 12 '16 22:07 wkatsak

class defaultdict(collections.defaultdict):
    'like defaultdict but default_factory receives the key'

    def __missing__(self, key):
        self[key] = value = self.default_factory(key)
        return value


dicts = defaultdict(lambda table: SqliteDict('dicts.sqlite3', table, 'c', True))
dicts['dict1']['x'] = 1
dicts['dict2']['y'] = 2

Digenis avatar Jul 13 '16 08:07 Digenis

This doesn't actually help, if I understand correctly. You are still creating a fresh SqliteDict object for each table. I want to avoid that.

Right now, I can do this: d = SqliteDict('dicts.sqlite3', 'table1', 'c', True) d['x'] = 1 d.tablename = 'table2' d['y'] = 2

Assuming of course, that table1 and table2 are already created. I'm using this hack on a read only basis to speed up getting the data back out. My proposal is to add an (optional?) view/mode that allows us to have multiple dicts share a single sqlitedict object.

wkatsak avatar Jul 13 '16 17:07 wkatsak

collections.defaultdict

Can you pythonic explain this ?

instasck avatar Nov 02 '19 14:11 instasck

Why not nest your dicts inside a dict?

Your proposal:

db = sqlitedict.SqliteDictDb(dbfilename)
dictionary_foo = db.get_dict("foo")
dictionary_bar = db.get_dict("bar")

Nested alternative:

db = sqlitedict.SqliteDictDb(dbfilename)
dictionary_foo = db['foo']
dictionary_bar = db['bar']

I apologize in advance if I am missing something, back after a long break.

mpenkov avatar Dec 29 '19 13:12 mpenkov

@mpenkov one problem I see with that approach is that you would need to pickle the entire dictionary every time you change any Key/Value inside the nested dictionary. With @wkatsak's approach, you should be able to change a key/value pair in dictionary_foo and only pickle the value you're changing rather than pickling the entirety of dictionary_foo.

jonchun avatar Mar 10 '20 16:03 jonchun