ZODB
Multiple indexed views for object collection?
Hello ZODB team
My use case is rather simple:
I have some BTree collections of objects keyed by a unique integer key (i.e. `IOBTree` collections of `Persistent` objects). My application is not really search-centric, but in the few (though frequent) cases where a lookup has to be carried out by another (non-key) field, I have to iterate over the entire container sequentially.
Is it possible to maintain a secondary index on a given BTree collection? I do not mean directly (i.e. on the same tree), but perhaps by maintaining an auxiliary value -> {set-of-keys} mapping?
I have implemented a Collection class on top of BTree, something like:
```python
from BTrees.IOBTree import IOBTree
from BTrees.OOBTree import OOBTree

# Collection and User are application classes (not shown); root = conn.root()
tree = root['users'] = IOBTree()
ix1 = root['users.by_name'] = OOBTree()
users = Collection(tree, name=ix1)  # declare primary tree and indices
# The collection should follow the underlying BTree API as much as possible
u1 = User(id=12, name='Foo')
users.insert(u1)  # also inserts into the name-based index
...
# The collection should return proxy objects on search operations
p1 = users.get(12)  # returns a proxy (to u1) that intercepts __setattr__, __delattr__
p1.name  # delegates to u1.name
p1.name = 'Baz'  # delegates to u1, but also updates the name-based index
```
Of course, my implementation has several limitations and is only a workaround for my use case. Maybe some more mature solutions already exist? Or maybe this is a reason to migrate to a relational model?
"Catalog" is the usual term in ZODB-land (because a catalog contains one or more indexes). These aren't provided by ZODB itself, but there are packages that provide them on top of ZODB. E.g. one of those is repoze.catalog (http://docs.repoze.org/catalog/overview.html).
I haven't used catalogs much personally, so I cannot give recommendations.
Good answer. Thanks Marius.
To elaborate a bit, catalogs are objects that maintain multiple indexes on a collection of objects. They support querying on one or more of those indexes. When querying multiple indexes, they rely on low-level machinery provided by BTrees for performing set operations (such as intersection and union) on the results from each index.
Jim
On Thu, Feb 11, 2016 at 8:13 AM, Marius Gedminas wrote:
— Reply to this email directly or view it on GitHub https://github.com/zopefoundation/ZODB/issues/44#issuecomment-182858077.
Jim Fulton http://jimfulton.info
@jimfulton, @mgedmin Thank you for your quick answers.
I didn't know about repoze.catalog (in fact, I thought that repoze was only about authn/authz middleware).
I played a bit with it, and it seems that `Catalog` objects can be persisted into a ZODB database. It also seems that the catalog doesn't own the indexed objects (so it cannot retrieve them by docid), so an external mapping (i.e. a `DocumentMap`) must be maintained side by side and also be persisted into the database.
Am I correct? Is the following example valid?
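```python
import transaction
from persistent import Persistent
from ZODB import DB
from ZODB.FileStorage import FileStorage
from repoze.catalog.catalog import Catalog
from repoze.catalog.indexes.field import CatalogFieldIndex
from repoze.catalog.document import DocumentMap


class User(Persistent):
    # Implements __cmp__ (omitted here) and carries a `name` attribute
    def __init__(self, name):
        self.name = name


# Populate our containers
user_catalog = Catalog()
user_catalog['name'] = CatalogFieldIndex('name')
user_map = DocumentMap()
u1 = User('Totos')
u2 = User('Foo')
user_catalog.index_doc(1, u1)
user_map.add(u1, 1)
user_catalog.index_doc(2, u2)
user_map.add(u2, 2)
# Commit to database
db = DB(FileStorage('users.zodb'))
conn = db.open()
root = conn.root()
root['user_map'] = user_map
root['user_catalog'] = user_catalog
transaction.commit()
conn.close()
```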
My main concern is this warning: http://docs.repoze.org/catalog/usage.html#restrictions. But maybe this is resolved by subclassing the objects directly from `object`?
You might want to look at souper. It's a single data structure that keeps internal indexes so you can search it: https://pypi.python.org/pypi/souper
On Fri, 12 Feb 2016 6:41 pm MichailAlexakis wrote: