Shared read-only state between objects with copy on write

Open jimfulton opened this issue 7 years ago • 5 comments

There's a lot of interest in using ZODB with asynchronous frameworks, especially for applications that block on network requests to services. From a purely programming perspective, gevent makes this quite tractable, but the cost of maintaining many open ZODB connections with their own caches is a major challenge. The cost of maintaining many open connections could be mitigated if data could be shared among their caches.

One way to do this would be to have a shared cache of read-only state objects. Consider the extremely common case of persistent objects that store their data in dictionaries (leaving aside non-persistent subobjects, for the sake of discussion). Set-state for such objects could simply use the state as the instance dictionary, without copying it. The first attribute assignment on such an object would then copy the state dict before writing to it. This would allow use of shared immutable state dicts, requiring no copying for read-only operations. Note that in this scenario, only state is shared, not persistent objects.

You could use slots, or secondary dictionaries for non-shared mutable state.
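A minimal sketch of the copy-on-write idea, assuming a plain Python class rather than the real `persistent` machinery. The class name `CowObject` and the `_shared` flag are hypothetical; the flag lives in a slot so that it stays outside the shared state dict, which is one way to read the slots suggestion above.

```python
class CowObject:
    """Hypothetical stand-in for a persistent object (not the `persistent` API)."""

    # `_shared` is per-instance bookkeeping kept in a slot, so it never
    # ends up inside the (possibly shared) instance dictionary.
    __slots__ = ("_shared", "__dict__")

    def __setstate__(self, state):
        # Adopt the shared dict as the instance dictionary without copying.
        object.__setattr__(self, "__dict__", state)
        object.__setattr__(self, "_shared", True)

    def _unshare(self):
        # Replace the shared dict with a private copy before the first write.
        if getattr(self, "_shared", False):
            object.__setattr__(self, "__dict__", dict(self.__dict__))
            object.__setattr__(self, "_shared", False)

    def __setattr__(self, name, value):
        self._unshare()
        object.__setattr__(self, name, value)

    def __delattr__(self, name):
        self._unshare()
        object.__delattr__(self, name)


# Two objects (think: the same oid loaded in two connections) share one state.
shared_state = {"title": "doc", "size": 1}
a = CowObject.__new__(CowObject)
a.__setstate__(shared_state)
b = CowObject.__new__(CowObject)
b.__setstate__(shared_state)

a.size = 2  # the first write copies; `b` and `shared_state` are unaffected
```

Reads go through the ordinary attribute lookup, so read-only use costs nothing extra. Nothing here stops code from mutating `a.__dict__` directly or mutating a non-persistent mutable subobject, which is why the scheme depends on the restrictions discussed in this issue.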

Similar schemes could be used for BTrees and Buckets, although we'd need to introduce new Python subobjects to represent shared state.

To make this work, we'd likely want to create persistent subobjects that disallowed storing non-persistent mutable subobjects, which would have other benefits.

jimfulton avatar Sep 26 '18 13:09 jimfulton

This is somewhat similar to RelStorage's in-memory pickle state cache, which is shared by all Connections of a Storage, but operating on the unpickled data (and then of course copying it). I like the idea!

A challenge there is making such a shared cache effective with the different MVCC states that each Connection may be seeing. RelStorage has a complicated system of "checkpoints" it uses to accomplish this that works OK for short-lived transactions and Connections that don't drift too far apart from each other in terms of their MVCC state.

jamadden avatar Sep 26 '18 13:09 jamadden

This cache would be keyed by oid + serial, so it would be orthogonal to MVCC. It would store Python objects, so there would be no additional deserialization overhead. Because the sharing would be at the object level, there would be memory savings, not just savings in loading objects.
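To illustrate why keying by oid + serial sidesteps MVCC, here is a rough sketch. `SharedStateCache` and `fake_load` are hypothetical names, not ZODB or RelStorage APIs, and the sketch ignores eviction and thread safety entirely.

```python
class SharedStateCache:
    """Hypothetical process-wide cache of read-only states, keyed by (oid, serial)."""

    def __init__(self, load):
        self._load = load      # callable that produces the state for (oid, serial)
        self._states = {}

    def get(self, oid, serial):
        # A given (oid, serial) pair names exactly one revision, so its state
        # never changes and can be handed to any connection that asks for it.
        key = (oid, serial)
        state = self._states.get(key)
        if state is None:
            state = self._states[key] = self._load(oid, serial)
        return state


# Stand-in for unpickling a record from storage; it records each call.
loads = []

def fake_load(oid, serial):
    loads.append((oid, serial))
    return {"oid": oid, "revision": serial}


cache = SharedStateCache(fake_load)
conn1_view = cache.get(b"\x01", 10)   # connection 1 sees revision 10
conn2_view = cache.get(b"\x01", 10)   # connection 2 at the same revision
older_view = cache.get(b"\x01", 9)    # a connection with an older MVCC view
```

Connections at the same revision get the very same Python object, while a connection with an older view gets a separate entry, so the cache never has to reason about which connection is asking.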

jimfulton avatar Sep 26 '18 13:09 jimfulton

If we could store non-dicts as __dict__, then we could use immutable dicts as shared state and trigger copy on failed setitem (or on noticing non-dicts), requiring no change to persistent state metadata.
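CPython currently requires `__dict__` to be a real dict, so the following sketch only emulates the idea: the state lives in a hypothetical `_state` slot, shared state is a read-only `types.MappingProxyType`, and the copy is triggered by the `TypeError` from the failed item assignment. The class name `ProxyStateObject` is an assumption made for illustration.

```python
from types import MappingProxyType


class ProxyStateObject:
    """Hypothetical object whose state may be a shared, read-only mapping."""

    __slots__ = ("_state",)

    def __setstate__(self, state):
        object.__setattr__(self, "_state", state)

    def __getattr__(self, name):
        # Called only when normal lookup fails; fall back to the state mapping.
        try:
            return object.__getattribute__(self, "_state")[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        state = self._state
        try:
            state[name] = value
        except TypeError:
            # The mapping proxy rejects item assignment: copy, then write.
            state = dict(state)
            state[name] = value
            object.__setattr__(self, "_state", state)


shared = MappingProxyType({"x": 1})
p = ProxyStateObject.__new__(ProxyStateObject)
p.__setstate__(shared)
q = ProxyStateObject.__new__(ProxyStateObject)
q.__setstate__(shared)

p.x = 5   # failed setitem on the proxy triggers the copy
p.y = 3   # later writes go straight to the private dict
```

No per-object "shared" flag is needed here, since the type of the state mapping itself says whether it is still shared. The cost is that every attribute read goes through `__getattr__`, which real `__dict__` support would avoid.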

jimfulton avatar Sep 26 '18 13:09 jimfulton

> This cache would be keyed by oid + serial, so it would be orthogonal to MVCC.

Ah, I see. It helps that the current laughably misnamed "pickle cache" knows what (oid, serial) values it's going to be requesting; the RelStorage case just has to deal with arbitrary requests over time.

jamadden avatar Sep 26 '18 13:09 jamadden

Nice idea!

davisagli avatar Sep 27 '18 02:09 davisagli