ypy-websocket
ypy-websocket copied to clipboard
Improve stores
Changes the stores to have one instance handling multiple documents instead of instantiating one store per document.
Can you explain why this is needed?
We need more control over the documents that are stored. For example, check if the document exists before trying to load it from the store, remove documents if every client leaves the room or we need to reset the content of the room, list the documents, and get their updates to create a history and revert to that point, etc.
I don't think the stores should be making decisions about what to do with the document, the room or the server should make those decisions.
I may be missing something, but I don't understand why stores shouldn't be independent. Actually, I think they should even live in a separate package. That way we could use them for other transport layers than WebSockets.
check if the document exists before trying to load it from the store
The store is created from a document, so an existing store ensures the document exists.
remove documents if every client leaves the room
We could add a method to the current YStores, to remove all updates of a document.
list the documents
That should be at the WebSocket server IMO, not the stores.
get their updates to create a history and revert to that point
I don't see why it's not currently possible.
list the documents
That should be at the WebSocket server IMO, not the stores.
That kind of feature is useful for maintenance or debug tooling. It makes sense to provide an API to introspect stored documents outside of the collaboration context.
I'm trying to compare the two approaches to get a balance of pros and cons; here is a starting point:
Manager of YStores
Current code encapsulated in a store manager.
class StoreManager(Mapping[str, YStore]):
def __init__(self, store_factory):
self._factory = store_factory
def list(self):
return self.keys()
def __get(self, path):
# Create store if it does not exist
# start get called here I guess?
return store;
def delete(self, path):
# Destroy/stop the store ?
self.__get(path).delete()
def write(self, path, data):
self.__get(path).write(data)
def read(self):
return self.__get(path).read(data)
class BaseYStore:
@abstractmethod
def write(self, data):
pass
@abstractmethod
def read(self):
pass
# Option for start and stop who should be responsible for this?
# Create storage manager
def factory(...):
return MyStore(...)
manager = StoreManager(store_factory=factory)
To avoid a risk of inconsistency the document store should not be accessed directly as otherwise the caller can temper with the lifecycle (start and stop) of each document store. And it is harder to control that lifecycle as caller can keep reference of a store they should not.
Question: Are YStore stateful? If not - what I think it should -, what is the advantage of keeping in memory a YStore per document that is a stateless actor to carry out IO operations?
YStoreManager
This proposal
class BaseStoreManager:
async def list(self):
return self.keys()
async def delete(self, path):
# Destroy/stop the store
@abstractmethod
async def write(self, path, data):
pass
@abstractmethod
async def read(self, path):
pass
# Start and stop could be handled by the
# object initiating the store manager or
# within the object.
# Create storage manager
manager = MyStoreManager(...)
One advantage I see with this proposal is a simpler API.
The advantage I see with a "Manager of YStores" is that it can manage heterogeneous YStores (for instance a mix of FileYStore
and SQLiteYStore
), while it seems that the "YStoreManager" only manages YStores of the same type. But correct me if I'm wrong.
To avoid a risk of inconsistency the document store should not be accessed directly
This constraint leads to duplicating the Store API in the StoreManager, therefore I agree that it complicates the design. Unless we need to use different stores in the same StoreManager, or we relax this constraint, the second solution looks better (the Store hierarchy class becomes an implementation detail for the end user, so let's keep it simple).
The advantage I see with a "Manager of YStores" is that it can manage heterogeneous YStores (for instance a mix of
FileYStore
andSQLiteYStore
), while it seems that the "YStoreManager" only manages YStores of the same type. But correct me if I'm wrong.
You could easily achieve it with a multiplexer manager as done for example by some people for the jupyter server content manager.
I agree that it's better to do simple things easily, and more complicated things with more effort, so let's go with the "YStoreManager" solution. Thinking more about it, this idea of a YStore managing multiple documents was there at the beginning anyway. That's why for instance a FileYStore is not useful on its own, only when used e.g. in a TempFileYStore to write files in a common directory. An SQLiteYStore also uses a common backend (a DB).
Thanks, I'll take a closer look soon.