ypy-websocket Improve stores

Changes the stores to have one instance handling multiple documents instead of instantiating one store per document.

Sep 29 '23 12:09 hbcarlos

Can you explain why this is needed?

Sep 29 '23 13:09 davidbrochart

We need more control over the documents that are stored. For example, check if the document exists before trying to load it from the store, remove documents if every client leaves the room or we need to reset the content of the room, list the documents, and get their updates to create a history and revert to that point, etc.

I don't think the stores should be making decisions about what to do with the document, the room or the server should make those decisions.

Sep 29 '23 13:09 hbcarlos

I may be missing something, but I don't understand why stores shouldn't be independent. Actually, I think they should even live in a separate package. That way we could use them for other transport layers than WebSockets.

check if the document exists before trying to load it from the store

The store is created from a document, so an existing store ensures the document exists.

remove documents if every client leaves the room

We could add a method to the current YStores, to remove all updates of a document.

list the documents

That should be at the WebSocket server IMO, not the stores.

get their updates to create a history and revert to that point

I don't see why it's not currently possible.

Sep 29 '23 13:09 davidbrochart

list the documents

That should be at the WebSocket server IMO, not the stores.

That kind of feature is useful for maintenance or debug tooling. It makes sense to provide an API to introspect stored documents outside of the collaboration context.

I'm trying to compare the two approaches to get a balance of pros and cons; here is a starting point:

Manager of YStores

Current code encapsulated in a store manager.

class StoreManager(Mapping[str, YStore]):

     def __init__(self, store_factory):
         self._factory = store_factory

     def list(self):
         return self.keys()
  
     def __get(self, path):
         # Create store if it does not exist
         # start get called here I guess?
         return store;

     def delete(self, path):
         # Destroy/stop the store ?
         self.__get(path).delete()
        
     def write(self, path, data):
         self.__get(path).write(data)

     def read(self):
         return self.__get(path).read(data)
         

class BaseYStore:

     @abstractmethod
     def write(self, data):
         pass
    
     @abstractmethod
     def read(self):
         pass

      # Option for start and stop who should be responsible for this?

# Create storage manager
def factory(...):
     return MyStore(...)

manager = StoreManager(store_factory=factory)

To avoid a risk of inconsistency the document store should not be accessed directly as otherwise the caller can temper with the lifecycle (start and stop) of each document store. And it is harder to control that lifecycle as caller can keep reference of a store they should not.

Question: Are YStore stateful? If not - what I think it should -, what is the advantage of keeping in memory a YStore per document that is a stateless actor to carry out IO operations?

YStoreManager

This proposal

class BaseStoreManager:

     async def list(self):
         return self.keys()

     async def delete(self, path):
         # Destroy/stop the store

     @abstractmethod
     async def write(self, path, data):
         pass
     
     @abstractmethod
     async def read(self, path):
         pass

     # Start and stop could be handled by the
     # object initiating the store manager or 
     # within the object.

# Create storage manager
manager = MyStoreManager(...)

One advantage I see with this proposal is a simpler API.

Oct 04 '23 13:10 fcollonval

The advantage I see with a "Manager of YStores" is that it can manage heterogeneous YStores (for instance a mix of FileYStore and SQLiteYStore), while it seems that the "YStoreManager" only manages YStores of the same type. But correct me if I'm wrong.

Oct 04 '23 14:10 davidbrochart

To avoid a risk of inconsistency the document store should not be accessed directly

This constraint leads to duplicating the Store API in the StoreManager, therefore I agree that it complicates the design. Unless we need to use different stores in the same StoreManager, or we relax this constraint, the second solution looks better (the Store hierarchy class becomes an implementation detail for the end user, so let's keep it simple).

Oct 05 '23 07:10 JohanMabille

The advantage I see with a "Manager of YStores" is that it can manage heterogeneous YStores (for instance a mix of FileYStore and SQLiteYStore), while it seems that the "YStoreManager" only manages YStores of the same type. But correct me if I'm wrong.

You could easily achieve it with a multiplexer manager as done for example by some people for the jupyter server content manager.

Oct 05 '23 07:10 fcollonval

I agree that it's better to do simple things easily, and more complicated things with more effort, so let's go with the "YStoreManager" solution. Thinking more about it, this idea of a YStore managing multiple documents was there at the beginning anyway. That's why for instance a FileYStore is not useful on its own, only when used e.g. in a TempFileYStore to write files in a common directory. An SQLiteYStore also uses a common backend (a DB).

Oct 05 '23 07:10 davidbrochart

Thanks, I'll take a closer look soon.

Oct 06 '23 08:10 davidbrochart

ypy-websocket ypy-websocket copied to clipboard

Improve stores

ypy-websocket
ypy-websocket copied to clipboard