jupyter-collaboration
Time-to-live support not working anymore?
For notebooks that produce a lot of output, the .jupyter_ystore.db file can get rather large and unfortunately overflows our users' quota easily.
In this example here
import time
for i in range(100_000_000):
    print(f"{i}, ", end="")
    time.sleep(0.05)
the size of the notebook file grows to 1.5 MB after a runtime of a few minutes. The corresponding .jupyter_ystore.db, on the other hand, grows to 580 MB.
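For anyone who wants to compare the two files on their own setup, here is a minimal sketch of how the sizes can be read. The helper names `size_mb` and `compare_sizes` are made up for illustration, and the default store path assumes the database sits in the directory the server was started from.

```python
from pathlib import Path


def size_mb(path):
    """Return the size of a file in megabytes (10**6 bytes)."""
    return Path(path).stat().st_size / 1_000_000


def compare_sizes(notebook, ystore=".jupyter_ystore.db"):
    """Return (notebook MB, store MB, store/notebook ratio) for the two files."""
    nb_mb = size_mb(notebook)
    db_mb = size_mb(ystore)
    # Avoid dividing by zero for an empty notebook file.
    ratio = db_mb / nb_mb if nb_mb else float("inf")
    return nb_mb, db_mb, ratio
```

Calling `compare_sizes("Untitled10.ipynb")` while the loop above is running should show how quickly the two files drift apart.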
I have read parts of the discussions around the database and have a rough understanding of the complications that make solving this quite challenging. For now, the time-to-live option seems like a suitable workaround to limit the growth to some extent. At the moment, however, it does not seem to work (anymore?).
When starting a new session with
jupyter lab --SQLiteYStore.document_ttl=600
I only receive the following error messages:
[I 2024-05-17 16:08:01.659 ServerApp] Creating new notebook in
[I 2024-05-17 16:08:01.722 ServerApp] Request for Y document 'Untitled10.ipynb' with room ID: 780564de-e0da-492a-9d14-af545441c896
[I 2024-05-17 16:08:01.913 YDocExtension] Creating FileLoader for: Untitled10.ipynb
[I 2024-05-17 16:08:01.914 YDocExtension] Watching file: Untitled10.ipynb
[I 2024-05-17 16:08:01.915 ServerApp] Initializing room json:notebook:780564de-e0da-492a-9d14-af545441c896
[I 2024-05-17 16:08:01.935 ServerApp] Content in room json:notebook:780564de-e0da-492a-9d14-af545441c896 loaded from file Untitled10.ipynb
[E 2024-05-17 16:08:01.937 ServerApp] Error initializing: Untitled10.ipynb
TypeError("'>' not supported between instances of 'int' and 'DeferredConfigString'")
Traceback (most recent call last):
File "C:\tools\miniconda3\envs\data\Lib\site-packages\jupyter_collaboration\handlers.py", line 233, in open
await self.room.initialize()
File "C:\tools\miniconda3\envs\data\Lib\site-packages\jupyter_collaboration\rooms.py", line 151, in initialize
await self.ystore.encode_state_as_update(self.ydoc)
File "C:\tools\miniconda3\envs\data\Lib\site-packages\pycrdt_websocket\ystore.py", line 145, in encode_state_as_update
await self.write(update)
File "C:\tools\miniconda3\envs\data\Lib\site-packages\pycrdt_websocket\ystore.py", line 473, in write
if self.document_ttl is not None and diff > self.document_ttl:
^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: '>' not supported between instances of 'int' and 'DeferredConfigString'
[I 2024-05-17 16:08:01.940 ServerApp] Deleting Y document from memory: json:notebook:780564de-e0da-492a-9d14-af545441c896
[I 2024-05-17 16:08:01.940 ServerApp] Room json:notebook:780564de-e0da-492a-9d14-af545441c896 deleted
[I 2024-05-17 16:08:01.941 ServerApp] Deleting file Untitled10.ipynb
[E 2024-05-17 16:08:01.943 ServerApp] Exception in callback functools.partial(<function WebSocketProtocol._run_callback.<locals>.<lambda> at 0x0000023308FE4A40>, <Task finished name='Task-734' coro=<YDocWebSocketHandler.on_message() done, defined at C:\tools\miniconda3\envs\data\Lib\site-packages\jupyter_collaboration\handlers.py:277> exception=AttributeError("'YDocWebSocketHandler' object has no attribute 'room'")>)
Traceback (most recent call last):
File "C:\tools\miniconda3\envs\data\Lib\site-packages\tornado\ioloop.py", line 750, in _run_callback
ret = callback()
^^^^^^^^^^
File "C:\tools\miniconda3\envs\data\Lib\site-packages\tornado\websocket.py", line 640, in <lambda>
self.stream.io_loop.add_future(result, lambda f: f.result())
^^^^^^^^^^
File "C:\tools\miniconda3\envs\data\Lib\site-packages\jupyter_collaboration\handlers.py", line 286, in on_message
changes = self.room.awareness.get_changes(message[1:])
^^^^^^^^^
AttributeError: 'YDocWebSocketHandler' object has no attribute 'room'
[E 2024-05-17 16:08:01.945 ServerApp] Uncaught exception GET /api/collaboration/room/json:notebook:780564de-e0da-492a-9d14-af545441c896?sessionId=19a409eb-52ee-46c9-9d32-d39d007e0a9a (::1)
HTTPServerRequest(protocol='http', host='localhost:8888', method='GET', uri='/api/collaboration/room/json:notebook:780564de-e0da-492a-9d14-af545441c896?sessionId=19a409eb-52ee-46c9-9d32-d39d007e0a9a', version='HTTP/1.1', remote_ip='::1')
Traceback (most recent call last):
File "C:\tools\miniconda3\envs\data\Lib\site-packages\tornado\web.py", line 1790, in _execute
result = await result
^^^^^^^^^^^^
File "C:\tools\miniconda3\envs\data\Lib\site-packages\jupyter_collaboration\handlers.py", line 209, in get
return await super().get(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\tools\miniconda3\envs\data\Lib\site-packages\tornado\websocket.py", line 273, in get
await self.ws_connection.accept_connection(self)
File "C:\tools\miniconda3\envs\data\Lib\site-packages\tornado\websocket.py", line 863, in accept_connection
await self._accept_connection(handler)
File "C:\tools\miniconda3\envs\data\Lib\site-packages\tornado\websocket.py", line 946, in _accept_connection
await self._receive_frame_loop()
File "C:\tools\miniconda3\envs\data\Lib\site-packages\tornado\websocket.py", line 1102, in _receive_frame_loop
await self._receive_frame()
File "C:\tools\miniconda3\envs\data\Lib\site-packages\tornado\websocket.py", line 1193, in _receive_frame
await handled_future
AttributeError: 'YDocWebSocketHandler' object has no attribute 'room'
Traceback (most recent call last):
File "C:\tools\miniconda3\envs\data\Lib\collections\__init__.py", line 449, in _make
result = tuple_new(cls, iterable)
^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: pycrdt::map::Map is unsendable, but is being dropped on another thread
Traceback (most recent call last):
File "C:\tools\miniconda3\envs\data\Lib\collections\__init__.py", line 449, in _make
result = tuple_new(cls, iterable)
^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: pycrdt::map::Map is unsendable, but is being dropped on another thread
Traceback (most recent call last):
File "C:\tools\miniconda3\envs\data\Lib\collections\__init__.py", line 449, in _make
result = tuple_new(cls, iterable)
^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: pycrdt::doc::Doc is unsendable, but is being dropped on another thread
Traceback (most recent call last):
File "C:\tools\miniconda3\envs\data\Lib\collections\__init__.py", line 449, in _make
result = tuple_new(cls, iterable)
^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: pycrdt::array::Array is unsendable, but is being dropped on another thread
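As far as I can tell from the first traceback, the value passed on the command line reaches the comparison in `write` as a string-like object and not as an integer. Below is a minimal sketch of that mismatch, not the actual library code: `DeferredLikeString`, `is_expired` and `coerce_ttl` are hypothetical names, and the `str` subclass only stands in for the `DeferredConfigString` named in the error.

```python
class DeferredLikeString(str):
    """Hypothetical stand-in for a config value that was never converted to int."""


def is_expired(diff_seconds, document_ttl):
    """Mirror the comparison shown in the traceback (ystore.py, line 473)."""
    return document_ttl is not None and diff_seconds > document_ttl


def coerce_ttl(value):
    """Convert a raw TTL to int; None (no TTL configured) is passed through."""
    return None if value is None else int(value)


raw_ttl = DeferredLikeString("600")

try:
    is_expired(700, raw_ttl)
except TypeError as exc:
    # Same kind of failure as in the log: int compared with a str subclass.
    print(f"comparison failed: {exc}")

# After converting the value to int, the same comparison works.
print(is_expired(700, coerce_ttl(raw_ttl)))
print(is_expired(100, coerce_ttl(raw_ttl)))
```

If this reading is right, the later `AttributeError` and `unsendable` messages would just be follow-up failures of the room not being initialized, but that part is a guess on my side.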
Is the TTL option still supported, or is there another or better way to limit the size of the database?