SQLite Error

Open dylanmcreynolds opened this issue 2 years ago • 7 comments

v0.1.0a105

We have a setup where we have 4 tiled pods running in k8s, and have started getting occasional SQLite locked errors.

  File "/opt/venv/lib/python3.11/site-packages/starlette/middleware/base.py", line 70, in coro
    await self.app(scope, receive_or_disconnect, send_no_error)
  File "/opt/venv/lib/python3.11/site-packages/starlette/middleware/base.py", line 108, in __call__
    response = await self.dispatch_func(request, call_next)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.11/site-packages/tiled/server/app.py", line 721, in set_cookies
    response = await call_next(request)
               ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.11/site-packages/starlette/middleware/base.py", line 84, in call_next
    raise app_exc
  File "/opt/venv/lib/python3.11/site-packages/starlette/middleware/base.py", line 70, in coro
    await self.app(scope, receive_or_disconnect, send_no_error)
  File "/opt/venv/lib/python3.11/site-packages/starlette/middleware/base.py", line 108, in __call__
    response = await self.dispatch_func(request, call_next)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.11/site-packages/tiled/server/app.py", line 712, in client_compatibility_check
    response = await call_next(request)
               ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.11/site-packages/starlette/middleware/base.py", line 84, in call_next
    raise app_exc
  File "/opt/venv/lib/python3.11/site-packages/starlette/middleware/base.py", line 70, in coro
    await self.app(scope, receive_or_disconnect, send_no_error)
  File "/opt/venv/lib/python3.11/site-packages/starlette/middleware/base.py", line 108, in __call__
    response = await self.dispatch_func(request, call_next)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.11/site-packages/tiled/server/app.py", line 672, in double_submit_cookie_csrf_protection
    response = await call_next(request)
               ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.11/site-packages/starlette/middleware/base.py", line 84, in call_next
    raise app_exc
  File "/opt/venv/lib/python3.11/site-packages/starlette/middleware/base.py", line 70, in coro
    await self.app(scope, receive_or_disconnect, send_no_error)
  File "/opt/venv/lib/python3.11/site-packages/tiled/server/compression.py", line 27, in __call__
    await responder(scope, receive, send)
  File "/opt/venv/lib/python3.11/site-packages/tiled/server/compression.py", line 48, in __call__
    await self.app(scope, receive, self.send_compressed)
  File "/opt/venv/lib/python3.11/site-packages/starlette/middleware/cors.py", line 83, in __call__
    await self.app(scope, receive, send)
  File "/opt/venv/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
    raise exc
  File "/opt/venv/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
    await self.app(scope, receive, sender)
  File "/opt/venv/lib/python3.11/site-packages/fastapi/middleware/asyncexitstack.py", line 20, in __call__
    raise e
  File "/opt/venv/lib/python3.11/site-packages/fastapi/middleware/asyncexitstack.py", line 17, in __call__
    await self.app(scope, receive, send)
  File "/opt/venv/lib/python3.11/site-packages/starlette/routing.py", line 718, in __call__
    await route.handle(scope, receive, send)
  File "/opt/venv/lib/python3.11/site-packages/starlette/routing.py", line 276, in handle
    await self.app(scope, receive, send)
  File "/opt/venv/lib/python3.11/site-packages/starlette/routing.py", line 66, in app
    response = await func(request)
               ^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.11/site-packages/fastapi/routing.py", line 273, in app
    raw_response = await run_endpoint_function(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.11/site-packages/fastapi/routing.py", line 190, in run_endpoint_function
    return await dependant.call(**values)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.11/site-packages/tiled/server/router.py", line 315, in metadata
    resource = await construct_resource(
               ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.11/site-packages/tiled/server/core.py", line 460, in construct_resource
    count = await len_or_approx(entry)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.11/site-packages/tiled/server/core.py", line 70, in len_or_approx
    return await tree.async_len()
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.11/site-packages/tiled/catalog/adapter.py", line 324, in async_len
    return (await db.execute(statement)).scalar_one()
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.11/site-packages/tiled/catalog/explain.py", line 80, in execute
    return await super().execute(statement, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.11/site-packages/sqlalchemy/ext/asyncio/session.py", line 454, in execute
    result = await greenlet_spawn(
             ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.11/site-packages/sqlalchemy/util/_concurrency_py3k.py", line 190, in greenlet_spawn
    result = context.throw(*sys.exc_info())
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.11/site-packages/sqlalchemy/orm/session.py", line 2262, in execute
    return self._execute_internal(
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.11/site-packages/sqlalchemy/orm/session.py", line 2144, in _execute_internal
    result: Result[Any] = compile_state_cls.orm_execute_statement(
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.11/site-packages/sqlalchemy/orm/context.py", line 293, in orm_execute_statement
    result = conn.execute(
             ^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1412, in execute
    return meth(
           ^^^^^
  File "/opt/venv/lib/python3.11/site-packages/sqlalchemy/sql/elements.py", line 515, in _execute_on_connection
    return connection._execute_clauseelement(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1635, in _execute_clauseelement
    ret = self._execute_context(
          ^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1844, in _execute_context
    return self._exec_single_context(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1984, in _exec_single_context
    self._handle_dbapi_exception(
  File "/opt/venv/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 2339, in _handle_dbapi_exception
    raise sqlalchemy_exception.with_traceback(exc_info[2]) from e
  File "/opt/venv/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1965, in _exec_single_context
    self.dialect.do_execute(
  File "/opt/venv/lib/python3.11/site-packages/sqlalchemy/engine/default.py", line 921, in do_execute
    cursor.execute(statement, parameters)
  File "/opt/venv/lib/python3.11/site-packages/sqlalchemy/dialects/sqlite/aiosqlite.py", line 113, in execute
    self._adapt_connection._handle_exception(error)
  File "/opt/venv/lib/python3.11/site-packages/sqlalchemy/dialects/sqlite/aiosqlite.py", line 263, in _handle_exception
    raise error
  File "/opt/venv/lib/python3.11/site-packages/sqlalchemy/dialects/sqlite/aiosqlite.py", line 95, in execute
    self.await_(_cursor.execute(operation, parameters))
  File "/opt/venv/lib/python3.11/site-packages/sqlalchemy/util/_concurrency_py3k.py", line 125, in await_only
    return current.driver.switch(awaitable)  # type: ignore[no-any-return]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.11/site-packages/sqlalchemy/util/_concurrency_py3k.py", line 185, in greenlet_spawn
    value = await result
            ^^^^^^^^^^^^
  File "/opt/venv/lib/python3.11/site-packages/aiosqlite/cursor.py", line 48, in execute
    await self._execute(self._cursor.execute, sql, parameters)
  File "/opt/venv/lib/python3.11/site-packages/aiosqlite/cursor.py", line 40, in _execute
    return await self._conn._execute(fn, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.11/site-packages/aiosqlite/core.py", line 133, in _execute
    return await future
           ^^^^^^^^^^^^
  File "/opt/venv/lib/python3.11/site-packages/aiosqlite/core.py", line 106, in run
    result = function()
             ^^^^^^^^^^
sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) database is locked
[SQL: SELECT count(nodes."key") AS count_1 
FROM nodes 
WHERE nodes.ancestors = ?]
[parameters: ('[]',)]

dylanmcreynolds avatar Sep 13 '23 23:09 dylanmcreynolds

What type of filesystem is the SQLite file residing on?

danielballan avatar Sep 13 '23 23:09 danielballan

It's the NERSC CFS, so it's not NFS.

dylanmcreynolds avatar Sep 14 '23 15:09 dylanmcreynolds

We have scaled it down to one process. It will be interesting to see if this goes away. If we need to scale up, we can switch to postgres.

dylanmcreynolds avatar Sep 14 '23 15:09 dylanmcreynolds

Yes, I think the right mindset is "SQLite is multi-process, but single-node." On any networked file system, SQLite will be at best unusably slow and at worst prone to corrupting the database. Definitely re-open if you see this again with a single node.
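
For anyone landing here later, a minimal sketch of how the "database is locked" message arises at the SQLite level. It uses only the standard-library `sqlite3` module on a local file, so it illustrates the locking mechanism and does not reproduce the multi-pod, networked-filesystem setup from this issue. The function name `read_while_locked` and the one-column `nodes` table are made up for illustration and are not tiled's schema or API.

```python
import os
import sqlite3
import tempfile


def read_while_locked(db_path: str) -> str:
    """Return the error a reader sees while another connection holds an exclusive lock."""
    # isolation_level=None puts the connection in autocommit mode,
    # so transactions are opened and closed explicitly below.
    writer = sqlite3.connect(db_path, isolation_level=None)
    writer.execute("CREATE TABLE IF NOT EXISTS nodes (key TEXT)")
    # In the default rollback-journal mode, an EXCLUSIVE transaction blocks readers too.
    writer.execute("BEGIN EXCLUSIVE")
    writer.execute("INSERT INTO nodes (key) VALUES ('a')")

    # timeout=0 means: do not wait for the lock, fail immediately.
    reader = sqlite3.connect(db_path, timeout=0, isolation_level=None)
    try:
        reader.execute("SELECT count(key) FROM nodes").fetchone()
        return "no error"
    except sqlite3.OperationalError as exc:
        return str(exc)
    finally:
        reader.close()
        writer.execute("ROLLBACK")
        writer.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(read_while_locked(os.path.join(tmp, "catalog.db")))
```

On a local disk, a nonzero `timeout` lets the reader wait for the lock to clear. On a networked filesystem the file locking that SQLite relies on may not behave reliably, which is why waiting longer is not a dependable fix there.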

danielballan avatar Sep 14 '23 16:09 danielballan

To be clear, we were in a setup with multiple pods in the same cluster, and a volume mount that I wouldn't. This was the whole point of changing the tiled container image to be single process. It will be interesting to see what our guidance ends up being.

dylanmcreynolds avatar Sep 14 '23 16:09 dylanmcreynolds

Can you rephrase that first sentence? I can’t parse.

Maybe I should reopen this, awaiting your guidance on these kinds of deployments.

Aside: what kind of filesystem is CFS? I thought it was NFS.

danielballan avatar Sep 14 '23 16:09 danielballan

Aside: I answered my own question. Indeed, not NFS, but any networked FS will be “no warranty” at best, and should be avoided. https://docs.nersc.gov/filesystems/community/#performance

danielballan avatar Sep 14 '23 16:09 danielballan