parsec-cloud
FSLocalStorageOperationalError on disk full
Sentry Issue: PARSEC-QZW
```
OperationalError: disk I/O error
  File "parsec/core/fs/storage/local_database.py", line 74, in _manage_operational_error
    yield
  File "parsec/core/fs/storage/local_database.py", line 122, in _create_connection
    self._conn.execute("PRAGMA journal_mode=WAL")

FSLocalStorageOperationalError:
(16 additional frame(s) were not displayed)
...
  File "parsec/core/fs/storage/local_database.py", line 39, in run
    await self._connect()
  File "parsec/core/fs/storage/local_database.py", line 130, in _connect
    await self._create_connection()
  File "parsec/core/fs/storage/local_database.py", line 123, in _create_connection
    self._conn.execute("PRAGMA synchronous=NORMAL")
  File "async_generator/_util.py", line 53, in __aexit__
    await self._agen.athrow(type, value, traceback)
  File "parsec/core/fs/storage/local_database.py", line 102, in _manage_operational_error
    raise FSLocalStorageOperationalError from exception
```
Uncaught error
Related to #2083
Sentry issue: PARSEC-QZX
Sentry issue: PARSEC-R04
Here's a typical scenario related to this issue:
During a file synchronization, an access to the local database fails with an `OperationalError`. This might happen in `set_clean_block` due to the disk being full. `OperationalError`s are handled by immediately closing the connection to the local database in order to avoid potential data corruption. The database being closed, an `FSLocalStorageClosedError` is raised in another nursery task that was running concurrently. A `MultiError` is then raised and logged, which is good, because we want to know about those combined exceptions. This exception bubbles up until it closes the backend connection. The user gets a notification and checks the status of the synchronizing file, which fails again since the local database has been closed.
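The close-on-error handling described above roughly follows this pattern. This is a simplified, synchronous sketch: the class and method names mirror the traceback, but the body is illustrative and not the actual Parsec implementation.

```python
import sqlite3
from contextlib import contextmanager


class FSLocalStorageOperationalError(Exception):
    """Raised when the underlying SQLite database hits an operational error."""


class LocalDatabase:
    def __init__(self, path=":memory:"):
        self._conn = sqlite3.connect(path)

    @contextmanager
    def _manage_operational_error(self):
        # On any sqlite OperationalError (e.g. disk full), close the
        # connection immediately to avoid potential data corruption,
        # then surface a dedicated error type to the caller.
        try:
            yield
        except sqlite3.OperationalError as exception:
            self._conn.close()
            self._conn = None
            raise FSLocalStorageOperationalError from exception
```

Once `_conn` is `None`, any concurrent task still using the storage fails with a "database closed" style error, which is how the `FSLocalStorageClosedError` in the other nursery task comes about.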
My conclusion is that everything mostly happens as intended so far. Note that when those errors happen during a file import (which is the most probable case), we're already prompting the user with a message asking them to check their disk space. So it's probable that the user:
- created a large file
- reached disk capacity
- got the prompt about checking disk space
- the new file still got synchronized (partially)
- then the scenario described above happened
Another probable case for filling up the disk is opening large files that have been shared by another user.
In conclusion, things mostly go as planned, but there are clearly things that can be improved. The following questions need to be answered:
- Should we prompt the user asking to check disk space in other cases than file import?
- Should we have better control over the re-connection to the local database?
- Should we really close the backend connection during a local storage error?
A potential way of dealing with those issues is to log the user out when a failing local storage is detected, and to check disk space when a user tries to log in, in order to prevent those issues in the first place.
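The proposed pre-login check could look something like this. It is a minimal sketch: `check_disk_space` and the 100 MiB threshold are hypothetical choices for illustration, not existing Parsec code or values.

```python
import shutil

# Hypothetical threshold: refuse (or warn before) login if fewer than
# 100 MiB are free on the partition holding the local database.
MIN_FREE_BYTES = 100 * 1024 * 1024


def check_disk_space(path=".", min_free=MIN_FREE_BYTES):
    """Return True if the filesystem containing `path` has enough free space.

    Intended to run before login so the user is warned *before* the local
    database can fail with a disk-full OperationalError.
    """
    usage = shutil.disk_usage(path)
    return usage.free >= min_free
```

Running this against the local database directory at login time would catch the disk-full condition up front, instead of discovering it mid-synchronization.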