paperless
Database locked while consumer runs
First of all: Thanks for the nice project!
I've got the project running on a Raspberry Pi 3 B+ (so basically a toaster). This means that the consumer takes a looong time to consume the PDFs, which per se is no problem for me. However, I noticed that the consumer seems to lock the database while it's processing PDFs, so I can't edit already consumed documents while there are any left in the consumption dir. I get a 500 (`OperationalError: database is locked`) when I try to save any model while the consumer is working.
Is this necessary or could the consumer close/unlock the database connection until ocr, guesswork, and stuff is done?
https://github.com/the-paperless-project/paperless/blob/master/src/documents/consumer.py#L115 this seems to be the code in question.
Thanks for bringing this up! I have run into this as well. I'm fairly sure this has not always been the case and I used to edit documents whilst some were still being consumed.
I've noticed this too, and it makes paperless unusable if you're doing something like an initial ingestion of a boatload of PDFs and you also want to manage them in the UI...
The issue seems to be specific to SQLite https://docs.djangoproject.com/en/2.2/ref/databases/#database-is-locked-errors
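Per the Django docs linked above, one mitigation (not a fix for the overly broad lock) is raising SQLite's busy timeout so concurrent writers wait for the lock instead of erroring out. A `settings.py` sketch, assuming the default SQLite backend; the path and value are placeholders:

```python
# settings.py sketch (values are assumptions, adjust to your setup)
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.sqlite3",
        "NAME": "/path/to/db.sqlite3",
        "OPTIONS": {
            # Wait up to 30 seconds for the lock instead of failing
            # after the default 5 seconds.
            "timeout": 30,
        },
    }
}
```

This only helps if the lock is held for less than the timeout; with a consumption that takes minutes on a Pi, edits would still time out.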
There is only one database write in the try_consume_file function which happens at the end of the consumption process https://github.com/the-paperless-project/paperless/blob/8e6d7cba1329c959659cd133522e30cda1ae3943/src/documents/consumer.py#L154
Is the `transaction.atomic` decorator necessary if you don't have `document_consumption_started` or `document_consumption_finished` signal handlers that write to the database?
Is it possible to have a setting which toggles atomic transactions (defaulting to `True`) and then have a `try_consume_file_atomic` wrapper?

```python
if settings.CONSUME_FILE_ATOMIC:
    result = self.try_consume_file_atomic(file)
else:
    result = self.try_consume_file(file)
```

```python
@transaction.atomic
def try_consume_file_atomic(self, file):
    return self.try_consume_file(file)

def try_consume_file(self, file):
    ...
```
Would it help to just decorate the `_store` method with the `@transaction.atomic` decorator instead of the whole `try_consume_file` consumption method? Then I'd think it would not lock the database for the whole consumption but only for actually writing the consumed file to the database.
I'd expect that there would be only one instance of the consumer running in any case.
Isn't `@transaction.atomic` only necessary if you are writing to the database in the signal handlers, so that there is a set of DB operations that may be rolled back? The `_store` method is only one database operation, so it doesn't need `@transaction.atomic`.
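That matches how transactions behave in general. A small `sqlite3` sketch (no Django involved) of why grouping writes only matters when there is more than one of them:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, title TEXT)")

# Several writes grouped in one transaction: when the (simulated) signal
# handler fails, the earlier write is rolled back along with it.
try:
    with conn:  # the connection context manager wraps one transaction
        conn.execute("INSERT INTO docs (title) VALUES ('from signal handler')")
        raise RuntimeError("post-consumption handler failed")
except RuntimeError:
    pass

rows = conn.execute("SELECT COUNT(*) FROM docs").fetchone()[0]
print(rows)  # → 0, the insert was rolled back

# A single write needs no such grouping: it either happens or it doesn't.
conn.execute("INSERT INTO docs (title) VALUES ('stored document')")
conn.commit()
```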
> Isn't `@transaction.atomic` only necessary if you are writing to the database in the signal handlers, so that there is a set of DB operations that may be rolled back?

My experience with databases and Django is practically zero, so no idea. My suggestion was just based on the lock possibly being too broad, and on the observation that a lot of the time spent in the initial consumption in `try_consume_file` seems unrelated to the database.
Because of this, I have to preprocess my documents with OCR. Then the consumer doesn't run as long and the database lock is held for a shorter time.
Thinking through this for the third time now, there doesn't seem to be a good way to properly handle this while keeping the same assumptions: that anything connected to the pre- or post-signals may change the database, and that if a pre- or post-signal handler fails, we should roll back everything.
I'm in favor of changing the behavior and moving `@transaction.atomic` to `_store()` instead of `try_consume_file()`. There doesn't seem to be anything happening in the DB before the pre-signal. We have to make sure all the steps in `_store()` are atomic, as that's where all the document's data is saved. In the default pre- and post-signal handlers, we shell out into another process. Since we run in a transaction, no other process should be able to change the same table we do (especially for SQLite DBs).
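A rough sketch of what that scoping would look like, again with plain `sqlite3` rather than the actual consumer code; `consume` and its phases are stand-ins, not paperless APIs:

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.sqlite3")
init = sqlite3.connect(path)
init.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, title TEXT)")
init.commit()
init.close()

def consume(db_path, raw_title):
    # Slow phase (OCR, guesswork): no transaction held, DB stays unlocked.
    text = raw_title.upper()  # stand-in for the expensive work

    # Store phase: the transaction covers only the actual writes,
    # like moving @transaction.atomic onto _store().
    conn = sqlite3.connect(db_path)
    with conn:
        conn.execute("INSERT INTO docs (title) VALUES (?)", (text,))
    conn.close()

consume(path, "invoice")

# Outside the brief store phase, the UI can write without hitting the lock.
editor = sqlite3.connect(path, timeout=0)
with editor:
    editor.execute("UPDATE docs SET title = 'renamed' WHERE id = 1")
editor.close()
```

The lock is then held only for the few milliseconds of the insert, not for the minutes of OCR.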
Another nasty side-effect of this: while the consumer is doing its work you cannot log into the web interface.
Is there anything that can be done about the immensely long runtime? I remember that in my "old" installation using MariaDB/MySQL, I never experienced the long runtime or the database locking. I imagine the database locking wouldn't be much of a problem if the runtime weren't so long.
@stueja It could be related to the new tesseract version that was activated (v3 → v4): https://github.com/the-paperless-project/paperless/commit/3050ff159466e873e3542e898e76848d6aaae3e6#diff-3254677a7917c6c01f55212f86c57fbf It uses neural networks, and I noticed a decrease in performance on my system. But OCR is generally expensive and takes a while; not much you can do...
Please open a new issue if you are having performance issues. This issue is about the database lock being too broad, which causes usability issues because the database is locked longer than it needs to be.