wagtail_textract

Lots of database connections created by transcribe_documents?

Open kaedroho opened this issue 5 years ago • 0 comments

Just had a quick browse of the code and noticed that it uses asyncio to create background threads which fetch/extract text from documents.

~Is it likely that Django would start handling the next request before the background thread has finished running? Because if the same database connection is used by both the text extraction and the new request at the same time, this could cause issues as database connections are not thread safe.~

EDIT: looks like Django has this covered: https://github.com/django/django/blob/master/django/db/utils.py#L142

This might cause another issue: asyncio's default executor uses a thread pool of 5 * num_cpus workers, and since each worker thread gets its own database connection, this might create too many connections for some users (e.g. on shared hosting). Maybe we should add a "concurrency" parameter to the "transcribe_documents" command that lets the user limit the number of worker threads? (You can do this by passing a ThreadPoolExecutor created with max_workers to run_in_executor.)
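To illustrate the idea, here is a minimal sketch of capping the worker threads, not the project's actual code. The names `transcribe`, `transcribe_all`, `run`, and the `concurrency` argument are hypothetical; only `ThreadPoolExecutor(max_workers=...)` and `loop.run_in_executor(executor, ...)` are standard library calls.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


def transcribe(document_id):
    # Hypothetical stand-in for the blocking text extraction of one document.
    return f"text for document {document_id}"


async def transcribe_all(document_ids, concurrency):
    loop = asyncio.get_running_loop()
    # max_workers caps the number of threads, and so the number of
    # per-thread database connections Django would open.
    with ThreadPoolExecutor(max_workers=concurrency) as executor:
        futures = [
            loop.run_in_executor(executor, transcribe, document_id)
            for document_id in document_ids
        ]
        # gather() returns results in the same order as document_ids.
        return await asyncio.gather(*futures)


def run(document_ids, concurrency=2):
    # Hypothetical entry point, e.g. called from a management command
    # with the value of a "concurrency" option.
    return asyncio.run(transcribe_all(document_ids, concurrency))
```

With something like this, the management command could expose the limit as an option and pass it straight through as `concurrency`.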

kaedroho, Jul 23 '18 10:07