luigi
luigi copied to clipboard
Broken or too many connection to database when workers > 1
Hi team, many thanks for the great work.
I am using Luigi to schedule a workflow, where some tasks can be run in parallel.
Hence, I would like to have multiple workers to parallelise computation, resulting in faster completion.
However, I am experiencing issue with the connection to my database when enabling multiple workers, whereas there is no issues with a single workers.
For reference, I am using SQLAlchemy 2.0.25 with a PostgreSQL database.
The typical error arising is: (psycopg2.OperationalError) SSL SYSCALL error: EOF detected
which typically happens when the connection to the database was closed unexpectedly, according to ref.
As I understood that enabling multiple workers leads to creating/forking threads, I read that I must dispose the database connection as per the documentation.
Hence, I used Luigi events callback to dispose the database connection at the start of each of the task:
@luigi.Task.event_handler(luigi.Event.START) # type: ignore
def on_start(self) -> None: # pragma: no cover
engine.dispose(close=False)
Unfortunately, such approach leads to (psycopg2.OperationalError) FATAL: sorry, too many clients already
error.
As per my understanding, I should execute this database engine disposal a single time at the creation of worker, not at the start of each task.
Given the above context, my questions are:
- Can someone share guidance on how to enable multiple workers with SQLAlchemy and PostgreSQL?
- Is it possible to avoid the creation a dedicated engine/ database connection per workers to avoid the too many clients error?
- Is there a way to execute a custom script at the creation of worker?
Many thanks for the help.
Similar question but for Django: https://github.com/spotify/luigi/issues/2782