pg_timetable icon indicating copy to clipboard operation
pg_timetable copied to clipboard

Add `--cluster` mode

Open pashagolub opened this issue 4 years ago • 4 comments

Add support for multiple databases into pg_timetable service in the following way:   When connected to system database (postgres) or when --cluster argument is used then use pg_database (or some granted view above it for security) to find all databases. Then it will try to connect to each database and if it finds timetable functionality there, it will start handling timetable jobs for that database. The list of databases should be regularly refreshed or there should be some command to do it explicitly.  

  1. pg_timetable connects to any of databases (for simplicity, we can even use pg_template1 for this purpose in --cluster)
  2. pg_timetable checks the presence of the timetable schema in every database
  3. if timetable schema is present, that means database considered a target, start the main loop over it

pashagolub avatar Sep 02 '21 12:09 pashagolub

After thorough investigations, it seems to add --cluster mode to the pg_timetable is overkill. The code becomes overcomplicated, managing newly created/dropped databases and schemas in them is cumbersome.

One sane approach would be to create a new pg_timetable_cluster application that would be in charge of creating/closing new pg_timetable instances.

Another promising solution would be to handle the schedulers fleet using the functionality of the pg_timetable itself. The model should look like this in my head:

  1. We have a special database in the PostgreSQL cluster that will serve as a template for every database in the cluster, e.g. CREATE DATABASE template
  2. We init this template database for scheduling, e.g. pg_timetable -d template --init --client-name=foo
  3. Every other database is created based on the template, e.g. CREATE DATABASE new TEMPLATE template.
  4. This way each new database will contain a timetable schema after creation.
  5. Now we will add a chain to the template database that will:
  • check active database pg_timetable sessions
  • join them with databases that exist and find for what databases there is no active scheduling session
  • add missing database names as arguments to the next PROGRAM task
  • PROGRAM task will launch new pg_timetable instances for missing databases

The only missing point right now is how to stop pg_timetable if one wants to drop the database. I will address this issue by creating proper functionality working on #322

pashagolub avatar Oct 01 '21 07:10 pashagolub

HI @pashagolub ,

I just discovered pg_timetable and I miss this functionnality. My use case is that I have several microservices which own their own database. I would like to have only one instance of pg_timetable to handle all those databases.

I would suggest that in configuration file, we could also change the connection properties to an connections array. This way pg_timetable will have a clear vision of which databases it has to handle.

What do you think ?

Thanks,

Anthony

anthony2856 avatar Jun 29 '22 09:06 anthony2856

hey,

sounds doable. The main question is how to handle application schema. Should all databases have a timetable schema with chains or only one?

If all of them then pg_timetable should check the schema on every start and react in some way.

pashagolub avatar Jun 29 '22 12:06 pashagolub

I think it's better to let each database own his proper timetable schema. Less rework in the code I suppose and also easy to manage database deletion for example.

anthony2856 avatar Jun 29 '22 12:06 anthony2856

📅 This issue has been automatically marked as stale because lack of recent activity. It will be closed if no further activity occurs. ♻️ If you think there is new information allowing us to address the issue, please reopen it and provide us with updated details. 🤝 Thank you for your contributions.

github-actions[bot] avatar Mar 30 '23 00:03 github-actions[bot]