pg_timetable icon indicating copy to clipboard operation
pg_timetable copied to clipboard

Refactor to an extension

Open uded opened this issue 5 years ago • 13 comments

I was wondering if there is any chance to refactor this to an extension. There are several Golang psql extensions already, so - at least theoretically - that seems feasible. I haven't checked the code myself yet, but since there is a pg_cron extension, why not moving this to a more closely-coupled form?

If there is a good argument for not doing that, before I will spend hours analyzing the code, please do share with me :-D

uded avatar Mar 27 '20 20:03 uded

hey there. Can you provide us with examples of such extensions, please?

pashagolub avatar Mar 28 '20 19:03 pashagolub

Possibly I can do better - here is a framework that allows one to create a Golang extension for PSQL. I made a simple test and it seems working fine, albeit it is just the code provided by the author. Theoretically, this should allow you to port your code to the extension format, although - again - I have not analyzed the code and I will not vouch for that...

All the extension I have seen seems to be based on that framework. I can't find a lot of code right now (middle of the night), but I spoke with the guys in our company on Friday and they pointed me out to the project above for hints. As they noted the biggest problem they faced was with goroutines as the GC in Golang was causing issues with PSQL (if I got all the information right over the video call in the times of pandemic)...

Hence, some info here and here - with C import, some example code here - working fine. PRest had a bgworker implementation in the making/testing. but I am uncertain about the status right now.

uded avatar Mar 30 '20 00:03 uded

Thanks for the info! Sounds cool. The only thing disturbs me a lot:

Using goroutines is possible, but very tricky. The allocation of the stack for the goroutine is bigger than max_stack_depth. Running an procedure that spins-up some goroutines ends with crashing

And we are heavily on goroutines. Anyway, we'll' check this out. Thanks

pashagolub avatar Mar 30 '20 09:03 pashagolub

however, if you will just run scheduler.Run() that might work. All the logic is inside this function.

I'm extremely interested if that can work. :)

pashagolub avatar Mar 30 '20 09:03 pashagolub

I think it is worth trying. I would do it myself, but due to the COVID-19 we are swamped! If you can migrate the code to be a psql extension then you're gaining a lot of in terms of deployments.

However, what about master-master and master-slave replication. In such case I think it would be good to use pg_cron logic:

To start the pg_cron background worker when PostgreSQL starts, you need to add pg_cron to shared_preload_libraries in postgresql.conf. Note that pg_cron does not run any jobs as a long a server is in hot standby mode, but it automatically starts when the server is promoted.

uded avatar Mar 30 '20 15:03 uded

We use this rule:

if pg_timetable can write to its system tables, we do run tasks.

Sometimes you need some tasks for standby node as well, not only for leader

pashagolub avatar Mar 30 '20 17:03 pashagolub

Yeah, got it. I started to go through your code in a little of the spare time I have. Looks good, it should be possible to port to an extension but needs testing. A lot of goroutines - which is great - but not sure how psql will handle it...

Also maybe removing some of the built-in tasks might be a good idea as the deployment model would change. But far from that now I guess, just thinking...

uded avatar Mar 30 '20 19:03 uded

We use this rule:

if pg_timetable can write to its system tables, we do run tasks.

Sometimes you need some tasks for standby node as well, not only for leader

I agree - it would be actually nice to use slaves for computing power, as - usually - those are not heavily loaded. But if you use master-slave all updates, actual work results plus processed data, etc. should go directly to the master regardless of where the job was executed. That would require proper connection management I believe. It would be great if I would be able to select that a task is non-data crucial and it's just executing based on the existing parameters, but in M-S status updates of the job itself still should be written to the master, am I right?

Busy times, no progress for me with the code reading

uded avatar Apr 04 '20 10:04 uded

But if you use master-slave all updates, actual work results plus processed data, etc. should go directly to the master regardless of where the job was executed.

That's why we have ability to execute tasks using foreign connection :)

pashagolub avatar Apr 04 '20 15:04 pashagolub

OK, great. If you need some help with the experiment - don't hesitate to ask. Next week should be a little less mad for me (hopefully)...

uded avatar Apr 05 '20 20:04 uded

@pashagolub - need help with this? We got about in-house 20 extensions made till now, works really well for us!

uded avatar Sep 05 '20 13:09 uded

hey, we're experimenting with that, but, sure, any help is appreciated! :) You mean 20 golang extensions? Have you any in public?

pashagolub avatar Sep 05 '20 13:09 pashagolub

No public - mostly for very specific uses for tailored-made apps for customers. But since we started we use them quite extensively as they made hell lot of sense in RDBMS world. Most of them are custom data handling and some validators/processors. Works like a charm, speeded up some of the use cases by 200-300%.

Just say what you need - me and maybe some guys from our team will be happy to jump in and help.

uded avatar Sep 05 '20 15:09 uded

📅 This issue has been automatically marked as stale because lack of recent activity. It will be closed if no further activity occurs. ♻️ If you think there is new information allowing us to address the issue, please reopen it and provide us with updated details. 🤝 Thank you for your contributions.

github-actions[bot] avatar Mar 30 '23 00:03 github-actions[bot]