Proposal: allow increasing the number of table jobs during clone
The current functionality of pgcopydb allows for the migration of multiple tables. However, in scenarios where resource utilization is low and there are many tables to migrate, it would be useful to be able to increase the number of table jobs running at once. This would improve migration performance and efficiency.
Additionally, when multiple databases are being migrated, it may be beneficial to "steal" jobs from one database and assign them to the next one on completion, ensuring that a constant number of jobs is always running. This would optimize the use of resources and reduce the overall migration time.
Therefore, I propose adding a feature that allows increasing the number of table jobs during clone, together with the ability to dynamically allocate jobs across multiple databases.
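To make the intent concrete, here is a minimal, hypothetical sketch of such a supervisor loop. It is not pgcopydb code: helper names like `run_table_copy` are made up, and host CPU load (sampled via psutil) stands in for whatever resource metric would actually be used.

```python
# Hypothetical sketch: launch more table-copy workers while host CPU load
# stays low. This is not how pgcopydb is implemented; it only illustrates
# the proposed "increase table jobs while resources are idle" behaviour.
import multiprocessing
import time

import psutil  # third-party; used here to sample host CPU utilization


def run_table_copy(table_name: str) -> None:
    """Placeholder for copying one table (COPY out on source, COPY in on target)."""
    time.sleep(5)  # stand-in for the real copy work


def supervise(tables, min_jobs=4, max_jobs=32, cpu_low=40.0, cpu_high=80.0):
    """Keep launching table-copy workers while there is CPU headroom."""
    pending = list(tables)
    running = []
    target = min_jobs

    while pending or running:
        # Reap workers that have finished their table.
        running = [p for p in running if p.is_alive()]

        # Sample utilization and adjust how many jobs we want in flight.
        cpu = psutil.cpu_percent(interval=1.0)
        if cpu < cpu_low and target < max_jobs:
            target += 1      # low load: allow one more concurrent copy
        elif cpu > cpu_high and target > min_jobs:
            target -= 1      # high load: stop launching new copies

        # Launch workers up to the current target; never kill running ones.
        while pending and len(running) < target:
            p = multiprocessing.Process(target=run_table_copy,
                                        args=(pending.pop(),))
            p.start()
            running.append(p)


if __name__ == "__main__":
    supervise([f"table_{i}" for i in range(100)])
```

The supervisor only decides whether to launch the next worker; a copy that is already running is left to finish, which matters for the concerns raised below.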
Although the main idea of a self-managing system is quite attractive (at least to me), there are a couple of things to consider before implementing this proposal:
- What parameters should count as resource utilization: is it CPU load, RAM usage, IOPS, network traffic, something else, all of these together, or some kind of derived index?
- Resource utilization changes (very) often and the swings can be large: CPU utilization may go from 0% to 60%, or from 70% to 2%, within seconds. Suppose the tool decides (say, based on some integral derived utilization index) that resource utilization is low, forks a couple of additional child migration processes, and then utilization rises drastically. What should the tool do? Kill some child processes? Wait for them to complete (which may take a long time)? Something else? One way of dampening this is sketched right after this list.
- Resource utilization should be considered not only on the host performing the migration but also on the databases, especially the source database, which may still be serving production traffic during the migration. Otherwise the tool could detect very low resource utilization on its own host, fork (just as an example) hundreds of child processes, and effectively DDoS the source database.
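To make the fluctuation and source-load concerns more concrete, here is a hedged sketch, again not pgcopydb's implementation: utilization samples are smoothed with an exponential moving average, and the job count is only ever a target that shrinks by attrition. The `source_db_busy` flag is a stand-in for whatever real signal would be used (counting active backends on the source would be one possible proxy).

```python
# Hedged sketch only: illustrates smoothing utilization samples and adjusting
# a concurrency *target* instead of reacting to every spike. None of this is
# pgcopydb code; the source_db_busy flag is an assumed external signal.
import psutil  # third-party; samples host CPU utilization


class SmoothedUtilization:
    """Exponential moving average over periodic CPU samples."""

    def __init__(self, alpha: float = 0.2):
        self.alpha = alpha
        self.value = None

    def sample(self) -> float:
        cpu = psutil.cpu_percent(interval=1.0)  # blocks ~1s per sample
        self.value = cpu if self.value is None else (
            self.alpha * cpu + (1 - self.alpha) * self.value
        )
        return self.value


def next_target(current: int, util: float, source_db_busy: bool,
                min_jobs: int = 4, max_jobs: int = 32) -> int:
    """Pick the next concurrency target; running copies are never killed,
    they are simply not replaced once the target drops."""
    if source_db_busy:
        return max(min_jobs, current - 1)  # back off to protect the source DB
    if util < 40.0:
        return min(max_jobs, current + 1)  # headroom: allow one more copy job
    if util > 80.0:
        return max(min_jobs, current - 1)  # pressure: shrink by attrition
    return current
```

Killing running COPY workers would waste the work already done and leave partially copied tables to redo, so only lowering the target and letting workers finish naturally seems like the safer policy.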
See #545, where the ability to have an internal catalogs array that spills to disk should allow registering the catalogs for many databases at once and then driving a run with a new command such as `pgcopydb clone --all-databases`, where resource sharing would be done across all databases.
IIRC we don't plan to implement this in favor of --all-databases. Feel free to reopen if that's not true.