Improve `list table-parts` to list parts for all tables.
[edit] See this commit for fresh description: https://github.com/dimitri/pgcopydb/pull/611/commits/4df109ed5119b06a35a0e2305aeecfb8c8a18861. Keeping below text for completeness of discussion.
- Replace
catalog_init_from_specswithcopydb_fetch_schema_and_prepare_specs. The former works only if the catalog already exists. The latter connects to the source and provides a fresh report. - Correct confusing output where
Maxof last row was(-1,0). Revert to old behavior for clarity. - Limit
copydb_fetch_schema_and_prepare_specsto fetch information only for the specified table and schema. Initialize the filter of specs with the passed single table and schema.
The pgcopydb list ... commands are now in a different place as they were before the SQLite catalogs implementation. Because the catalogs are kept on-disk as a cache that we can re-use from a command to the next, it is now possible to fetch the catalog information first and re-use it multiple times, as exercised by make tests/pagila-multi-steps.
As a result, we register the catalogs setup (source and target URIs, snapshot, filtering, etc) and compare it with the command's context to make sure that we can re-use the on-disk cache when running the next command.
For the pgcopydb list table-parts command, historically it would only work one table at a time, as a way of just debugging pgcopydb split approach. Also it would be best if the command was not preparing the whole of the catalogs fetching thousands of table metadata just to be able to display partitions for one of them. That's why at the moment the pgcopydb list table-parts command requires another catalog listing such as pgcopydb list tables or pgcopydb list schema to have been invoked previously.
Passing down the one-table filtering to copydb_fetch_schema_and_prepare_specs could register that filtering as the catalog setup, and then other commands would refuse to re-use the cache because it was computed for a different filtering setup. That's not good.
We can change command's design, and I have been wondering if the pgcopydb list table-parts command should be enhanced to work on all the tables that we partition at once. It should be easy to implement now that we have the catalogs available and pre-computed in a SQLite file.
Thanks @dimitri for the review -- it totally make sense! I'm working on making this change.
Regarding the catalog initialization using copydb_fetch_schema_and_prepare_specs, it's worth noting that we also update the s_table_parts catalog. Conseqently now --split-tables-larger-than should be passed to list tables. And list table-parts would merely emit the splits calculated previously. If one attempts new values of --split-tables-larger-than with list table-parts would conflict and fail.
In the scenario where a user wants to tune the value of --split-tables-larger-than to observe it's impact, they have to re-run list tables followed by list table-parts. I don't see a problem with this, but I just want to make sure it's intended.
Alternatively we can allow change to --split-tables-larger-than with some --force option in list table-parts and it would recompute the parts and update the catalog.
If we chose to go with update-catalog-only-once case (the former one), I think we should also store ctid as the part-key in s_table catalog wherever applicable so that if we use catalog_lookup_s_table_by_name like we are doing in cli_list_table_parts, we have the correct information about the part-key. This lookup of part-key is used to format output and anyways it should be there for correctness.
What do you think?
My intention is to introduce --disable-split-by-ctid/--disable-ctid-as-partkey after all these changes.
Regarding the catalog initialization using
copydb_fetch_schema_and_prepare_specs, it's worth noting that we also update thes_table_partscatalog. Conseqently now--split-tables-larger-thanshould be passed tolist tables.
Yes.
And
list table-partswould merely emit the splits calculated previously. If one attempts new values of--split-tables-larger-thanwithlist table-partswould conflict and fail.
Maybe we can even discard the --split-tables-larger-than option in pgcopydb list table-parts and only re-use whatever was given to pgcopydb list tables to make it clear that's how it works?
In the scenario where a user wants to tune the value of
--split-tables-larger-thanto observe it's impact, they have to re-runlist tablesfollowed bylist table-parts. I don't see a problem with this, but I just want to make sure it's intended.
Agreed. That's intended. That's how it works at the moment.
Alternatively we can allow change to
--split-tables-larger-thanwith some--forceoption inlist table-partsand it would recompute the parts and update the catalog.
That makes sense, that's a good use of the --force option indeed.
If we chose to go with update-catalog-only-once case (the former one), I think we should also store
ctidas the part-key ins_tablecatalog wherever applicable so that if we usecatalog_lookup_s_table_by_namelike we are doing incli_list_table_parts, we have the correct information about the part-key. This lookup of part-key is used to format output and anyways it should be there for correctness.What do you think?
I was on the fence about updating the catalogs with the ctid partkey information, but I can see how that makes sense for the rest of the code.
My intention is to introduce
--disable-split-by-ctid/--disable-ctid-as-partkeyafter all these changes.
I think I prefer the first spelling of it: --disable-split-by-ctid.
@dimitri I have updated the PR with commit message hopefully explaining what I have done so far. It address most of what we discussed above, here are the immediate follow-ups
- [ ] Throw error if list table-parts is run without running
list schemaorlist tableswith appropriate error. - [ ] Fix
--forceoption. Currently I have updated the doc with the intended behavior as per above discussion.
Independent follow-ups
- [ ] Add
--disable-split-by-ctid - [ ] Find smarter strategy to use
ctidas part-key for only some versions of source server.
Here is how output look now,
pgcopydb list table-parts --split-tables-larger-than '10 MB'
Output
23:00:43.996 27191 INFO Table public.t_geometric_types COPY will be split 33-ways
OID | Schema Name | Table Name | Part | Min | Max | Count
---------+---------------------------+---------------------------+--------------+--------------+--------------+-------------
16427 | public | t_geometric_types | 1/33 | 1 | 30304 | 30304
16427 | public | t_geometric_types | 2/33 | 30305 | 60608 | 30304
16427 | public | t_geometric_types | 3/33 | 60609 | 90912 | 30304
16427 | public | t_geometric_types | 4/33 | 90913 | 121216 | 30304
16427 | public | t_geometric_types | 5/33 | 121217 | 151520 | 30304
16427 | public | t_geometric_types | 6/33 | 151521 | 181824 | 30304
16427 | public | t_geometric_types | 7/33 | 181825 | 212128 | 30304
16427 | public | t_geometric_types | 8/33 | 212129 | 242432 | 30304
16427 | public | t_geometric_types | 9/33 | 242433 | 272736 | 30304
16427 | public | t_geometric_types | 10/33 | 272737 | 303040 | 30304
16427 | public | t_geometric_types | 11/33 | 303041 | 333344 | 30304
16427 | public | t_geometric_types | 12/33 | 333345 | 363648 | 30304
16427 | public | t_geometric_types | 13/33 | 363649 | 393952 | 30304
16427 | public | t_geometric_types | 14/33 | 393953 | 424256 | 30304
16427 | public | t_geometric_types | 15/33 | 424257 | 454560 | 30304
16427 | public | t_geometric_types | 16/33 | 454561 | 484864 | 30304
16427 | public | t_geometric_types | 17/33 | 484865 | 515168 | 30304
16427 | public | t_geometric_types | 18/33 | 515169 | 545472 | 30304
16427 | public | t_geometric_types | 19/33 | 545473 | 575776 | 30304
16427 | public | t_geometric_types | 20/33 | 575777 | 606080 | 30304
16427 | public | t_geometric_types | 21/33 | 606081 | 636384 | 30304
16427 | public | t_geometric_types | 22/33 | 636385 | 666688 | 30304
16427 | public | t_geometric_types | 23/33 | 666689 | 696992 | 30304
16427 | public | t_geometric_types | 24/33 | 696993 | 727296 | 30304
16427 | public | t_geometric_types | 25/33 | 727297 | 757600 | 30304
16427 | public | t_geometric_types | 26/33 | 757601 | 787904 | 30304
16427 | public | t_geometric_types | 27/33 | 787905 | 818208 | 30304
16427 | public | t_geometric_types | 28/33 | 818209 | 848512 | 30304
16427 | public | t_geometric_types | 29/33 | 848513 | 878816 | 30304
16427 | public | t_geometric_types | 30/33 | 878817 | 909120 | 30304
16427 | public | t_geometric_types | 31/33 | 909121 | 939424 | 30304
16427 | public | t_geometric_types | 32/33 | 939425 | 969728 | 30304
16427 | public | t_geometric_types | 33/33 | 969729 | 1000001 | 30273
23:00:43.997 27191 INFO Table public.t_range_types COPY will be split 15-ways
OID | Schema Name | Table Name | Part | Min | Max | Count
---------+---------------------------+---------------------------+--------------+--------------+--------------+-------------
16493 | public | t_range_types | 1/15 | 1 | 66668 | 66668
16493 | public | t_range_types | 2/15 | 66669 | 133336 | 66668
16493 | public | t_range_types | 3/15 | 133337 | 200004 | 66668
16493 | public | t_range_types | 4/15 | 200005 | 266672 | 66668
16493 | public | t_range_types | 5/15 | 266673 | 333340 | 66668
16493 | public | t_range_types | 6/15 | 333341 | 400008 | 66668
16493 | public | t_range_types | 7/15 | 400009 | 466676 | 66668
16493 | public | t_range_types | 8/15 | 466677 | 533344 | 66668
16493 | public | t_range_types | 9/15 | 533345 | 600012 | 66668
16493 | public | t_range_types | 10/15 | 600013 | 666680 | 66668
16493 | public | t_range_types | 11/15 | 666681 | 733348 | 66668
16493 | public | t_range_types | 12/15 | 733349 | 800016 | 66668
16493 | public | t_range_types | 13/15 | 800017 | 866684 | 66668
16493 | public | t_range_types | 14/15 | 866685 | 933352 | 66668
16493 | public | t_range_types | 15/15 | 933353 | 1000001 | 66649
23:00:43.997 27191 INFO Table public.t_array_types COPY will be split 14-ways
OID | Schema Name | Table Name | Part | Min | Max | Count
---------+---------------------------+---------------------------+--------------+--------------+--------------+-------------
16468 | public | t_array_types | 1/14 | 1 | 71430 | 71430
16468 | public | t_array_types | 2/14 | 71431 | 142860 | 71430
16468 | public | t_array_types | 3/14 | 142861 | 214290 | 71430
16468 | public | t_array_types | 4/14 | 214291 | 285720 | 71430
16468 | public | t_array_types | 5/14 | 285721 | 357150 | 71430
16468 | public | t_array_types | 6/14 | 357151 | 428580 | 71430
16468 | public | t_array_types | 7/14 | 428581 | 500010 | 71430
16468 | public | t_array_types | 8/14 | 500011 | 571440 | 71430
16468 | public | t_array_types | 9/14 | 571441 | 642870 | 71430
16468 | public | t_array_types | 10/14 | 642871 | 714300 | 71430
16468 | public | t_array_types | 11/14 | 714301 | 785730 | 71430
16468 | public | t_array_types | 12/14 | 785731 | 857160 | 71430
16468 | public | t_array_types | 13/14 | 857161 | 928590 | 71430
16468 | public | t_array_types | 14/14 | 928591 | 1000001 | 71411
23:00:43.998 27191 INFO Table public.t_character_types COPY will be split 13-ways
OID | Schema Name | Table Name | Part | Min | Max | Count
---------+---------------------------+---------------------------+--------------+--------------+--------------+-------------
16408 | public | t_character_types | 1/13 | 1 | 76924 | 76924
16408 | public | t_character_types | 2/13 | 76925 | 153848 | 76924
16408 | public | t_character_types | 3/13 | 153849 | 230772 | 76924
16408 | public | t_character_types | 4/13 | 230773 | 307696 | 76924
16408 | public | t_character_types | 5/13 | 307697 | 384620 | 76924
16408 | public | t_character_types | 6/13 | 384621 | 461544 | 76924
16408 | public | t_character_types | 7/13 | 461545 | 538468 | 76924
16408 | public | t_character_types | 8/13 | 538469 | 615392 | 76924
16408 | public | t_character_types | 9/13 | 615393 | 692316 | 76924
16408 | public | t_character_types | 10/13 | 692317 | 769240 | 76924
16408 | public | t_character_types | 11/13 | 769241 | 846164 | 76924
16408 | public | t_character_types | 12/13 | 846165 | 923088 | 76924
16408 | public | t_character_types | 13/13 | 923089 | 1000001 | 76913
23:00:43.998 27191 INFO Table public.t_json_types COPY will be split 11-ways
OID | Schema Name | Table Name | Part | Min | Max | Count
---------+---------------------------+---------------------------+--------------+--------------+--------------+-------------
16457 | public | t_json_types | 1/11 | 1 | 90910 | 90910
16457 | public | t_json_types | 2/11 | 90911 | 181820 | 90910
16457 | public | t_json_types | 3/11 | 181821 | 272730 | 90910
16457 | public | t_json_types | 4/11 | 272731 | 363640 | 90910
16457 | public | t_json_types | 5/11 | 363641 | 454550 | 90910
16457 | public | t_json_types | 6/11 | 454551 | 545460 | 90910
16457 | public | t_json_types | 7/11 | 545461 | 636370 | 90910
16457 | public | t_json_types | 8/11 | 636371 | 727280 | 90910
16457 | public | t_json_types | 9/11 | 727281 | 818190 | 90910
16457 | public | t_json_types | 10/11 | 818191 | 909100 | 90910
16457 | public | t_json_types | 11/11 | 909101 | 1000001 | 90901
23:00:43.998 27191 INFO Table public.t_bit_types COPY will be split 10-ways
OID | Schema Name | Table Name | Part | Min | Max | Count
---------+---------------------------+---------------------------+--------------+--------------+--------------+-------------
16449 | public | t_bit_types | 1/10 | 1 | 200001 | 200001
16449 | public | t_bit_types | 2/10 | 200002 | 400002 | 200001
16449 | public | t_bit_types | 3/10 | 400003 | 600003 | 200001
16449 | public | t_bit_types | 4/10 | 600004 | 800004 | 200001
16449 | public | t_bit_types | 5/10 | 800005 | 1000005 | 200001
16449 | public | t_bit_types | 6/10 | 1000006 | 1200006 | 200001
16449 | public | t_bit_types | 7/10 | 1200007 | 1400007 | 200001
16449 | public | t_bit_types | 8/10 | 1400008 | 1600008 | 200001
16449 | public | t_bit_types | 9/10 | 1600009 | 1800009 | 200001
16449 | public | t_bit_types | 10/10 | 1800010 | 2000002 | 199993
23:00:43.998 27191 INFO Table public.t_complex_types COPY will be split 9-ways
OID | Schema Name | Table Name | Part | Min | Max | Count
---------+---------------------------+---------------------------+--------------+--------------+--------------+-------------
16482 | public | t_complex_types | 1/9 | 1 | 111112 | 111112
16482 | public | t_complex_types | 2/9 | 111113 | 222224 | 111112
16482 | public | t_complex_types | 3/9 | 222225 | 333336 | 111112
16482 | public | t_complex_types | 4/9 | 333337 | 444448 | 111112
16482 | public | t_complex_types | 5/9 | 444449 | 555560 | 111112
16482 | public | t_complex_types | 6/9 | 555561 | 666672 | 111112
16482 | public | t_complex_types | 7/9 | 666673 | 777784 | 111112
16482 | public | t_complex_types | 8/9 | 777785 | 888896 | 111112
16482 | public | t_complex_types | 9/9 | 888897 | 1000001 | 111105
23:00:43.998 27191 INFO Table public.t_numeric_types COPY will be split 8-ways
OID | Schema Name | Table Name | Part | Min | Max | Count
---------+---------------------------+---------------------------+--------------+--------------+--------------+-------------
16386 | public | t_numeric_types | 1/8 | 1 | 125001 | 125001
16386 | public | t_numeric_types | 2/8 | 125002 | 250002 | 125001
16386 | public | t_numeric_types | 3/8 | 250003 | 375003 | 125001
16386 | public | t_numeric_types | 4/8 | 375004 | 500004 | 125001
16386 | public | t_numeric_types | 5/8 | 500005 | 625005 | 125001
16386 | public | t_numeric_types | 6/8 | 625006 | 750006 | 125001
16386 | public | t_numeric_types | 7/8 | 750007 | 875007 | 125001
16386 | public | t_numeric_types | 8/8 | 875008 | 1000002 | 124995
23:00:43.998 27191 INFO Table public.t_time_types COPY will be split 8-ways
OID | Schema Name | Table Name | Part | Min | Max | Count
---------+---------------------------+---------------------------+--------------+--------------+--------------+-------------
16419 | public | t_time_types | 1/8 | 1 | 125001 | 125001
16419 | public | t_time_types | 2/8 | 125002 | 250002 | 125001
16419 | public | t_time_types | 3/8 | 250003 | 375003 | 125001
16419 | public | t_time_types | 4/8 | 375004 | 500004 | 125001
16419 | public | t_time_types | 5/8 | 500005 | 625005 | 125001
16419 | public | t_time_types | 6/8 | 625006 | 750006 | 125001
16419 | public | t_time_types | 7/8 | 750007 | 875007 | 125001
16419 | public | t_time_types | 8/8 | 875008 | 1000001 | 124994
23:00:43.999 27191 INFO Table public.nopk_t_character_types COPY will be split 8-ways
OID | Schema Name | Table Name | Part | Min | Max | Count
---------+---------------------------+---------------------------+--------------+--------------+--------------+-------------
16543 | public | nopk_t_character_types | 1/8 | (0,0) | (1166,0) | 1167
16543 | public | nopk_t_character_types | 2/8 | (1167,0) | (2333,0) | 1167
16543 | public | nopk_t_character_types | 3/8 | (2334,0) | (3500,0) | 1167
16543 | public | nopk_t_character_types | 4/8 | (3501,0) | (4667,0) | 1167
16543 | public | nopk_t_character_types | 5/8 | (4668,0) | (5834,0) | 1167
16543 | public | nopk_t_character_types | 6/8 | (5835,0) | (7001,0) | 1167
16543 | public | nopk_t_character_types | 7/8 | (7002,0) | (8164,0) | 1163
16543 | public | nopk_t_character_types | 8/8 | (8164,0) | - | -
23:00:43.999 27191 INFO Table public.t_network_types COPY will be split 6-ways
OID | Schema Name | Table Name | Part | Min | Max | Count
---------+---------------------------+---------------------------+--------------+--------------+--------------+-------------
16438 | public | t_network_types | 1/6 | 1 | 166668 | 166668
16438 | public | t_network_types | 2/6 | 166669 | 333336 | 166668
16438 | public | t_network_types | 3/6 | 333337 | 500004 | 166668
16438 | public | t_network_types | 4/6 | 500005 | 666672 | 166668
16438 | public | t_network_types | 5/6 | 666673 | 833340 | 166668
16438 | public | t_network_types | 6/6 | 833341 | 1000001 | 166661
23:00:43.999 27191 INFO Table public.t_domain_types COPY will be split 6-ways
OID | Schema Name | Table Name | Part | Min | Max | Count
---------+---------------------------+---------------------------+--------------+--------------+--------------+-------------
16506 | public | t_domain_types | 1/6 | 1 | 166668 | 166668
16506 | public | t_domain_types | 2/6 | 166669 | 333336 | 166668
16506 | public | t_domain_types | 3/6 | 333337 | 500004 | 166668
16506 | public | t_domain_types | 4/6 | 500005 | 666672 | 166668
16506 | public | t_domain_types | 5/6 | 666673 | 833340 | 166668
16506 | public | t_domain_types | 6/6 | 833341 | 1000001 | 166661
23:00:43.999 27191 INFO Table public.files COPY will be split 5-ways
OID | Schema Name | Table Name | Part | Min | Max | Count
---------+---------------------------+---------------------------+--------------+--------------+--------------+-------------
16518 | public | files | 1/5 | 1 | 200001 | 200001
16518 | public | files | 2/5 | 200002 | 400002 | 200001
16518 | public | files | 3/5 | 400003 | 600003 | 200001
16518 | public | files | 4/5 | 600004 | 800004 | 200001
16518 | public | files | 5/5 | 800005 | 1000001 | 199997
23:00:43.999 27191 INFO Table public.t_binary_types COPY will be split 5-ways
OID | Schema Name | Table Name | Part | Min | Max | Count
---------+---------------------------+---------------------------+--------------+--------------+--------------+-------------
16397 | public | t_binary_types | 1/5 | 1 | 200001 | 200001
16397 | public | t_binary_types | 2/5 | 200002 | 400002 | 200001
16397 | public | t_binary_types | 3/5 | 400003 | 600003 | 200001
16397 | public | t_binary_types | 4/5 | 600004 | 800004 | 200001
16397 | public | t_binary_types | 5/5 | 800005 | 1000001 | 199997
It's been a while since I worked on this. I'm not sure if I will be working on this soon and whether this is relevant anymore. So closing this now. Thanks for the review, Dimitri.