pgcopydb icon indicating copy to clipboard operation
pgcopydb copied to clipboard

Improve `list table-parts` to list parts for all tables.

Open shubhamdhama opened this issue 2 years ago • 5 comments

[edit] See this commit for fresh description: https://github.com/dimitri/pgcopydb/pull/611/commits/4df109ed5119b06a35a0e2305aeecfb8c8a18861. Keeping below text for completeness of discussion.


  • Replace catalog_init_from_specs with copydb_fetch_schema_and_prepare_specs. The former works only if the catalog already exists. The latter connects to the source and provides a fresh report.
  • Correct confusing output where Max of last row was (-1,0). Revert to old behavior for clarity.
  • Limit copydb_fetch_schema_and_prepare_specs to fetch information only for the specified table and schema. Initialize the filter of specs with the passed single table and schema.

shubhamdhama avatar Jan 03 '24 07:01 shubhamdhama

The pgcopydb list ... commands are now in a different place as they were before the SQLite catalogs implementation. Because the catalogs are kept on-disk as a cache that we can re-use from a command to the next, it is now possible to fetch the catalog information first and re-use it multiple times, as exercised by make tests/pagila-multi-steps.

As a result, we register the catalogs setup (source and target URIs, snapshot, filtering, etc) and compare it with the command's context to make sure that we can re-use the on-disk cache when running the next command.

For the pgcopydb list table-parts command, historically it would only work one table at a time, as a way of just debugging pgcopydb split approach. Also it would be best if the command was not preparing the whole of the catalogs fetching thousands of table metadata just to be able to display partitions for one of them. That's why at the moment the pgcopydb list table-parts command requires another catalog listing such as pgcopydb list tables or pgcopydb list schema to have been invoked previously.

Passing down the one-table filtering to copydb_fetch_schema_and_prepare_specs could register that filtering as the catalog setup, and then other commands would refuse to re-use the cache because it was computed for a different filtering setup. That's not good.

We can change command's design, and I have been wondering if the pgcopydb list table-parts command should be enhanced to work on all the tables that we partition at once. It should be easy to implement now that we have the catalogs available and pre-computed in a SQLite file.

dimitri avatar Jan 03 '24 09:01 dimitri

Thanks @dimitri for the review -- it totally make sense! I'm working on making this change.

Regarding the catalog initialization using copydb_fetch_schema_and_prepare_specs, it's worth noting that we also update the s_table_parts catalog. Conseqently now --split-tables-larger-than should be passed to list tables. And list table-parts would merely emit the splits calculated previously. If one attempts new values of --split-tables-larger-than with list table-parts would conflict and fail. In the scenario where a user wants to tune the value of --split-tables-larger-than to observe it's impact, they have to re-run list tables followed by list table-parts. I don't see a problem with this, but I just want to make sure it's intended.

Alternatively we can allow change to --split-tables-larger-than with some --force option in list table-parts and it would recompute the parts and update the catalog.

If we chose to go with update-catalog-only-once case (the former one), I think we should also store ctid as the part-key in s_table catalog wherever applicable so that if we use catalog_lookup_s_table_by_name like we are doing in cli_list_table_parts, we have the correct information about the part-key. This lookup of part-key is used to format output and anyways it should be there for correctness.

What do you think?

My intention is to introduce --disable-split-by-ctid/--disable-ctid-as-partkey after all these changes.

shubhamdhama avatar Jan 04 '24 09:01 shubhamdhama

Regarding the catalog initialization using copydb_fetch_schema_and_prepare_specs, it's worth noting that we also update the s_table_parts catalog. Conseqently now --split-tables-larger-than should be passed to list tables.

Yes.

And list table-parts would merely emit the splits calculated previously. If one attempts new values of --split-tables-larger-than with list table-parts would conflict and fail.

Maybe we can even discard the --split-tables-larger-than option in pgcopydb list table-parts and only re-use whatever was given to pgcopydb list tables to make it clear that's how it works?

In the scenario where a user wants to tune the value of --split-tables-larger-than to observe it's impact, they have to re-run list tables followed by list table-parts. I don't see a problem with this, but I just want to make sure it's intended.

Agreed. That's intended. That's how it works at the moment.

Alternatively we can allow change to --split-tables-larger-than with some --force option in list table-parts and it would recompute the parts and update the catalog.

That makes sense, that's a good use of the --force option indeed.

If we chose to go with update-catalog-only-once case (the former one), I think we should also store ctid as the part-key in s_table catalog wherever applicable so that if we use catalog_lookup_s_table_by_name like we are doing in cli_list_table_parts, we have the correct information about the part-key. This lookup of part-key is used to format output and anyways it should be there for correctness.

What do you think?

I was on the fence about updating the catalogs with the ctid partkey information, but I can see how that makes sense for the rest of the code.

My intention is to introduce --disable-split-by-ctid/--disable-ctid-as-partkey after all these changes.

I think I prefer the first spelling of it: --disable-split-by-ctid.

dimitri avatar Jan 04 '24 10:01 dimitri

@dimitri I have updated the PR with commit message hopefully explaining what I have done so far. It address most of what we discussed above, here are the immediate follow-ups

  • [ ] Throw error if list table-parts is run without running list schema or list tables with appropriate error.
  • [ ] Fix --force option. Currently I have updated the doc with the intended behavior as per above discussion.

Independent follow-ups

  • [ ] Add --disable-split-by-ctid
  • [ ] Find smarter strategy to use ctid as part-key for only some versions of source server.

shubhamdhama avatar Jan 05 '24 17:01 shubhamdhama

Here is how output look now,

pgcopydb list table-parts --split-tables-larger-than '10 MB'
Output
23:00:43.996 27191 INFO   Table public.t_geometric_types COPY will be split 33-ways
     OID |               Schema Name |                Table Name |         Part |          Min |          Max |        Count
---------+---------------------------+---------------------------+--------------+--------------+--------------+-------------
   16427 |                    public |         t_geometric_types |         1/33 |            1 |        30304 |        30304
   16427 |                    public |         t_geometric_types |         2/33 |        30305 |        60608 |        30304
   16427 |                    public |         t_geometric_types |         3/33 |        60609 |        90912 |        30304
   16427 |                    public |         t_geometric_types |         4/33 |        90913 |       121216 |        30304
   16427 |                    public |         t_geometric_types |         5/33 |       121217 |       151520 |        30304
   16427 |                    public |         t_geometric_types |         6/33 |       151521 |       181824 |        30304
   16427 |                    public |         t_geometric_types |         7/33 |       181825 |       212128 |        30304
   16427 |                    public |         t_geometric_types |         8/33 |       212129 |       242432 |        30304
   16427 |                    public |         t_geometric_types |         9/33 |       242433 |       272736 |        30304
   16427 |                    public |         t_geometric_types |        10/33 |       272737 |       303040 |        30304
   16427 |                    public |         t_geometric_types |        11/33 |       303041 |       333344 |        30304
   16427 |                    public |         t_geometric_types |        12/33 |       333345 |       363648 |        30304
   16427 |                    public |         t_geometric_types |        13/33 |       363649 |       393952 |        30304
   16427 |                    public |         t_geometric_types |        14/33 |       393953 |       424256 |        30304
   16427 |                    public |         t_geometric_types |        15/33 |       424257 |       454560 |        30304
   16427 |                    public |         t_geometric_types |        16/33 |       454561 |       484864 |        30304
   16427 |                    public |         t_geometric_types |        17/33 |       484865 |       515168 |        30304
   16427 |                    public |         t_geometric_types |        18/33 |       515169 |       545472 |        30304
   16427 |                    public |         t_geometric_types |        19/33 |       545473 |       575776 |        30304
   16427 |                    public |         t_geometric_types |        20/33 |       575777 |       606080 |        30304
   16427 |                    public |         t_geometric_types |        21/33 |       606081 |       636384 |        30304
   16427 |                    public |         t_geometric_types |        22/33 |       636385 |       666688 |        30304
   16427 |                    public |         t_geometric_types |        23/33 |       666689 |       696992 |        30304
   16427 |                    public |         t_geometric_types |        24/33 |       696993 |       727296 |        30304
   16427 |                    public |         t_geometric_types |        25/33 |       727297 |       757600 |        30304
   16427 |                    public |         t_geometric_types |        26/33 |       757601 |       787904 |        30304
   16427 |                    public |         t_geometric_types |        27/33 |       787905 |       818208 |        30304
   16427 |                    public |         t_geometric_types |        28/33 |       818209 |       848512 |        30304
   16427 |                    public |         t_geometric_types |        29/33 |       848513 |       878816 |        30304
   16427 |                    public |         t_geometric_types |        30/33 |       878817 |       909120 |        30304
   16427 |                    public |         t_geometric_types |        31/33 |       909121 |       939424 |        30304
   16427 |                    public |         t_geometric_types |        32/33 |       939425 |       969728 |        30304
   16427 |                    public |         t_geometric_types |        33/33 |       969729 |      1000001 |        30273

23:00:43.997 27191 INFO   Table public.t_range_types COPY will be split 15-ways
     OID |               Schema Name |                Table Name |         Part |          Min |          Max |        Count
---------+---------------------------+---------------------------+--------------+--------------+--------------+-------------
   16493 |                    public |             t_range_types |         1/15 |            1 |        66668 |        66668
   16493 |                    public |             t_range_types |         2/15 |        66669 |       133336 |        66668
   16493 |                    public |             t_range_types |         3/15 |       133337 |       200004 |        66668
   16493 |                    public |             t_range_types |         4/15 |       200005 |       266672 |        66668
   16493 |                    public |             t_range_types |         5/15 |       266673 |       333340 |        66668
   16493 |                    public |             t_range_types |         6/15 |       333341 |       400008 |        66668
   16493 |                    public |             t_range_types |         7/15 |       400009 |       466676 |        66668
   16493 |                    public |             t_range_types |         8/15 |       466677 |       533344 |        66668
   16493 |                    public |             t_range_types |         9/15 |       533345 |       600012 |        66668
   16493 |                    public |             t_range_types |        10/15 |       600013 |       666680 |        66668
   16493 |                    public |             t_range_types |        11/15 |       666681 |       733348 |        66668
   16493 |                    public |             t_range_types |        12/15 |       733349 |       800016 |        66668
   16493 |                    public |             t_range_types |        13/15 |       800017 |       866684 |        66668
   16493 |                    public |             t_range_types |        14/15 |       866685 |       933352 |        66668
   16493 |                    public |             t_range_types |        15/15 |       933353 |      1000001 |        66649

23:00:43.997 27191 INFO   Table public.t_array_types COPY will be split 14-ways
     OID |               Schema Name |                Table Name |         Part |          Min |          Max |        Count
---------+---------------------------+---------------------------+--------------+--------------+--------------+-------------
   16468 |                    public |             t_array_types |         1/14 |            1 |        71430 |        71430
   16468 |                    public |             t_array_types |         2/14 |        71431 |       142860 |        71430
   16468 |                    public |             t_array_types |         3/14 |       142861 |       214290 |        71430
   16468 |                    public |             t_array_types |         4/14 |       214291 |       285720 |        71430
   16468 |                    public |             t_array_types |         5/14 |       285721 |       357150 |        71430
   16468 |                    public |             t_array_types |         6/14 |       357151 |       428580 |        71430
   16468 |                    public |             t_array_types |         7/14 |       428581 |       500010 |        71430
   16468 |                    public |             t_array_types |         8/14 |       500011 |       571440 |        71430
   16468 |                    public |             t_array_types |         9/14 |       571441 |       642870 |        71430
   16468 |                    public |             t_array_types |        10/14 |       642871 |       714300 |        71430
   16468 |                    public |             t_array_types |        11/14 |       714301 |       785730 |        71430
   16468 |                    public |             t_array_types |        12/14 |       785731 |       857160 |        71430
   16468 |                    public |             t_array_types |        13/14 |       857161 |       928590 |        71430
   16468 |                    public |             t_array_types |        14/14 |       928591 |      1000001 |        71411

23:00:43.998 27191 INFO   Table public.t_character_types COPY will be split 13-ways
     OID |               Schema Name |                Table Name |         Part |          Min |          Max |        Count
---------+---------------------------+---------------------------+--------------+--------------+--------------+-------------
   16408 |                    public |         t_character_types |         1/13 |            1 |        76924 |        76924
   16408 |                    public |         t_character_types |         2/13 |        76925 |       153848 |        76924
   16408 |                    public |         t_character_types |         3/13 |       153849 |       230772 |        76924
   16408 |                    public |         t_character_types |         4/13 |       230773 |       307696 |        76924
   16408 |                    public |         t_character_types |         5/13 |       307697 |       384620 |        76924
   16408 |                    public |         t_character_types |         6/13 |       384621 |       461544 |        76924
   16408 |                    public |         t_character_types |         7/13 |       461545 |       538468 |        76924
   16408 |                    public |         t_character_types |         8/13 |       538469 |       615392 |        76924
   16408 |                    public |         t_character_types |         9/13 |       615393 |       692316 |        76924
   16408 |                    public |         t_character_types |        10/13 |       692317 |       769240 |        76924
   16408 |                    public |         t_character_types |        11/13 |       769241 |       846164 |        76924
   16408 |                    public |         t_character_types |        12/13 |       846165 |       923088 |        76924
   16408 |                    public |         t_character_types |        13/13 |       923089 |      1000001 |        76913

23:00:43.998 27191 INFO   Table public.t_json_types COPY will be split 11-ways
     OID |               Schema Name |                Table Name |         Part |          Min |          Max |        Count
---------+---------------------------+---------------------------+--------------+--------------+--------------+-------------
   16457 |                    public |              t_json_types |         1/11 |            1 |        90910 |        90910
   16457 |                    public |              t_json_types |         2/11 |        90911 |       181820 |        90910
   16457 |                    public |              t_json_types |         3/11 |       181821 |       272730 |        90910
   16457 |                    public |              t_json_types |         4/11 |       272731 |       363640 |        90910
   16457 |                    public |              t_json_types |         5/11 |       363641 |       454550 |        90910
   16457 |                    public |              t_json_types |         6/11 |       454551 |       545460 |        90910
   16457 |                    public |              t_json_types |         7/11 |       545461 |       636370 |        90910
   16457 |                    public |              t_json_types |         8/11 |       636371 |       727280 |        90910
   16457 |                    public |              t_json_types |         9/11 |       727281 |       818190 |        90910
   16457 |                    public |              t_json_types |        10/11 |       818191 |       909100 |        90910
   16457 |                    public |              t_json_types |        11/11 |       909101 |      1000001 |        90901

23:00:43.998 27191 INFO   Table public.t_bit_types COPY will be split 10-ways
     OID |               Schema Name |                Table Name |         Part |          Min |          Max |        Count
---------+---------------------------+---------------------------+--------------+--------------+--------------+-------------
   16449 |                    public |               t_bit_types |         1/10 |            1 |       200001 |       200001
   16449 |                    public |               t_bit_types |         2/10 |       200002 |       400002 |       200001
   16449 |                    public |               t_bit_types |         3/10 |       400003 |       600003 |       200001
   16449 |                    public |               t_bit_types |         4/10 |       600004 |       800004 |       200001
   16449 |                    public |               t_bit_types |         5/10 |       800005 |      1000005 |       200001
   16449 |                    public |               t_bit_types |         6/10 |      1000006 |      1200006 |       200001
   16449 |                    public |               t_bit_types |         7/10 |      1200007 |      1400007 |       200001
   16449 |                    public |               t_bit_types |         8/10 |      1400008 |      1600008 |       200001
   16449 |                    public |               t_bit_types |         9/10 |      1600009 |      1800009 |       200001
   16449 |                    public |               t_bit_types |        10/10 |      1800010 |      2000002 |       199993

23:00:43.998 27191 INFO   Table public.t_complex_types COPY will be split 9-ways
     OID |               Schema Name |                Table Name |         Part |          Min |          Max |        Count
---------+---------------------------+---------------------------+--------------+--------------+--------------+-------------
   16482 |                    public |           t_complex_types |          1/9 |            1 |       111112 |       111112
   16482 |                    public |           t_complex_types |          2/9 |       111113 |       222224 |       111112
   16482 |                    public |           t_complex_types |          3/9 |       222225 |       333336 |       111112
   16482 |                    public |           t_complex_types |          4/9 |       333337 |       444448 |       111112
   16482 |                    public |           t_complex_types |          5/9 |       444449 |       555560 |       111112
   16482 |                    public |           t_complex_types |          6/9 |       555561 |       666672 |       111112
   16482 |                    public |           t_complex_types |          7/9 |       666673 |       777784 |       111112
   16482 |                    public |           t_complex_types |          8/9 |       777785 |       888896 |       111112
   16482 |                    public |           t_complex_types |          9/9 |       888897 |      1000001 |       111105

23:00:43.998 27191 INFO   Table public.t_numeric_types COPY will be split 8-ways
     OID |               Schema Name |                Table Name |         Part |          Min |          Max |        Count
---------+---------------------------+---------------------------+--------------+--------------+--------------+-------------
   16386 |                    public |           t_numeric_types |          1/8 |            1 |       125001 |       125001
   16386 |                    public |           t_numeric_types |          2/8 |       125002 |       250002 |       125001
   16386 |                    public |           t_numeric_types |          3/8 |       250003 |       375003 |       125001
   16386 |                    public |           t_numeric_types |          4/8 |       375004 |       500004 |       125001
   16386 |                    public |           t_numeric_types |          5/8 |       500005 |       625005 |       125001
   16386 |                    public |           t_numeric_types |          6/8 |       625006 |       750006 |       125001
   16386 |                    public |           t_numeric_types |          7/8 |       750007 |       875007 |       125001
   16386 |                    public |           t_numeric_types |          8/8 |       875008 |      1000002 |       124995

23:00:43.998 27191 INFO   Table public.t_time_types COPY will be split 8-ways
     OID |               Schema Name |                Table Name |         Part |          Min |          Max |        Count
---------+---------------------------+---------------------------+--------------+--------------+--------------+-------------
   16419 |                    public |              t_time_types |          1/8 |            1 |       125001 |       125001
   16419 |                    public |              t_time_types |          2/8 |       125002 |       250002 |       125001
   16419 |                    public |              t_time_types |          3/8 |       250003 |       375003 |       125001
   16419 |                    public |              t_time_types |          4/8 |       375004 |       500004 |       125001
   16419 |                    public |              t_time_types |          5/8 |       500005 |       625005 |       125001
   16419 |                    public |              t_time_types |          6/8 |       625006 |       750006 |       125001
   16419 |                    public |              t_time_types |          7/8 |       750007 |       875007 |       125001
   16419 |                    public |              t_time_types |          8/8 |       875008 |      1000001 |       124994

23:00:43.999 27191 INFO   Table public.nopk_t_character_types COPY will be split 8-ways
     OID |               Schema Name |                Table Name |         Part |          Min |          Max |        Count
---------+---------------------------+---------------------------+--------------+--------------+--------------+-------------
   16543 |                    public |    nopk_t_character_types |          1/8 |        (0,0) |     (1166,0) |         1167
   16543 |                    public |    nopk_t_character_types |          2/8 |     (1167,0) |     (2333,0) |         1167
   16543 |                    public |    nopk_t_character_types |          3/8 |     (2334,0) |     (3500,0) |         1167
   16543 |                    public |    nopk_t_character_types |          4/8 |     (3501,0) |     (4667,0) |         1167
   16543 |                    public |    nopk_t_character_types |          5/8 |     (4668,0) |     (5834,0) |         1167
   16543 |                    public |    nopk_t_character_types |          6/8 |     (5835,0) |     (7001,0) |         1167
   16543 |                    public |    nopk_t_character_types |          7/8 |     (7002,0) |     (8164,0) |         1163
   16543 |                    public |    nopk_t_character_types |          8/8 |     (8164,0) |            - |            -

23:00:43.999 27191 INFO   Table public.t_network_types COPY will be split 6-ways
     OID |               Schema Name |                Table Name |         Part |          Min |          Max |        Count
---------+---------------------------+---------------------------+--------------+--------------+--------------+-------------
   16438 |                    public |           t_network_types |          1/6 |            1 |       166668 |       166668
   16438 |                    public |           t_network_types |          2/6 |       166669 |       333336 |       166668
   16438 |                    public |           t_network_types |          3/6 |       333337 |       500004 |       166668
   16438 |                    public |           t_network_types |          4/6 |       500005 |       666672 |       166668
   16438 |                    public |           t_network_types |          5/6 |       666673 |       833340 |       166668
   16438 |                    public |           t_network_types |          6/6 |       833341 |      1000001 |       166661

23:00:43.999 27191 INFO   Table public.t_domain_types COPY will be split 6-ways
     OID |               Schema Name |                Table Name |         Part |          Min |          Max |        Count
---------+---------------------------+---------------------------+--------------+--------------+--------------+-------------
   16506 |                    public |            t_domain_types |          1/6 |            1 |       166668 |       166668
   16506 |                    public |            t_domain_types |          2/6 |       166669 |       333336 |       166668
   16506 |                    public |            t_domain_types |          3/6 |       333337 |       500004 |       166668
   16506 |                    public |            t_domain_types |          4/6 |       500005 |       666672 |       166668
   16506 |                    public |            t_domain_types |          5/6 |       666673 |       833340 |       166668
   16506 |                    public |            t_domain_types |          6/6 |       833341 |      1000001 |       166661

23:00:43.999 27191 INFO   Table public.files COPY will be split 5-ways
     OID |               Schema Name |                Table Name |         Part |          Min |          Max |        Count
---------+---------------------------+---------------------------+--------------+--------------+--------------+-------------
   16518 |                    public |                     files |          1/5 |            1 |       200001 |       200001
   16518 |                    public |                     files |          2/5 |       200002 |       400002 |       200001
   16518 |                    public |                     files |          3/5 |       400003 |       600003 |       200001
   16518 |                    public |                     files |          4/5 |       600004 |       800004 |       200001
   16518 |                    public |                     files |          5/5 |       800005 |      1000001 |       199997

23:00:43.999 27191 INFO   Table public.t_binary_types COPY will be split 5-ways
     OID |               Schema Name |                Table Name |         Part |          Min |          Max |        Count
---------+---------------------------+---------------------------+--------------+--------------+--------------+-------------
   16397 |                    public |            t_binary_types |          1/5 |            1 |       200001 |       200001
   16397 |                    public |            t_binary_types |          2/5 |       200002 |       400002 |       200001
   16397 |                    public |            t_binary_types |          3/5 |       400003 |       600003 |       200001
   16397 |                    public |            t_binary_types |          4/5 |       600004 |       800004 |       200001
   16397 |                    public |            t_binary_types |          5/5 |       800005 |      1000001 |       199997

shubhamdhama avatar Jan 05 '24 17:01 shubhamdhama

It's been a while since I worked on this. I'm not sure if I will be working on this soon and whether this is relevant anymore. So closing this now. Thanks for the review, Dimitri.

shubhamdhama avatar Jun 24 '24 06:06 shubhamdhama