
COPY/CLUSTER: Use ntuples for gp_fastsequence allocation

Open soumyadeep2007 opened this issue 3 years ago • 2 comments

For multi-insert, our insert workflow already supports batched WAL, and there is no page locking, by virtue of buffer-cache bypass for AO/CO relations. So there is not much to gain over the existing call-single-insert-in-a-loop approach, except for optimizing gp_fastsequence in-place updates. This is described below:

For COPY, we know the number of tuples to be inserted for every invocation of table_multi_insert(). Due to the sizing of the copy buffer, we can call table_multi_insert() with up to 1000 rows at a time in the best case, which means we can cut down on the number of in-place updates by a factor of 10 (as we no longer have to do 10 piecemeal allocations of NUM_FAST_SEQUENCES each). This can give us a performance boost, especially when there are concurrent queries on the same table (as GetFastSequences() grabs a RowExclusiveLock).

We can do the same for CLUSTER, where we know how many tuples are involved in each invoked insert.

We can't do this for piecemeal INSERTs and VACUUM, because we don't know the number of tuples and the number of visible tuples, respectively.

PS: We extend the use of debug_appendonly_print_insert_tuple to give us insight into gp_fastsequence manipulation.

soumyadeep2007 avatar Aug 11 '22 23:08 soumyadeep2007

LGTM.

haolinw avatar Aug 15 '22 02:08 haolinw

Test failures fixed. The one remaining failure is an unrelated flake.

soumyadeep2007 avatar Aug 20 '22 00:08 soumyadeep2007