timescaledb
Batch rows on access node for distributed COPY
Group the incoming rows into batches on the access node before COPYing them to the data nodes.
Also switch the data node connections to nonblocking mode for sending COPY data, so that we can work with many data nodes concurrently.
This gives a 2x-5x speedup on various COPY queries to distributed hypertables.
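For readers unfamiliar with the idea, here is a minimal sketch of per-node batching. It is not the code from this PR: `NodeBatch`, `add_row`, `flush_batch`, `finish_copy` and the three size constants are hypothetical names invented for illustration, and the "send" is only a counter standing in for a COPY data message to a data node. The nonblocking connection handling is not modeled at all.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* All three limits are hypothetical values chosen for the sketch. */
#define NUM_NODES   3    /* number of data nodes */
#define BATCH_ROWS  4    /* flush a node's batch once it holds this many rows */
#define BATCH_BYTES 256  /* size of each per-node buffer */

/* One pending batch per data node, plus counters so the effect is visible. */
typedef struct NodeBatch
{
	char   data[BATCH_BYTES];
	size_t len;        /* bytes currently buffered */
	size_t nrows;      /* rows currently buffered */
	size_t flushes;    /* number of batches "sent" so far */
	size_t rows_sent;  /* total rows "sent" so far */
} NodeBatch;

/* Stand-in for sending one COPY data message; an empty batch sends nothing. */
static void
flush_batch(NodeBatch *b)
{
	if (b->nrows == 0)
		return;
	b->flushes++;
	b->rows_sent += b->nrows;
	b->len = 0;
	b->nrows = 0;
}

/* Append one text row to the batch of the node it is routed to. */
static void
add_row(NodeBatch *batches, size_t node, const char *row)
{
	size_t rowlen = strlen(row);

	assert(node < NUM_NODES);
	assert(rowlen + 1 <= BATCH_BYTES);

	NodeBatch *b = &batches[node];

	/* Flush first if the batch is full by row count or by bytes. */
	if (b->nrows == BATCH_ROWS || b->len + rowlen + 1 > BATCH_BYTES)
		flush_batch(b);

	memcpy(b->data + b->len, row, rowlen);
	b->len += rowlen;
	b->data[b->len++] = '\n';
	b->nrows++;
}

/* At end of input, send whatever is still buffered for every node. */
static void
finish_copy(NodeBatch *batches)
{
	for (size_t i = 0; i < NUM_NODES; i++)
		flush_batch(&batches[i]);
}
```

To use the sketch, zero-initialize an array of `NUM_NODES` batches, call `add_row` once per incoming row with the index of its target node, and call `finish_copy` at the end. With these limits, nine rows routed to one node go out as three sends instead of nine.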
Part of #4285
Codecov Report
Merging #4476 (d85d6d3) into main (33e4e55) will decrease coverage by 0.10%. The diff coverage is 87.78%.
@@ Coverage Diff @@
## main #4476 +/- ##
==========================================
- Coverage 90.99% 90.88% -0.11%
==========================================
Files 224 224
Lines 42586 42785 +199
==========================================
+ Hits 38751 38887 +136
- Misses 3835 3898 +63
Impacted Files | Coverage Δ | |
---|---|---|
tsl/src/remote/connection.c | 88.29% <73.68%> (-0.21%) | :arrow_down: |
tsl/src/remote/dist_copy.c | 87.85% <88.53%> (-5.53%) | :arrow_down: |
src/planner/constify_now.c | 97.93% <90.90%> (-0.99%) | :arrow_down: |
src/guc.c | 100.00% <100.00%> (ø) | |
tsl/src/nodes/data_node_copy.c | 94.90% <100.00%> (ø) | |
src/bgw/scheduler.c | 85.71% <0.00%> (-2.92%) | :arrow_down: |
src/loader/bgw_message_queue.c | 85.52% <0.00%> (-2.64%) | :arrow_down: |
tsl/src/reorder.c | 85.37% <0.00%> (-0.27%) | :arrow_down: |
src/bgw/job.c | 93.57% <0.00%> (-0.20%) | :arrow_down: |
... and 1 more | | |
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 7600896...d85d6d3.
Looking through remote_copy, I see that we use a replication factor of 2 (good), but it would be a good idea to test a few other replication factors as well (in particular 1, though that might already be tested elsewhere). It would also be worth testing a few more corner cases, such as copying empty rows (to a table with no columns) and copying no rows at all, and possibly a few others.
I added some tests with different replication factors and numbers of rows to dist_copy_long. I'm not sure how to test it without columns: is it possible to create a hypertable without a time column?
Turns out the text COPY passthrough is just totally broken :weary:
https://github.com/timescale/timescaledb/issues/4761