testing some stuff around distributed inserts
Changes
- Use nonblocking mode on COPY connections.
- Group the rows into batches on access node before sending them to data nodes.
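A rough model of these two changes, as a hedged Python sketch (the real implementation is in C over libpq; `space_value`, the hash routing, and the socket stand-in are all simplifications, not the actual TimescaleDB code):

```python
import selectors
import socket

def batch_rows_by_node(rows, num_nodes, space_value):
    """Group incoming rows into per-data-node batches on the access node,
    so each node receives one large COPY payload instead of row-by-row
    writes. Hashing the space dimension is a simplification of the real
    chunk-to-node assignment."""
    batches = {}
    for row in rows:
        node = hash(space_value(row)) % num_nodes
        batches.setdefault(node, []).append(row)
    return batches

def send_nonblocking(sock, payload):
    """Send a batch on a nonblocking socket: try to write, and when the
    kernel buffer is full, wait for writability instead of stalling the
    access node (the analogue of nonblocking mode on COPY connections)."""
    sock.setblocking(False)
    sel = selectors.DefaultSelector()
    sel.register(sock, selectors.EVENT_WRITE)
    view = memoryview(payload)
    while view:
        try:
            sent = sock.send(view)
            view = view[sent:]
        except BlockingIOError:
            sel.select()  # wait until the socket is writable again
    sel.unregister(sock)
    sel.close()
```

The point of the batching step is that rows destined for the same data node are flushed together, and the nonblocking sends let the access node make progress on other nodes' connections while one of them is backed up.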
Benchmarks
I'm using the UK Price Paid data set because that's what I had at hand: https://www.gov.uk/government/statistical-data-sets/price-paid-data-downloads
I'm benchmarking inserts of 10k, 100k and 1M rows, ordered either randomly or by time & postcode2, which is used for space partitioning. The '-1' and '-2' versions of the files are different random subsets of the data, to test inserting into new chunks versus existing chunks. I'm testing a distributed hypertable partitioned by time, either with no space partitioning or additionally partitioned by postcode2:
# \d uk_price_paid
Table "public.uk_price_paid"
Column │ Type │ Collation │ Nullable │ Default
───────────┼──────────┼───────────┼──────────┼─────────
price │ integer │ │ │
date │ date │ │ not null │
postcode1 │ text │ │ │
postcode2 │ text │ │ │
type │ smallint │ │ │
is_new │ boolean │ │ │
duration │ smallint │ │ │
addr1 │ text │ │ │
addr2 │ text │ │ │
street │ text │ │ │
locality │ text │ │ │
town │ text │ │ │
district │ text │ │ │
country │ text │ │ │
category │ smallint │ │ │
Indexes:
"uk_price_paid_date_idx" btree (date DESC)
"uk_price_paid_postcode2_date_idx" btree (postcode2, date DESC)
Triggers:
ts_insert_blocker BEFORE INSERT ON uk_price_paid FOR EACH ROW EXECUTE FUNCTION _timescaledb_internal.insert_blocker()
# select create_hypertable('uk_price_paid_l', 'date');
# select create_distributed_hypertable('uk_price_paid_ns', 'date');
# select create_distributed_hypertable('uk_price_paid', 'date', 'postcode2');
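The '-1'/'-2' random and ordered subsets described above could be produced along these lines (a hypothetical Python sketch; the actual benchmark files were prepared separately, and only the field positions are taken from the table definition):

```python
import random

# Stand-in rows: (price, date, postcode1, postcode2) per the table above.
rows = [
    (100, "2020-01-03", "AB", "ZZ"),
    (200, "2020-01-01", "AB", "AA"),
    (300, "2020-01-02", "AB", "MM"),
]

def random_subset(rows, n, seed=0):
    """A '-1'/'-2' style subset: sample n rows without replacement.
    Different seeds give the disjoint-ish subsets used to hit new vs.
    existing chunks."""
    return random.Random(seed).sample(rows, n)

def ordered_subset(rows, n, seed=0):
    """The same subset, sorted by date and postcode2, i.e. the time and
    space partitioning dimensions of the hypertable."""
    return sorted(random_subset(rows, n, seed), key=lambda r: (r[1], r[3]))
```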
I'm using the following, pretty generic, queries:
==> insert-binary.sql <==
begin;
\copy rides1 from '/home/akuzm/nyc_data_rides.pgbinary' with (format binary);
rollback;
==> insert-prices-100k-ordered.sql <==
begin;
copy uk_price_paid from '/home/akuzm/uk-price-paid/prices-100k-ordered-1.tsv';
rollback;
==> insert-prices-100k-random.sql <==
begin;
copy uk_price_paid from '/home/akuzm/uk-price-paid/prices-100k-random-1.tsv';
rollback;
==> insert-prices-10k-ordered-2.sql <==
begin;
copy uk_price_paid from '/home/akuzm/uk-price-paid/prices-10k-ordered-2.tsv';
rollback;
==> insert-prices-10k-ordered.sql <==
begin;
copy uk_price_paid from '/home/akuzm/uk-price-paid/prices-10k-ordered-1.tsv';
rollback;
==> insert-prices-10k-random.sql <==
begin;
copy uk_price_paid from '/home/akuzm/uk-price-paid/prices-10k-random-1.tsv';
rollback;
==> insert-prices-1M-ordered.sql <==
begin;
copy uk_price_paid from '/home/akuzm/uk-price-paid/prices-1M-ordered-1.tsv';
rollback;
==> insert-prices-l-10k-ordered-2.sql <==
begin;
copy uk_price_paid_l from '/home/akuzm/uk-price-paid/prices-10k-ordered-2.tsv';
rollback;
==> insert-prices-l-10k-ordered.sql <==
begin;
copy uk_price_paid_l from '/home/akuzm/uk-price-paid/prices-10k-ordered-1.tsv';
rollback;
==> insert-prices-l-10k-random.sql <==
begin;
copy uk_price_paid_l from '/home/akuzm/uk-price-paid/prices-10k-random-1.tsv';
rollback;
==> insert-prices-ns-10k-ordered-2.sql <==
begin;
copy uk_price_paid_ns from '/home/akuzm/uk-price-paid/prices-10k-ordered-2.tsv';
rollback;
==> insert-prices-ns-10k-ordered.sql <==
begin;
copy uk_price_paid_ns from '/home/akuzm/uk-price-paid/prices-10k-ordered-1.tsv';
rollback;
==> insert-prices-ns-10k-random.sql <==
begin;
copy uk_price_paid_ns from '/home/akuzm/uk-price-paid/prices-10k-random-1.tsv';
rollback;
==> insert.sql <==
begin;
\copy rides1 from '/home/akuzm/nyc_data_rides.tsv';
rollback;
Parts of this PR:
- https://github.com/timescale/timescaledb/pull/4205
- https://github.com/timescale/timescaledb/pull/4281
- https://github.com/timescale/timescaledb/pull/4390
- https://github.com/timescale/timescaledb/pull/4476
- https://github.com/timescale/timescaledb/pull/4478
- https://github.com/timescale/timescaledb/pull/4483
- https://github.com/timescale/timescaledb/pull/4484
- https://github.com/timescale/timescaledb/pull/4488
Codecov Report
Merging #4285 (435c31c) into main (8bf6c88) will increase coverage by 0.01%. The diff coverage is 85.33%.

Note: the current head 435c31c differs from the pull request's most recent head b1e2687. Consider uploading reports for commit b1e2687 to get more accurate results.
@@ Coverage Diff @@
## main #4285 +/- ##
==========================================
+ Coverage 90.79% 90.81% +0.01%
==========================================
Files 221 221
Lines 41006 40937 -69
==========================================
- Hits 37233 37178 -55
+ Misses 3773 3759 -14
Impacted Files | Coverage Δ | |
---|---|---|
tsl/src/remote/connection.c | 88.37% <54.16%> (-0.95%) | :arrow_down: |
tsl/src/remote/dist_copy.c | 90.99% <88.94%> (-2.39%) | :arrow_down: |
src/copy.c | 94.49% <100.00%> (+0.02%) | :arrow_up: |
src/bgw/scheduler.c | 82.89% <0.00%> (-3.54%) | :arrow_down: |
tsl/src/chunk_copy.c | 92.90% <0.00%> (-0.51%) | :arrow_down: |
src/bgw/job.c | 93.04% <0.00%> (-0.39%) | :arrow_down: |
src/cross_module_fn.c | 69.28% <0.00%> (-0.20%) | :arrow_down: |
src/chunk.c | 94.30% <0.00%> (-0.14%) | :arrow_down: |
... and 7 more | | |
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Everything integrated in other PRs.