
[Bug]: Throughput drop using Distributed hypertable

Open sentinel09 opened this issue 2 years ago • 7 comments

What type of bug is this?

Performance issue

What subsystems and features are affected?

Data ingestion, Distributed hypertable, Multi-node

What happened?

Hello!

Currently I am using a single TimescaleDB (v2.5.0-pg12) instance, running as a Docker container on a self-hosted box. We need to move to multi-node TimescaleDB for better manageability and, ideally, a performance increase as well. I have created 1 access node and 2 data nodes; all the Docker containers (1 AN + 2 DN) run on separate self-hosted boxes.

Access node: Docker container memory limit 16GB, environment variables TS_TUNE_MEMORY=14GB TS_TUNE_NUM_CPUS=6
Data nodes (each): Docker container memory limit 14GB, environment variables TS_TUNE_MEMORY=13GB TS_TUNE_NUM_CPUS=4

File transfer speed between the nodes is 90-95 MB/s.

Our applications generate data, which we insert into TimescaleDB with INSERT queries. I've added a small comparison table (timescale_node_stats) between a single TimescaleDB instance and multi-node TimescaleDB. You can see that with the multi-node approach the application has to run about twice as long as with a single instance to insert roughly the same amount of data.

I also added Postgres monitoring for the running Timescale containers; the metrics are in the screenshots below.

[screenshot: timescale_node_stats]

Single node + Access node: [screenshots: timescale_performance_1, timescale_performance_2]

1st Data node: [screenshots: timescale_performance_datanode_1_1, timescale_performance_datanode_1_2]

2nd Data node: [screenshots: timescale_performance_datanode_2_1, timescale_performance_datanode_2_2]

The problem is that I am not sure where the bottleneck is, or how to get at least the same ingested data volume with the multi-node approach as with a single instance. In theory multi-node should be faster, but in my case there is a throughput problem somewhere.

TimescaleDB version affected

2.6.1

PostgreSQL version used

12

What operating system did you use?

Alpine 3.15.4

What installation method did you use?

Docker

What platform did you run on?

On prem/Self-hosted

Relevant log output and stack trace

No response

How can we reproduce the bug?

No response

sentinel09 avatar May 13 '22 20:05 sentinel09

@sentinel09 To analyze the issue you are experiencing, we need to know more about your setup: the number of data nodes, how you created your distributed hypertable(s), and how you insert data.

erimatnor avatar May 23 '22 13:05 erimatnor

Hello @erimatnor, I have 2 data nodes. The distributed table was created with the following flow. Create the table:

CREATE TABLE my_events (
    insert_timestamp  timestamp     DEFAULT CURRENT_TIMESTAMP,
    event_timestamp   timestamp     NULL,
    guid              VARCHAR(255)  NULL,
    hash              VARCHAR(255)  NULL,
    lineage           TEXT          NULL,
    type              VARCHAR(255)  NULL,
    name              VARCHAR(255)  NULL,
    record            VARCHAR(255)  NULL,
    source            VARCHAR(255)  NULL,
    backbone          VARCHAR(255)  NULL,
    headers           TEXT          NULL,
    message           TEXT          NULL
);

Create distributed hypertable:

SELECT create_distributed_hypertable('my_events', 'event_timestamp', 'source', chunk_time_interval => INTERVAL '5 minutes');
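(For context, the two data nodes were attached to the access node before this call; a sketch of that step, with hypothetical node names and hosts rather than my actual values:)

    -- Run on the access node; 'dn1'/'dn2' and the hostnames are placeholders
    SELECT add_data_node('dn1', host => 'data-node-1');
    SELECT add_data_node('dn2', host => 'data-node-2');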

source has 3 values: source1, source2 and source3, but I've been testing with 2 sources (source1 and source2) due to a lack of resources at the moment. Data is inserted with INSERT INTO statements.

sentinel09 avatar May 31 '22 22:05 sentinel09

@sentinel09 how many clients are doing inserts into the multi-node setup? It helps to use multiple clients doing inserts with unique, non-conflicting rows.

Also, are you running INSERTs in batches or one insert per transaction? It's recommended to wrap multiple inserts in one transaction block, since singleton insert transactions will be slower in multi-node. A minimal sketch of what that could look like, assuming the my_events schema above (the columns and values here are illustrative):
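    BEGIN;
    INSERT INTO my_events (event_timestamp, source, name, message)
    VALUES
        ('2022-06-06 10:00:01', 'source1', 'event_a', 'payload 1'),
        ('2022-06-06 10:00:02', 'source1', 'event_b', 'payload 2'),
        ('2022-06-06 10:00:03', 'source2', 'event_c', 'payload 3');
    -- ...more multi-row INSERT statements can share this transaction,
    -- so each batch does not pay the per-transaction commit overhead
    COMMIT;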

nikkhils avatar Jun 06 '22 11:06 nikkhils

@nikkhils during the test I had 10 running clients (we have a maximum of 16 clients).

We are doing batches of ~50,000 rows.

sentinel09 avatar Jun 07 '22 09:06 sentinel09

Hello, guys! Any suggestions?

sentinel09 avatar Jul 01 '22 14:07 sentinel09

We're working on multi-node insert/copy performance, so it will probably get better after a release or two (2.8 or 2.9).

What I can suggest for now:

  • update to 2.7, since it already includes some INSERT performance improvements
  • set timescaledb.max_open_chunks_per_insert and timescaledb.max_cached_chunks_per_hypertable to, say, 256, on the access node and all data nodes (see the sketch after this list)
  • try COPY instead of INSERT; it's slightly faster
  • try to sort the rows in your batches by time before inserting
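As an illustration, roughly what the settings change and a COPY-based load could look like (a sketch only; the file path and column list are hypothetical):

    -- On the access node and each data node; these TimescaleDB GUCs
    -- could also be set in postgresql.conf instead of ALTER SYSTEM.
    ALTER SYSTEM SET timescaledb.max_open_chunks_per_insert = 256;
    ALTER SYSTEM SET timescaledb.max_cached_chunks_per_hypertable = 256;
    SELECT pg_reload_conf();

    -- Bulk load with COPY; server-side COPY FROM reads a file on the
    -- server, so from psql the client-side \copy variant may be easier.
    COPY my_events (event_timestamp, source, name, message)
    FROM '/path/to/events.csv' WITH (FORMAT csv);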

akuzm avatar Jul 04 '22 09:07 akuzm

Hello, thanks! I tested with a newer version of Timescale, but it did not make a big difference. I will test with the provided configs, and will also discuss a COPY implementation instead of INSERT with our developer.

sentinel09 avatar Jul 19 '22 17:07 sentinel09

We merged some distributed COPY improvements into 2.9, which is going to be our next release. This issue is about INSERT, so keeping it open for now.

akuzm avatar Nov 16 '22 16:11 akuzm

@sentinel09 I'm the Product Manager for Multi-node, and I'd like to speak with you about your experience working with Multi-node as we are prioritizing features for our roadmap. If you are available for a quick call, please reach out to me on our Community Slack (Gina Alioto) or via email ([email protected]) - Many thanks!

gina-alioto avatar Nov 18 '22 12:11 gina-alioto

@sentinel09 we have already released version 2.9, so did you have a chance to check it out as suggested by @akuzm? https://github.com/timescale/timescaledb/releases

If not, I suggest you upgrade to 2.9.1.

fabriziomello avatar Jan 06 '23 19:01 fabriziomello

Closing this issue since improvements have been made. @sentinel09, feel free to post additional updates to let us know if there have been improvements for you. However, note that multi-node is now deprecated.

erimatnor avatar Jan 30 '24 15:01 erimatnor