[Bug]: Self-reference check to add_data_node does not work. Deadlock still happens.

Open Sidicer opened this issue 2 years ago • 5 comments

What type of bug is this?

Crash, Locking issue

What subsystems and features are affected?

Access node, Multi-node

What happened?

If add_data_node is called with the same instance and database as the one it is executed on, it deadlocks, because a transaction is opened on the data node that blocks updates.
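
As a loose analogy only (this does not model TimescaleDB's actual locking), the self-deadlock described above can be sketched in Python: an outer "session" holds a lock and then waits for a second "session" that needs the same lock. The names `add_node_to_self`, `catalog_lock`, and `remote_session` are made up for illustration, and the timeout exists only so the demo terminates instead of hanging.

```python
import threading

def add_node_to_self(timeout=0.2):
    """Toy model: the outer session holds a lock, then waits for a second
    session (its own connection back to itself) that needs the same lock."""
    catalog_lock = threading.Lock()   # stands in for a lock held by the open transaction
    result = {}

    def remote_session():
        # The "data node" side tries to update state guarded by the same lock.
        # Without the timeout this call would block forever.
        got_it = catalog_lock.acquire(timeout=timeout)
        result["remote_acquired"] = got_it
        if got_it:
            catalog_lock.release()

    with catalog_lock:                # outer transaction is open
        worker = threading.Thread(target=remote_session)
        worker.start()
        worker.join()                 # outer side waits for the remote side to finish
    return result["remote_acquired"]

print(add_node_to_self())  # False: the remote side never gets the lock
```

Neither side can make progress while the other is waiting, which matches the symptom of a hung add_data_node call that does not respond to cancel requests.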

This is in reference to Add self-reference check to add_data_node #2144

Although a fix for that issue was committed to the main branch two years ago, we still ran into the same problem after accidentally entering the access node hostname/address when running add_data_node().

We had to forcefully restart the Docker container for the access node to continue working.

TimescaleDB version affected

2.7.0

PostgreSQL version used

14.3

What operating system did you use?

Debian GNU/Linux 11 (bullseye) Running Docker with Image: timescale/timescaledb:latest-pg14

What installation method did you use?

Docker

What platform did you run on?

On prem/Self-hosted

Relevant log output and stack trace

No response

How can we reproduce the bug?

Connect to any database with psql as the postgres user.
Try to add a data node by using the access node hostname and port:
SELECT add_data_node('any-name','localhost');

EDIT: localhost actually produces the expected error. The issue occurs when using an FQDN, as in this example:

postgres=# SELECT add_data_node('selfreferencetest','localhost');
NOTICE:  database "postgres" already exists on data node, skipping
NOTICE:  extension "timescaledb" already exists on data node, skipping
DETAIL:  TimescaleDB extension version on localhost:5432 was 2.7.0.
ERROR:  cannot add "selfreferencetest" as a data node
DETAIL:  ERROR:  node is already an access node

postgres=# SELECT add_data_node('selfreferencetest','<FQDN DNS hostname here>');
^CCancel request sent
^CCancel request sent
^CCancel request sent
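
Until the cause is found, a client-side guard against this mistake could look roughly like the sketch below. It is not how TimescaleDB's own self-reference check works; it only asks whether a hostname resolves to an address bound on the machine running the script. The function names are hypothetical, and port 5432 is taken from the output above. Two caveats: inside Docker the FQDN may resolve to a host address that is not bound in the container, so the check can miss the case, and a second PostgreSQL instance on the same machine is a legitimate data node, so a positive result is a warning rather than an error.

```python
import socket

def resolved_addresses(host, port=5432):
    """Return the set of IP addresses that `host` resolves to.
    Raises socket.gaierror if the name cannot be resolved."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    return {info[4][0] for info in infos}

def is_local_address(addr):
    """True if `addr` can be bound here, i.e. it belongs to a local interface."""
    family = socket.AF_INET6 if ":" in addr else socket.AF_INET
    try:
        with socket.socket(family, socket.SOCK_STREAM) as s:
            s.bind((addr, 0))   # port 0: let the OS pick any free port
        return True
    except OSError:
        return False

def points_at_this_machine(host, port=5432):
    """Heuristic: does any address behind `host` belong to this machine?"""
    return any(is_local_address(a) for a in resolved_addresses(host, port))
```

Calling `points_at_this_machine('<FQDN DNS hostname here>')` on the access node before add_data_node() would flag the accidental self-reference in the simple, non-containerized case.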

Sidicer, Jun 20 '22 14:06

Hi @Sidicer thank you for reaching out. Attempting to reproduce this issue gave the expected errors and not a deadlock. Could you please provide the specific steps that you followed and ran into this? Thanks!

konskov, Jun 21 '22 08:06

@konskov Thank you for the quick reply. I mentioned "localhost" in the issue, but that does not reproduce the problem. I will update the original post with this additional information.

When using an FQDN, the call hangs and deadlocks.

postgres=# SELECT add_data_node('selfreferencetest','localhost');
NOTICE:  database "postgres" already exists on data node, skipping
NOTICE:  extension "timescaledb" already exists on data node, skipping
DETAIL:  TimescaleDB extension version on localhost:5432 was 2.7.0.
ERROR:  cannot add "selfreferencetest" as a data node
DETAIL:  ERROR:  node is already an access node

postgres=# SELECT add_data_node('selfreferencetest','<FQDN DNS hostname here>');
^CCancel request sent
^CCancel request sent
^CCancel request sent

Sidicer, Jun 21 '22 09:06

@Sidicer note that no NOTICE messages have been emitted yet in the FQDN case. That suggests there are connectivity issues when using the FQDN. Can you please ping the FQDN from the access node and check that it responds?

nikkhils, Jun 21 '22 10:06

Hello @Sidicer,

I tried to reproduce the problem on my system. Unfortunately, I was not able to reproduce the deadlock so far. I called add_data_node to add localhost, 127.0.0.1, and <FQDN> as data nodes. In all three cases, an error message was returned.

Would it be possible to check the DNS setup and the network connectivity using ping, as mentioned by @nikkhils?
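
In addition to ping, which only tests ICMP reachability, a TCP-level check against the PostgreSQL port can separate a DNS failure from a blocked or unreachable port. The following is a minimal sketch using only the Python standard library; `can_connect` is a hypothetical helper name, and port 5432 is assumed from the output in this thread.

```python
import socket

def can_connect(host, port=5432, timeout=3.0):
    """Try a TCP connection to host:port and return (ok, detail)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, "connected"
    except socket.gaierror as exc:
        # Name resolution failed before any connection was attempted.
        return False, f"DNS resolution failed: {exc}"
    except OSError as exc:
        # Covers refused connections, timeouts, and unreachable networks.
        return False, f"connection failed: {exc}"
```

Running `can_connect('<FQDN DNS hostname here>')` from inside the access node container shows whether the name resolves and whether the port accepts connections within the timeout.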

Test case - using localhost

test2=# SELECT add_data_node('any-name','localhost');
NOTICE:  database "test2" already exists on data node, skipping
NOTICE:  extension "timescaledb" already exists on data node, skipping
DETAIL:  TimescaleDB extension version on localhost:5432 was 2.7.0.
ERROR:  [any-name]: cannot add the current database as a data node to itself
DETAIL:  Adding the current database as a data node to itself would create a cycle. Use a different instance or database for the data node.
HINT:  Check that the 'port' parameter refers to a different instance or that the 'database' parameter refers to a different database.

Test case - using localhost IP

test2=# SELECT add_data_node('any-name','127.0.0.1');
NOTICE:  database "test2" already exists on data node, skipping
NOTICE:  extension "timescaledb" already exists on data node, skipping
DETAIL:  TimescaleDB extension version on 127.0.0.1:5432 was 2.7.0.
ERROR:  [any-name]: cannot add the current database as a data node to itself
DETAIL:  Adding the current database as a data node to itself would create a cycle. Use a different instance or database for the data node.
HINT:  Check that the 'port' parameter refers to a different instance or that the 'database' parameter refers to a different database.

Test case - using FQDN

test2=# SELECT add_data_node('any-name','debian11-work.home.local');
NOTICE:  database "test2" already exists on data node, skipping
NOTICE:  extension "timescaledb" already exists on data node, skipping
DETAIL:  TimescaleDB extension version on debian11-work.home.local:5432 was 2.7.0.
ERROR:  [any-name]: cannot add the current database as a data node to itself
DETAIL:  Adding the current database as a data node to itself would create a cycle. Use a different instance or database for the data node.
HINT:  Check that the 'port' parameter refers to a different instance or that the 'database' parameter refers to a different database.

jnidzwetzki, Jun 28 '22 10:06

This issue has been automatically marked as stale due to lack of activity. You can remove the stale label or comment. Otherwise, this issue will be closed in 30 days. Thank you!

github-actions[bot], Aug 28 '22 02:08

Dear Author,

We are closing this issue due to lack of activity. Feel free to add a comment to this issue if you can provide more information and we will re-open it. Thank you!

github-actions[bot], Sep 28 '22 02:09