timescaledb icon indicating copy to clipboard operation
timescaledb copied to clipboard

[Bug] Telemetry related timeout issues when stopping postgresql

Open lafrech opened this issue 2 years ago • 1 comments

What type of bug is this?

Locking issue

What subsystems and features are affected?

Telemetry

What happened?

I use pytest-postgresql and I often get this error while running the tests.

subprocess.CalledProcessError: Command '/usr/lib/postgresql/13/bin/pg_ctl stop -D /tmp/pytest-of-jerome/pytest-47/pytest-postgresql-postgresql_proc0/data-16938 -m f' returned non-zero exit status 1.

It happens after the last test of the test suite, whatever the order of the tests. The tests run fine but the last test returns an error after some time.

I compared logs for both OK and KO tests and I suspect it could be related to telemetry.

At the end of the KO test, I see those multiple lines:

2022-01-27 13:31:53.561 CET [162345] LOG:  terminating TimescaleDB background job "Telemetry Reporter [1]" due to administrator command
2022-01-27 13:31:53.561 CET [162889] LOG:  terminating TimescaleDB background job "Telemetry Reporter [1]" due to administrator command
2022-01-27 13:31:53.561 CET [162661] LOG:  terminating TimescaleDB background job "Telemetry Reporter [1]" due to administrator command

I disabled telemetry in the test suite configuration and it doesn't seem to happen again.

BTW, when telemetry is activated, I see those kind of messages:

2022-01-27 13:31:37.689 CET [161301] ATTENTION:  telemetry could not connect to "telemetry.timescale.com"

Indeed

$ ping telemetry.timescale.com
PING a5fa811f6ebb111e99ee60e05265324a-1692108274.us-east-1.elb.amazonaws.com (52.203.163.145) 56(84) bytes of data.
^C
--- a5fa811f6ebb111e99ee60e05265324a-1692108274.us-east-1.elb.amazonaws.com ping statistics ---
13 packets transmitted, 0 received, 100% packet loss, time 12279ms

I suppose this error is transient and it could be involved in the timeout issue at the end of the tests.

In case it can be useful to anyone, my pytest config is now

from pytest_postgresql import factories as ppf

postgresql_proc = ppf.postgresql_proc(
    postgres_options=(
        "-c shared_preload_libraries='timescaledb' "
        "-c timescaledb.telemetry_level=off"
    )
)
postgresql = ppf.postgresql("postgresql_proc")

TimescaleDB version affected

2.5.1

PostgreSQL version used

13.5

What operating system did you use?

Debian 11.2

What installation method did you use?

Deb/Apt

What platform did you run on?

Not applicable

Relevant log output and stack trace

No response

How can we reproduce the bug?

No response

lafrech avatar Jan 27 '22 13:01 lafrech

Hi @lafrech thanks for submitting the issue. We believe it was just a intermittent issue with the telemetry server. However, we should make the telemetry reporting component more error tolerant.

horzsolt avatar Feb 24 '22 08:02 horzsolt

Dear Author,

This issue has been automatically marked as stale due to lack of activity. With only the issue description that is currently provided, we do not have enough information to take action. If you have or find the answers we would need, please reach out. Otherwise, this issue will be closed in 30 days. Thank you!

github-actions[bot] avatar Sep 25 '22 02:09 github-actions[bot]

we should make the telemetry reporting component more error tolerant.

Yes, this is the right fix. The telemetry server working correctly just hides the underlying problem.

RafiaSabih added the need-more-info label on 26 Jul

Is there any other info I should provide?

lafrech avatar Sep 26 '22 07:09 lafrech