[Flaky test] telemetry

Open mkindahl opened this issue 2 years ago • 9 comments

Which test is flaky?

telemetry

Since when has the test been flaky?

September 28, 2022

Link to the failed test run

https://github.com/timescale/timescaledb/actions/runs/3141452205/jobs/5104125322

Log output

diff -u /mnt/d/a/timescaledb/timescaledb/test/expected/telemetry.out /mnt/d/a/timescaledb/timescaledb/build_wsl/test/results/telemetry.out
--- /mnt/d/a/timescaledb/timescaledb/test/expected/telemetry.out	2022-09-28 07:19:44.909707400 +0000
+++ /mnt/d/a/timescaledb/timescaledb/build_wsl/test/results/telemetry.out	2022-09-28 07:27:31.898887000 +0000
@@ -72,7 +72,7 @@
 SELECT _timescaledb_internal.test_status(404);
 ERROR:  endpoint sent back unexpected HTTP status: 404
 SELECT _timescaledb_internal.test_status(500);
-ERROR:  endpoint sent back unexpected HTTP status: 500
+ERROR:  endpoint sent back unexpected HTTP status: 502
 SELECT _timescaledb_internal.test_status(503);
 ERROR:  endpoint sent back unexpected HTTP status: 503
 \set ON_ERROR_STOP 1

Reason for flakiness

External issue. The test appears to depend on an external gateway that is not stable.

mkindahl avatar Sep 28 '22 07:09 mkindahl

Happens in https://github.com/timescale/timescaledb/actions/runs/3289466698/jobs/5421034074 as well.

diff -u /home/runner/work/timescaledb/timescaledb/test/expected/telemetry.out /home/runner/work/timescaledb/timescaledb/build/test/results/telemetry.out
--- /home/runner/work/timescaledb/timescaledb/test/expected/telemetry.out	2022-10-20 12:12:57.505307727 +0000
+++ /home/runner/work/timescaledb/timescaledb/build/test/results/telemetry.out	2022-10-20 12:13:49.249967166 +0000
@@ -38,7 +38,7 @@
 
 \set ON_ERROR_STOP 0
 SELECT _timescaledb_internal.test_status_ssl(304);
-ERROR:  endpoint sent back unexpected HTTP status: 304
+ERROR:  connection error: Operation now in progress
 SELECT _timescaledb_internal.test_status_ssl(400);
 ERROR:  endpoint sent back unexpected HTTP status: 400
 SELECT _timescaledb_internal.test_status_ssl(401);

mkindahl avatar Oct 20 '22 12:10 mkindahl

https://github.com/timescale/timescaledb/actions/runs/3894936096/jobs/6649677409

mkindahl avatar Jan 11 '23 21:01 mkindahl

https://github.com/timescale/timescaledb/actions/runs/4175600633/jobs/7230752287

diff -u /home/runner/work/timescaledb/timescaledb/test/expected/telemetry.out /home/runner/work/timescaledb/timescaledb/build/test/results/telemetry.out
--- /home/runner/work/timescaledb/timescaledb/test/expected/telemetry.out	2023-02-14 15:55:07.734285219 +0000
+++ /home/runner/work/timescaledb/timescaledb/build/test/results/telemetry.out	2023-02-14 16:01:35.772586411 +0000
@@ -42,7 +42,7 @@
 SELECT _timescaledb_internal.test_status_ssl(400);
 ERROR:  endpoint sent back unexpected HTTP status: 400
 SELECT _timescaledb_internal.test_status_ssl(401);
-ERROR:  endpoint sent back unexpected HTTP status: 401
+ERROR:  HTTP connection read error
 SELECT _timescaledb_internal.test_status_ssl(404);
 ERROR:  endpoint sent back unexpected HTTP status: 404
 SELECT _timescaledb_internal.test_status_ssl(500);

jchampio avatar Feb 14 '23 18:02 jchampio

https://github.com/timescale/timescaledb/actions/runs/4699537274/jobs/8333537914?pr=5558

diff -u /__w/timescaledb/timescaledb/test/expected/telemetry.out /__w/timescaledb/timescaledb/build/test/results/telemetry.out
--- /__w/timescaledb/timescaledb/test/expected/telemetry.out	2023-04-14 12:23:45.012853333 +0000
+++ /__w/timescaledb/timescaledb/build/test/results/telemetry.out	2023-04-14 12:36:01.899262040 +0000
@@ -66,7 +66,7 @@
 SELECT _timescaledb_internal.test_status(304);
 ERROR:  endpoint sent back unexpected HTTP status: 304
 SELECT _timescaledb_internal.test_status(400);
-ERROR:  endpoint sent back unexpected HTTP status: 400
+ERROR:  HTTP connection read error
 SELECT _timescaledb_internal.test_status(401);
 ERROR:  endpoint sent back unexpected HTTP status: 401
 SELECT _timescaledb_internal.test_status(404);

mkindahl avatar Apr 14 '23 13:04 mkindahl

https://github.com/timescale/timescaledb/actions/runs/4699537254/jobs/8333704481?pr=5558

mkindahl avatar Apr 14 '23 13:04 mkindahl

https://github.com/timescale/timescaledb/actions/runs/4808681205/jobs/8558964934?pr=5614

mkindahl avatar Apr 26 '23 15:04 mkindahl

Had to ignore the test in main to be able to merge PRs: https://github.com/timescale/timescaledb/commit/29154b29d11901ee4b98312ca6245e87216a5877

akuzm avatar May 25 '23 09:05 akuzm

I was investigating this a bit yesterday; I've seen:

  • a few "HTTP connection read error" failures; judging by the handling code, this should only happen when the remote end closes the connection unexpectedly
  • some 502s as well; the test might be hitting a rate limit, or something is not right with postman-echo.com

kgyrtkirk avatar May 25 '23 11:05 kgyrtkirk

Partially fixed by #6464: if a 502 occurs, the request is retried.

Adding a retry on HTTP connection errors should resolve this issue.

antekresic avatar Dec 26 '23 07:12 antekresic
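
For illustration, the retry idea discussed above could look roughly like the sketch below. It is written in Python rather than in the extension's C code, and it is not the change made in #6464. The function name status_with_retry, the fetch callback, and the retry parameters are all hypothetical. The assumption is that a 502 and a connection-level failure (surfacing here as an OSError, which covers ConnectionError) are both treated as transient and retried, while any other status is returned as-is.

```python
import time
from typing import Callable, Optional

# Assumption: only 502 is treated as a transient gateway failure,
# matching the status seen in the flaky runs.
TRANSIENT_STATUSES = {502}


def status_with_retry(
    fetch: Callable[[], int],
    expected: int,
    max_attempts: int = 3,
    delay_seconds: float = 0.0,
) -> int:
    """Call fetch() until it yields a non-transient result.

    fetch is a hypothetical callback that performs one HTTP request and
    returns the status code, or raises OSError on a connection failure.
    """
    last_error: Optional[OSError] = None
    last_status: Optional[int] = None
    for attempt in range(max_attempts):
        if attempt > 0:
            time.sleep(delay_seconds)  # simple fixed pause between attempts
        try:
            last_status = fetch()
            last_error = None
        except OSError as exc:
            # Connection-level failure (e.g. a read error): retry.
            last_error = exc
            continue
        if last_status == expected or last_status not in TRANSIENT_STATUSES:
            return last_status
    # Attempts exhausted: surface whatever the final attempt produced.
    if last_error is not None:
        raise last_error
    assert last_status is not None
    return last_status
```

A test harness could wrap each status check in such a helper, so that a single gateway hiccup does not change the expected output. Persistent failures still surface after max_attempts.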