
signalConnectionClosed() failing on `assert remaining >= 0`

Open fruch opened this issue 1 year ago • 25 comments

Issue description

  • [ ] This issue is a regression.
  • [x] It is unknown if this issue is a regression.

At the end of a c-s (cassandra-stress) run we are seeing the following error:

java.lang.AssertionError
	at com.datastax.driver.core.ConvictionPolicy$DefaultConvictionPolicy.signalConnectionClosed(ConvictionPolicy.java:90)
	at com.datastax.driver.core.Connection.closeAsync(Connection.java:1095)
	at com.datastax.driver.core.HostConnectionPool.discardAvailableConnections(HostConnectionPool.java:1011)
	at com.datastax.driver.core.HostConnectionPool.closeAsync(HostConnectionPool.java:972)
	at com.datastax.driver.core.SessionManager.closeAsync(SessionManager.java:196)
	at com.datastax.driver.core.Cluster$Manager.close(Cluster.java:2067)
	at com.datastax.driver.core.Cluster$Manager.access$200(Cluster.java:1636)
	at com.datastax.driver.core.Cluster.closeAsync(Cluster.java:626)
	at com.datastax.driver.core.Cluster.close(Cluster.java:637)
	at org.apache.cassandra.stress.util.JavaDriverClient.disconnect(JavaDriverClient.java:262)
	at org.apache.cassandra.stress.settings.StressSettings.disconnect(StressSettings.java:394)
	at org.apache.cassandra.stress.StressAction.run(StressAction.java:98)
	at org.apache.cassandra.stress.Stress.run(Stress.java:143)
	at org.apache.cassandra.stress.Stress.main(Stress.java:62)

It seems to be coming from: https://github.com/scylladb/java-driver/blame/b3f3ebaf161b21e5c4840ec294595d4e4b39d9bf/driver-core/src/main/java/com/datastax/driver/core/ConvictionPolicy.java#L90
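
For context, the invariant that assertion guards is a per-host counter that signalConnectionClosed() decrements and that is asserted to stay non-negative, so the AssertionError suggests the counter was decremented more times than it was incremented, e.g. a connection being signalled closed twice, or closed without a matching open signal. Below is a minimal paraphrase of that kind of check for illustration only; it is not the driver's actual DefaultConvictionPolicy source.

import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch of the invariant behind `assert remaining >= 0`;
// not the driver's implementation.
class OpenConnectionCounterSketch {

    private final AtomicInteger openConnections = new AtomicInteger();

    void signalConnectionOpened() {
        openConnections.incrementAndGet();
    }

    void signalConnectionClosed() {
        int remaining = openConnections.decrementAndGet();
        // Throws java.lang.AssertionError when the JVM runs with -ea and the
        // counter goes negative, i.e. more close signals than open signals.
        assert remaining >= 0;
    }
}

Note that an AssertionError can only surface here when the JVM running cassandra-stress has assertions enabled (-ea).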

Impact

It confuses SCT and makes it fail to read the c-s summary.

How frequently does it reproduce?

We don't have a specific way to reproduce it.

Installation details

Kernel Version: 5.15.0-1035-azure Scylla version (or git commit hash): 5.2.0~rc4-20230402.d70751fee3f9 with build-id 80951fe7ff3c6e2c268211c71a9236071ac18a35

Cluster size: 6 nodes (Standard_L8s_v3)

Scylla Nodes used in this run:

  • longevity-10gb-3h-5-2-db-node-446a8791-eastus-8 (172.173.226.133 | 10.0.0.10) (shards: 7)
  • longevity-10gb-3h-5-2-db-node-446a8791-eastus-7 (20.115.35.75 | 10.0.0.9) (shards: 7)
  • longevity-10gb-3h-5-2-db-node-446a8791-eastus-6 (20.169.164.176 | 10.0.0.10) (shards: 7)
  • longevity-10gb-3h-5-2-db-node-446a8791-eastus-5 (20.169.164.162 | 10.0.0.9) (shards: 7)
  • longevity-10gb-3h-5-2-db-node-446a8791-eastus-4 (172.174.45.124 | 10.0.0.8) (shards: 7)
  • longevity-10gb-3h-5-2-db-node-446a8791-eastus-3 (172.174.44.252 | 10.0.0.7) (shards: 7)
  • longevity-10gb-3h-5-2-db-node-446a8791-eastus-2 (172.174.44.130 | 10.0.0.6) (shards: 7)
  • longevity-10gb-3h-5-2-db-node-446a8791-eastus-1 (172.174.45.65 | 10.0.0.5) (shards: 7)

OS / Image: /subscriptions/6c268694-47ab-43ab-b306-3c5514bc4112/resourceGroups/scylla-images/providers/Microsoft.Compute/images/scylla-5.2.0-rc4-x86_64-2023-04-03T01-32-26 (azure: eastus)

Test: longevity-10gb-3h-azure-test Test id: 446a8791-c9b3-4b83-b287-c39203f80216 Test name: scylla-5.2/longevity/longevity-10gb-3h-azure-test Test config file(s):

Logs and commands
  • Restore Monitor Stack command: $ hydra investigate show-monitor 446a8791-c9b3-4b83-b287-c39203f80216
  • Restore monitor on AWS instance using Jenkins job
  • Show all stored logs command: $ hydra investigate show-logs 446a8791-c9b3-4b83-b287-c39203f80216

Logs:

Jenkins job URL

fruch avatar Apr 18 '23 15:04 fruch

@avelanarius, any idea what could cause such an assertion to fail?

fruch avatar Apr 19 '23 14:04 fruch

I don't have any at the moment, @Lorak-mmk will look into this issue.

avelanarius avatar Apr 19 '23 14:04 avelanarius

@avelanarius @Lorak-mmk

We ran into it again:

Installation details

Kernel Version: 5.15.0-1040-azure Scylla version (or git commit hash): 5.2.3-20230608.ea08d409f155 with build-id ec8d1c19fc354f34c19e07e35880e0f40cc7d8cd

Cluster size: 6 nodes (Standard_L8s_v3)

Scylla Nodes used in this run:

  • longevity-10gb-3h-5-2-db-node-9c9bf09e-eastus-9 (74.235.168.59 | 10.0.0.8) (shards: 7)
  • longevity-10gb-3h-5-2-db-node-9c9bf09e-eastus-8 (23.101.133.151 | 10.0.0.9) (shards: 7)
  • longevity-10gb-3h-5-2-db-node-9c9bf09e-eastus-7 (20.121.192.18 | 10.0.0.14) (shards: 7)
  • longevity-10gb-3h-5-2-db-node-9c9bf09e-eastus-6 (20.172.134.170 | 10.0.0.10) (shards: 7)
  • longevity-10gb-3h-5-2-db-node-9c9bf09e-eastus-5 (172.171.220.100 | 10.0.0.9) (shards: 7)
  • longevity-10gb-3h-5-2-db-node-9c9bf09e-eastus-4 (20.231.56.163 | 10.0.0.8) (shards: 7)
  • longevity-10gb-3h-5-2-db-node-9c9bf09e-eastus-3 (20.124.244.146 | 10.0.0.7) (shards: 7)
  • longevity-10gb-3h-5-2-db-node-9c9bf09e-eastus-2 (20.124.243.147 | 10.0.0.6) (shards: 7)
  • longevity-10gb-3h-5-2-db-node-9c9bf09e-eastus-11 (20.169.162.58 | 10.0.0.8) (shards: 7)
  • longevity-10gb-3h-5-2-db-node-9c9bf09e-eastus-10 (13.68.236.250 | 10.0.0.7) (shards: 7)
  • longevity-10gb-3h-5-2-db-node-9c9bf09e-eastus-1 (20.124.243.124 | 10.0.0.5) (shards: 7)

OS / Image: /subscriptions/6c268694-47ab-43ab-b306-3c5514bc4112/resourceGroups/scylla-images/providers/Microsoft.Compute/images/scylla-5.2.3-x86_64-2023-06-19T09-07-22 (azure: eastus)

Test: longevity-10gb-3h-azure-test Test id: 9c9bf09e-e825-4d63-a75f-0ae2b27345b4 Test name: scylla-5.2/longevity/longevity-10gb-3h-azure-test Test config file(s):

Logs and commands
  • Restore Monitor Stack command: $ hydra investigate show-monitor 9c9bf09e-e825-4d63-a75f-0ae2b27345b4
  • Restore monitor on AWS instance using Jenkins job
  • Show all stored logs command: $ hydra investigate show-logs 9c9bf09e-e825-4d63-a75f-0ae2b27345b4

Logs:

Jenkins job URL Argus

fruch avatar Jun 19 '23 14:06 fruch

@avelanarius any progress here?

DoronArazii avatar Jul 26 '23 08:07 DoronArazii

Issue description

While both stress threads were healthy throughout the run, they failed exactly at the end of the test:

total,     650033460,   60121,   60121,   60121,    16.6,    11.3,    48.8,    83.1,   150.1,   289.1,10790.0,  0.00143,      0,      0,       0,       0,       0,       0
total,     650338896,   61087,   61087,   61087,    16.4,     9.0,    58.3,    98.4,   155.8,   234.1,10795.0,  0.00143,      0,      0,       0,       0,       0,       0
total,     650643558,   60932,   60932,   60932,    16.4,     8.2,    59.0,    99.3,   148.0,   197.4,10800.0,  0.00142,      0,      0,       0,       0,       0,       0
total,     650667067,   56268,   56268,   56268,    17.3,    14.2,    43.0,    58.6,    86.7,   110.6,10800.4,  0.00143,      0,      0,       0,       0,       0,       0


Results:
Op rate                   :   60,245 op/s  [WRITE: 60,245 op/s]
Partition rate            :   60,245 pk/s  [WRITE: 60,245 pk/s]
Row rate                  :   60,245 row/s [WRITE: 60,245 row/s]
Latency mean              :   16.6 ms [WRITE: 16.6 ms]
Latency median            :   10.0 ms [WRITE: 10.0 ms]
Latency 95th percentile   :   54.0 ms [WRITE: 54.0 ms]
Latency 99th percentile   :   92.1 ms [WRITE: 92.1 ms]
Latency 99.9th percentile :  149.0 ms [WRITE: 149.0 ms]
Latency max               : 8535.4 ms [WRITE: 8,535.4 ms]
Total partitions          : 650,667,067 [WRITE: 650,667,067]
Total errors              :          0 [WRITE: 0]
Total GC count            : 0
Total GC memory           : 0.000 KiB
Total GC time             :    0.0 seconds
Avg GC time               :    NaN ms
StdDev GC time            :    0.0 ms
Total operation time      : 03:00:00

java.lang.AssertionError
END
        at com.datastax.driver.core.ConvictionPolicy$DefaultConvictionPolicy.signalConnectionClosed(ConvictionPolicy.java:90)
        at com.datastax.driver.core.Connection.closeAsync(Connection.java:1095)
        at com.datastax.driver.core.HostConnectionPool.discardAvailableConnections(HostConnectionPool.java:1011)
        at com.datastax.driver.core.HostConnectionPool.closeAsync(HostConnectionPool.java:972)
        at com.datastax.driver.core.SessionManager.closeAsync(SessionManager.java:196)
        at com.datastax.driver.core.Cluster$Manager.close(Cluster.java:2067)
        at com.datastax.driver.core.Cluster$Manager.access$200(Cluster.java:1636)
        at com.datastax.driver.core.Cluster.closeAsync(Cluster.java:626)
        at com.datastax.driver.core.Cluster.close(Cluster.java:637)
        at org.apache.cassandra.stress.util.JavaDriverClient.disconnect(JavaDriverClient.java:262)
        at org.apache.cassandra.stress.settings.StressSettings.disconnect(StressSettings.java:394)
        at org.apache.cassandra.stress.StressAction.run(StressAction.java:98)
        at org.apache.cassandra.stress.Stress.run(Stress.java:143)
        at org.apache.cassandra.stress.Stress.main(Stress.java:62)
total,     704367012,   65463,   65463,   65463,    15.3,    10.2,    46.7,    78.5,   125.4,   185.7,10795.0,  0.00139,      0,      0,       0,       0,       0,       0
total,     704704017,   67401,   67401,   67401,    14.8,     9.7,    45.8,    79.8,   118.4,   194.6,10800.0,  0.00139,      0,      0,       0,       0,       0,       0
total,     704768936,   63638,   63638,   63638,    15.6,    10.1,    50.2,    79.4,   116.3,   167.0,10801.0,  0.00139,      0,      0,       0,       0,       0,       0


Results:
Op rate                   :   65,250 op/s  [WRITE: 65,250 op/s]
java.lang.AssertionError
Partition rate            :   65,250 pk/s  [WRITE: 65,250 pk/s]
        at com.datastax.driver.core.ConvictionPolicy$DefaultConvictionPolicy.signalConnectionClosed(ConvictionPolicy.java:90)
Row rate                  :   65,250 row/s [WRITE: 65,250 row/s]
        at com.datastax.driver.core.Connection.closeAsync(Connection.java:1095)
Latency mean              :   15.3 ms [WRITE: 15.3 ms]
        at com.datastax.driver.core.HostConnectionPool.discardAvailableConnections(HostConnectionPool.java:1011)
Latency median            :    9.2 ms [WRITE: 9.2 ms]
        at com.datastax.driver.core.HostConnectionPool.closeAsync(HostConnectionPool.java:972)
Latency 95th percentile   :   49.7 ms [WRITE: 49.7 ms]
        at com.datastax.driver.core.SessionManager.closeAsync(SessionManager.java:196)
Latency 99th percentile   :   86.6 ms [WRITE: 86.6 ms]
        at com.datastax.driver.core.Cluster$Manager.close(Cluster.java:2067)
Latency 99.9th percentile :  142.1 ms [WRITE: 142.1 ms]
        at com.datastax.driver.core.Cluster$Manager.access$200(Cluster.java:1636)
Latency max               : 8548.0 ms [WRITE: 8,548.0 ms]
        at com.datastax.driver.core.Cluster.closeAsync(Cluster.java:626)
Total partitions          : 704,768,936 [WRITE: 704,768,936]
        at com.datastax.driver.core.Cluster.close(Cluster.java:637)
Total errors              :          0 [WRITE: 0]
        at org.apache.cassandra.stress.util.JavaDriverClient.disconnect(JavaDriverClient.java:262)
Total GC count            : 0
        at org.apache.cassandra.stress.settings.StressSettings.disconnect(StressSettings.java:394)
Total GC memory           : 0.000 KiB
        at org.apache.cassandra.stress.StressAction.run(StressAction.java:98)
Total GC time             :    0.0 seconds
        at org.apache.cassandra.stress.Stress.run(Stress.java:143)
Avg GC time               :    NaN ms
        at org.apache.cassandra.stress.Stress.main(Stress.java:62)
StdDev GC time            :    0.0 ms
Total operation time      : 03:00:01

END

How frequently does it reproduce?

It happened in both long-running stress threads in the run.

Installation details

Kernel Version: 5.15.0-1042-azure Scylla version (or git commit hash): 2023.1.0~rc8-20230731.b6f7c5a6910c with build-id f6e718548e76ccf3564ed2387b6582ba8d37793c

Cluster size: 6 nodes (Standard_L8s_v3)

Scylla Nodes used in this run:

  • longevity-10gb-3h-2023-1-db-node-bcc0441d-eastus-9 (20.121.32.63 | 10.0.0.8) (shards: 7)
  • longevity-10gb-3h-2023-1-db-node-bcc0441d-eastus-8 (20.127.14.57 | 10.0.0.9) (shards: 7)
  • longevity-10gb-3h-2023-1-db-node-bcc0441d-eastus-7 (20.172.150.249 | 10.0.0.14) (shards: 7)
  • longevity-10gb-3h-2023-1-db-node-bcc0441d-eastus-6 (172.178.17.8 | 10.0.0.10) (shards: 7)
  • longevity-10gb-3h-2023-1-db-node-bcc0441d-eastus-5 (172.178.16.70 | 10.0.0.9) (shards: 7)
  • longevity-10gb-3h-2023-1-db-node-bcc0441d-eastus-4 (74.235.172.235 | 10.0.0.8) (shards: 7)
  • longevity-10gb-3h-2023-1-db-node-bcc0441d-eastus-3 (74.235.172.33 | 10.0.0.7) (shards: 7)
  • longevity-10gb-3h-2023-1-db-node-bcc0441d-eastus-2 (74.235.77.246 | 10.0.0.6) (shards: 7)
  • longevity-10gb-3h-2023-1-db-node-bcc0441d-eastus-11 (20.185.226.14 | 10.0.0.7) (shards: 7)
  • longevity-10gb-3h-2023-1-db-node-bcc0441d-eastus-10 (172.190.170.97 | 10.0.0.10) (shards: 7)
  • longevity-10gb-3h-2023-1-db-node-bcc0441d-eastus-1 (20.172.145.155 | 10.0.0.5) (shards: 7)

OS / Image: /subscriptions/6c268694-47ab-43ab-b306-3c5514bc4112/resourceGroups/SCYLLA-IMAGES/providers/Microsoft.Compute/images/scylla-2023.1.0-rc8-x86_64-2023-07-31T21-30-24 (azure: eastus)

Test: longevity-10gb-3h-azure-test Test id: bcc0441d-5d7a-42d5-bb79-9e2870975688 Test name: enterprise-2023.1/longevity/longevity-10gb-3h-azure-test Test config file(s):

Logs and commands
  • Restore Monitor Stack command: $ hydra investigate show-monitor bcc0441d-5d7a-42d5-bb79-9e2870975688
  • Restore monitor on AWS instance using Jenkins job
  • Show all stored logs command: $ hydra investigate show-logs bcc0441d-5d7a-42d5-bb79-9e2870975688

Logs:

Jenkins job URL Argus

ShlomiBalalis avatar Aug 07 '23 15:08 ShlomiBalalis

@ShlomiBalalis - which version of the Java driver are you using? And if they fail at the end of the test, what's the user impact? And what do we see in the nodes' logs?

mykaul avatar Aug 08 '23 08:08 mykaul

As written in the original report:

It confuses SCT and makes it fail to read the c-s summary.

It forces us to take a close look at why the stress command failed, which wastes our time, and it will surely confuse any user of this driver. It has no effect on Scylla itself.

The version in the 2023.1 branch is 3.11.2.4.

fruch avatar Aug 08 '23 08:08 fruch

The version in the 2023.1 branch is 3.11.2.4.

Why not upgrade to 3.11.2.5? The list of changes (https://github.com/scylladb/java-driver/compare/3.11.2.4...3.11.2.5) is significant. Nothing I see there would solve this issue, though.

mykaul avatar Aug 08 '23 09:08 mykaul

The version in the 2023.1 branch is 3.11.2.4.

Why not upgrade to 3.11.2.5? The list of changes (3.11.2.4...3.11.2.5) is significant. Nothing I see there would solve this issue, though.

Well, as with any other change to a release, we need a reason for backporting anything, and so far there isn't one, as you noticed.

If there were a fix for this issue in the next driver release, that would be a good reason to backport it to older releases.

fruch avatar Aug 08 '23 09:08 fruch

@fruch - I don't fully understand - by not upgrading, we are not testing, at the very least, the following: https://github.com/scylladb/java-driver/commit/3e2d8a1766150d78bd806264ecf1e1870e0f14cf https://github.com/scylladb/java-driver/commit/bb2fcdc22384b40194becdb994906fa3a6eb0940 https://github.com/scylladb/java-driver/commit/376f03252bbee7c220aeb4f5460a55a92944b00a (with this last one being the most important of them). Not to mention other stuff.

mykaul avatar Aug 08 '23 09:08 mykaul

@fruch - I don't fully understand - by not upgrading, we are not testing, at the very least, the following: 3e2d8a1 bb2fcdc 376f032 (with this last one being the most important of them). Not to mention other stuff.

We are going to test those first on master, like any other feature, and if it's deemed that they need more testing on top of an older ongoing release, we'll then backport both the new driver and the relevant tests for it.

Backporting it right now doesn't mean the new features would be tested as part of the release. Anyhow, when it comes to drivers, the horses are out of the barn as soon as they get released.

fruch avatar Aug 08 '23 10:08 fruch

The version in the 2023.1 branch is 3.11.2.4.

Why not upgrade to 3.11.2.5? The list of changes (3.11.2.4...3.11.2.5) is significant. Nothing I see there would solve this issue, though.

I think this discussion is beside the point of the issue, if upgrading won't solve it.

ShlomiBalalis avatar Aug 08 '23 10:08 ShlomiBalalis

The way forward to debug this issue is probably to enable more logging, especially log messages like this one: https://github.com/scylladb/java-driver/blob/d291df6b35f7903c0b2d935754aebcb5b35bcd81/driver-core/src/main/java/com/datastax/driver/core/ConvictionPolicy.java#L82-L83

In a follow-up message (or PR?), I'll describe how to configure the logging framework to log this specific message (we don't want to enable all DEBUG logs, as that would spam the output too much).

avelanarius avatar Aug 09 '23 12:08 avelanarius
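
For reference, below is a minimal sketch of how such targeted logging could be enabled, assuming cassandra-stress logs through SLF4J with a Logback backend and that the message in question is emitted by loggers named after the driver classes (the usual SLF4J convention); the exact logger names are assumptions and should be verified against the linked source.

import ch.qos.logback.classic.Level;
import ch.qos.logback.classic.Logger;
import org.slf4j.LoggerFactory;

public final class DriverCloseDebugLogging {

    // Sketch: raise the level to DEBUG only for the classes on the failing close
    // path instead of enabling DEBUG globally. Logger names are assumptions.
    public static void enable() {
        String[] loggerNames = {
            "com.datastax.driver.core.ConvictionPolicy",
            "com.datastax.driver.core.Connection"
        };
        for (String name : loggerNames) {
            Logger logger = (Logger) LoggerFactory.getLogger(name);
            logger.setLevel(Level.DEBUG);
        }
    }

    private DriverCloseDebugLogging() {}
}

If the stress tool's logging is configured through a logback.xml instead, the equivalent is a <logger name="com.datastax.driver.core.ConvictionPolicy" level="DEBUG"/> entry in that file.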

It still reproduces in master runs on Azure - not 100% of the time, but at least twice a week...

Installation details

Kernel Version: 5.15.0-1044-azure Scylla version (or git commit hash): 5.4.0~dev-20230824.93be4c0cb0f0 with build-id 9e29ac9d5d351d94023b4d80a71e21172f311f9d

Cluster size: 6 nodes (Standard_L8s_v3)

Scylla Nodes used in this run:

  • longevity-10gb-3h-master-db-node-3d0a838c-eastus-7 (23.96.110.26 | 10.0.0.5) (shards: 7)
  • longevity-10gb-3h-master-db-node-3d0a838c-eastus-6 (172.173.138.168 | 10.0.0.10) (shards: 7)
  • longevity-10gb-3h-master-db-node-3d0a838c-eastus-5 (172.173.138.74 | 10.0.0.9) (shards: 7)
  • longevity-10gb-3h-master-db-node-3d0a838c-eastus-4 (172.173.138.21 | 10.0.0.8) (shards: 7)
  • longevity-10gb-3h-master-db-node-3d0a838c-eastus-3 (172.173.138.15 | 10.0.0.7) (shards: 7)
  • longevity-10gb-3h-master-db-node-3d0a838c-eastus-2 (172.173.136.93 | 10.0.0.6) (shards: 7)
  • longevity-10gb-3h-master-db-node-3d0a838c-eastus-1 (172.173.136.2 | 10.0.0.5) (shards: 7)

OS / Image: /subscriptions/6c268694-47ab-43ab-b306-3c5514bc4112/resourceGroups/scylla-images/providers/Microsoft.Compute/images/scylla-5.4.0-dev-x86_64-2023-08-28T12-54-56 (azure: undefined_region)

Test: longevity-10gb-3h-azure-test Test id: 3d0a838c-869d-4b73-a0b5-0be75eae9559 Test name: scylla-master/longevity/longevity-10gb-3h-azure-test Test config file(s):

Logs and commands
  • Restore Monitor Stack command: $ hydra investigate show-monitor 3d0a838c-869d-4b73-a0b5-0be75eae9559
  • Restore monitor on AWS instance using Jenkins job
  • Show all stored logs command: $ hydra investigate show-logs 3d0a838c-869d-4b73-a0b5-0be75eae9559

Logs:

Jenkins job URL Argus

fruch avatar Aug 30 '23 13:08 fruch

@fruch as far as I know, we keep hitting it on Azure for master. IIUC, this time it's with scylla-driver-core-3.11.2.5-shaded.jar.

roydahan avatar Sep 19 '23 16:09 roydahan

Running the K8S MultiDC CI job with 6 Scylla pods/nodes (3 in each of the 2 regions), we hit this bug in about 50% of cases. It doesn't get hit when running a single DC with 3 nodes. The Scylla docker image used for running cassandra-stress is 5.2.7.

Impact

False error events.

How frequently does it reproduce?

~50%

Installation details

Kernel Version: 5.10.198-187.748.amzn2.x86_64 Scylla version (or git commit hash): 2023.1.2-20231001.646df23cc4b3 with build-id 367fcf1672d44f5cbddc88f946cf272e2551b85a

Operator Image: scylladb/scylla-operator:latest Operator Helm Version: v1.12.0-alpha.0-123-g24389ae Operator Helm Repository: https://storage.googleapis.com/scylla-operator-charts/latest Cluster size: 4 nodes (i4i.4xlarge) | 3 Scylla pods

Scylla Nodes used in this run: No resources left at the end of the run

OS / Image: (k8s-eks: eu-north-1, eu-west-1 )

Test: longevity-scylla-operator-multidc-12h-eks Test id: 61d76257-c01e-4e92-8908-682a75d4e7fb Test name: scylla-operator/operator-master/eks/longevity-scylla-operator-multidc-12h-eks Test config file(s):

Logs and commands
  • Restore Monitor Stack command: $ hydra investigate show-monitor 61d76257-c01e-4e92-8908-682a75d4e7fb
  • Restore monitor on AWS instance using Jenkins job
  • Show all stored logs command: $ hydra investigate show-logs 61d76257-c01e-4e92-8908-682a75d4e7fb

Logs:

Jenkins job URL Argus

vponomaryov avatar Nov 30 '23 15:11 vponomaryov

Happened again on the multi-DC k8s run:

Installation details

Kernel Version: 5.10.199-190.747.amzn2.x86_64 Scylla version (or git commit hash): 2023.1.2-20231001.646df23cc4b3 with build-id 367fcf1672d44f5cbddc88f946cf272e2551b85a

Operator Image: scylladb/scylla-operator:latest Operator Helm Version: v1.12.0-alpha.0-144-g60f7824 Operator Helm Repository: https://storage.googleapis.com/scylla-operator-charts/latest Cluster size: 4 nodes (i4i.4xlarge)

Scylla Nodes used in this run: No resources left at the end of the run

OS / Image: `` (k8s-eks: undefined_region)

Test: longevity-scylla-operator-multidc-12h-eks Test id: 6c7d144e-bab4-4ad7-b3f2-8cd2c96422cb Test name: scylla-operator/operator-master/eks/longevity-scylla-operator-multidc-12h-eks Test config file(s):

Logs and commands
  • Restore Monitor Stack command: $ hydra investigate show-monitor 6c7d144e-bab4-4ad7-b3f2-8cd2c96422cb
  • Restore monitor on AWS instance using Jenkins job
  • Show all stored logs command: $ hydra investigate show-logs 6c7d144e-bab4-4ad7-b3f2-8cd2c96422cb

Logs:

Jenkins job URL Argus

fruch avatar Dec 17 '23 14:12 fruch

And again,

@avelanarius

In a follow-up message (or PR?), I'll describe how to configure the logging framework to log this specific message (we don't want to enable all DEBUG logs, as that would spam the output too much).

Can someone take a look at this one, supply what's needed to debug this issue, and get it solved? (It's been open since April, when we first reported it.)

Installation details

Kernel Version: 5.10.199-190.747.amzn2.x86_64 Scylla version (or git commit hash): 2023.1.2-20231001.646df23cc4b3 with build-id 367fcf1672d44f5cbddc88f946cf272e2551b85a

Operator Image: scylladb/scylla-operator:latest Operator Helm Version: v1.12.0-alpha.0-144-g60f7824 Operator Helm Repository: https://storage.googleapis.com/scylla-operator-charts/latest Cluster size: 4 nodes (i4i.4xlarge)

Scylla Nodes used in this run: No resources left at the end of the run

OS / Image: `` (k8s-eks: undefined_region)

Test: longevity-scylla-operator-3h-multitenant-eks Test id: c9358794-630c-4607-9f59-ef831e22eb7d Test name: scylla-operator/operator-master/eks/longevity-scylla-operator-3h-multitenant-eks Test config file(s):

Logs and commands
  • Restore Monitor Stack command: $ hydra investigate show-monitor c9358794-630c-4607-9f59-ef831e22eb7d
  • Restore monitor on AWS instance using Jenkins job
  • Show all stored logs command: $ hydra investigate show-logs c9358794-630c-4607-9f59-ef831e22eb7d

Logs:

Jenkins job URL Argus

fruch avatar Dec 20 '23 14:12 fruch

Happened again on a weekly k8s run:

Installation details

Kernel Version: 5.10.201-191.748.amzn2.x86_64 Scylla version (or git commit hash): 2023.1.2-20231001.646df23cc4b3 with build-id 367fcf1672d44f5cbddc88f946cf272e2551b85a

Operator Image: scylladb/scylla-operator:latest Operator Helm Version: v1.12.0-alpha.0-144-g60f7824 Operator Helm Repository: https://storage.googleapis.com/scylla-operator-charts/latest Cluster size: 4 nodes (i4i.4xlarge)

Scylla Nodes used in this run: No resources left at the end of the run

OS / Image: `` (k8s-eks: undefined_region)

Test: longevity-scylla-operator-3h-multitenant-eks Test id: cdf68a9d-3688-4538-816c-8edc1641b191 Test name: scylla-operator/operator-master/eks/longevity-scylla-operator-3h-multitenant-eks Test config file(s):

Logs and commands
  • Restore Monitor Stack command: $ hydra investigate show-monitor cdf68a9d-3688-4538-816c-8edc1641b191
  • Restore monitor on AWS instance using Jenkins job
  • Show all stored logs command: $ hydra investigate show-logs cdf68a9d-3688-4538-816c-8edc1641b191

Logs:

Jenkins job URL Argus

fruch avatar Jan 08 '24 22:01 fruch

Happened again during longevity-schema-topology-changes-12h-test

Installation details

Kernel Version: 5.15.0-1051-aws Scylla version (or git commit hash): 2023.1.4-20240112.12c616e7f0cf with build-id e7263a4aa92cf866b98cf680bd68d7198c9690c0

Cluster size: 5 nodes (i4i.2xlarge)

Scylla Nodes used in this run:

  • longevity-parallel-topology-schema--db-node-d0e85230-9 (18.212.238.102 | 10.12.11.200) (shards: -1)
  • longevity-parallel-topology-schema--db-node-d0e85230-8 (54.163.56.74 | 10.12.11.44) (shards: 7)
  • longevity-parallel-topology-schema--db-node-d0e85230-7 (44.222.212.99 | 10.12.10.216) (shards: 7)
  • longevity-parallel-topology-schema--db-node-d0e85230-6 (34.227.221.146 | 10.12.10.171) (shards: 7)
  • longevity-parallel-topology-schema--db-node-d0e85230-5 (52.91.217.170 | 10.12.9.24) (shards: 7)
  • longevity-parallel-topology-schema--db-node-d0e85230-4 (54.197.100.149 | 10.12.9.88) (shards: 7)
  • longevity-parallel-topology-schema--db-node-d0e85230-3 (54.90.67.168 | 10.12.8.85) (shards: 7)
  • longevity-parallel-topology-schema--db-node-d0e85230-26 (54.227.158.23 | 10.12.8.143) (shards: -1)
  • longevity-parallel-topology-schema--db-node-d0e85230-25 (52.204.184.227 | 10.12.8.71) (shards: -1)
  • longevity-parallel-topology-schema--db-node-d0e85230-24 (54.234.237.6 | 10.12.9.61) (shards: 7)
  • longevity-parallel-topology-schema--db-node-d0e85230-23 (100.24.68.249 | 10.12.8.59) (shards: 7)
  • longevity-parallel-topology-schema--db-node-d0e85230-22 (3.90.252.67 | 10.12.11.192) (shards: 7)
  • longevity-parallel-topology-schema--db-node-d0e85230-21 (54.91.47.243 | 10.12.10.139) (shards: 7)
  • longevity-parallel-topology-schema--db-node-d0e85230-20 (54.90.254.134 | 10.12.10.184) (shards: 7)
  • longevity-parallel-topology-schema--db-node-d0e85230-2 (18.207.113.162 | 10.12.9.20) (shards: 7)
  • longevity-parallel-topology-schema--db-node-d0e85230-19 (3.91.190.56 | 10.12.11.8) (shards: 7)
  • longevity-parallel-topology-schema--db-node-d0e85230-18 (54.209.164.7 | 10.12.9.218) (shards: -1)
  • longevity-parallel-topology-schema--db-node-d0e85230-17 (54.196.165.104 | 10.12.8.255) (shards: 7)
  • longevity-parallel-topology-schema--db-node-d0e85230-16 (54.160.194.119 | 10.12.10.7) (shards: 7)
  • longevity-parallel-topology-schema--db-node-d0e85230-15 (54.198.48.129 | 10.12.11.241) (shards: 7)
  • longevity-parallel-topology-schema--db-node-d0e85230-14 (54.173.31.94 | 10.12.9.251) (shards: 7)
  • longevity-parallel-topology-schema--db-node-d0e85230-13 (23.22.226.155 | 10.12.9.55) (shards: 7)
  • longevity-parallel-topology-schema--db-node-d0e85230-12 (54.85.55.103 | 10.12.11.196) (shards: 7)
  • longevity-parallel-topology-schema--db-node-d0e85230-11 (184.72.172.32 | 10.12.8.76) (shards: 7)
  • longevity-parallel-topology-schema--db-node-d0e85230-10 (34.229.81.25 | 10.12.8.113) (shards: 7)
  • longevity-parallel-topology-schema--db-node-d0e85230-1 (52.23.238.92 | 10.12.11.66) (shards: 7)

OS / Image: ami-08b5f8ff1565ab9f0 (aws: undefined_region)

Test: longevity-schema-topology-changes-12h-test Test id: d0e85230-b857-4b85-af24-1de2d886a541 Test name: enterprise-2023.1/longevity/longevity-schema-topology-changes-12h-test Test config file(s):

Logs and commands
  • Restore Monitor Stack command: $ hydra investigate show-monitor d0e85230-b857-4b85-af24-1de2d886a541
  • Restore monitor on AWS instance using Jenkins job
  • Show all stored logs command: $ hydra investigate show-logs d0e85230-b857-4b85-af24-1de2d886a541

Logs:

Jenkins job URL Argus

juliayakovlev avatar Jan 15 '24 06:01 juliayakovlev

@avelanarius - any updates?

mykaul avatar Jan 30 '24 08:01 mykaul

@avelanarius maybe this can be moved to @Bouncheck, who deals with other java-driver issues and can still make it into Sprint2?

roydahan avatar Jan 30 '24 20:01 roydahan

Happened again on a 2023.1.5 run:

Feb 14, 2024 4:37:32 PM com.google.common.util.concurrent.AggregateFuture log
SEVERE: Input Future failed with Error
java.lang.AssertionError
	at com.datastax.driver.core.ConvictionPolicy$DefaultConvictionPolicy.signalConnectionClosed(ConvictionPolicy.java:90)
	at com.datastax.driver.core.Connection.closeAsync(Connection.java:1095)
	at com.datastax.driver.core.ControlConnection.onHostGone(ControlConnection.java:1122)
	at com.datastax.driver.core.ControlConnection.onRemove(ControlConnection.java:1112)
	at com.datastax.driver.core.Cluster$Manager.onRemove(Cluster.java:2503)
	at com.datastax.driver.core.Cluster$Manager.access$1400(Cluster.java:1560)
	at com.datastax.driver.core.Cluster$Manager$NodeRefreshRequestDeliveryCallback$4.runMayThrow(Cluster.java:3235)
	at com.datastax.driver.core.ExceptionCatchingRunnable.run(ExceptionCatchingRunnable.java:32)
total,     254265150,   14095,   14095,   14095,     2.8,     1.4,     6.3,    28.6,   116.0,   195.8,17080.0,  0.04657,      0,      0,       0,       0,       0,       0
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
	at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131)
	at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:76)
	at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at com.datastax.shaded.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.base/java.lang.Thread.run(Thread.java:829)

Packages

Scylla version: 2023.1.5-20240213.08fd6aec7a43 with build-id 448979e99e198eeab4a3b0e1b929397d337d2724 Kernel Version: 5.15.0-1053-aws

Installation details

Cluster size: 5 nodes (i4i.2xlarge)

Scylla Nodes used in this run:

  • longevity-parallel-topology-schema--db-node-d6a1f069-9 (3.208.19.118 | 10.12.10.155) (shards: 7)
  • longevity-parallel-topology-schema--db-node-d6a1f069-8 (3.88.99.101 | 10.12.8.219) (shards: 7)
  • longevity-parallel-topology-schema--db-node-d6a1f069-7 (34.229.149.141 | 10.12.9.35) (shards: 7)
  • longevity-parallel-topology-schema--db-node-d6a1f069-6 (54.174.66.96 | 10.12.11.24) (shards: 7)
  • longevity-parallel-topology-schema--db-node-d6a1f069-5 (34.228.68.118 | 10.12.8.18) (shards: 7)
  • longevity-parallel-topology-schema--db-node-d6a1f069-4 (34.207.148.169 | 10.12.10.43) (shards: 7)
  • longevity-parallel-topology-schema--db-node-d6a1f069-3 (107.22.67.56 | 10.12.10.185) (shards: 7)
  • longevity-parallel-topology-schema--db-node-d6a1f069-23 (34.230.91.117 | 10.12.11.38) (shards: 7)
  • longevity-parallel-topology-schema--db-node-d6a1f069-22 (34.224.38.91 | 10.12.10.96) (shards: 7)
  • longevity-parallel-topology-schema--db-node-d6a1f069-21 (54.226.154.223 | 10.12.8.155) (shards: 7)
  • longevity-parallel-topology-schema--db-node-d6a1f069-20 (3.89.207.5 | 10.12.8.130) (shards: -1)
  • longevity-parallel-topology-schema--db-node-d6a1f069-2 (54.163.68.110 | 10.12.9.238) (shards: 7)
  • longevity-parallel-topology-schema--db-node-d6a1f069-19 (34.204.9.138 | 10.12.8.69) (shards: 7)
  • longevity-parallel-topology-schema--db-node-d6a1f069-18 (34.230.21.14 | 10.12.10.223) (shards: 7)
  • longevity-parallel-topology-schema--db-node-d6a1f069-17 (54.157.211.8 | 10.12.11.188) (shards: 7)
  • longevity-parallel-topology-schema--db-node-d6a1f069-16 (3.80.23.111 | 10.12.9.16) (shards: 7)
  • longevity-parallel-topology-schema--db-node-d6a1f069-15 (54.237.198.134 | 10.12.9.30) (shards: 7)
  • longevity-parallel-topology-schema--db-node-d6a1f069-14 (3.95.212.234 | 10.12.10.4) (shards: 7)
  • longevity-parallel-topology-schema--db-node-d6a1f069-13 (34.229.85.228 | 10.12.10.243) (shards: 7)
  • longevity-parallel-topology-schema--db-node-d6a1f069-12 (18.207.156.115 | 10.12.8.24) (shards: 7)
  • longevity-parallel-topology-schema--db-node-d6a1f069-11 (54.226.191.147 | 10.12.11.170) (shards: 7)
  • longevity-parallel-topology-schema--db-node-d6a1f069-10 (54.173.57.133 | 10.12.10.116) (shards: 7)
  • longevity-parallel-topology-schema--db-node-d6a1f069-1 (54.160.228.203 | 10.12.8.251) (shards: 7)

OS / Image: ami-07dcd58abd440d69d (aws: undefined_region)

Test: longevity-schema-topology-changes-12h-test Test id: d6a1f069-382b-4bec-9299-0d0dd507101e Test name: enterprise-2023.1/longevity/longevity-schema-topology-changes-12h-test Test config file(s):

Logs and commands
  • Restore Monitor Stack command: $ hydra investigate show-monitor d6a1f069-382b-4bec-9299-0d0dd507101e
  • Restore monitor on AWS instance using Jenkins job
  • Show all stored logs command: $ hydra investigate show-logs d6a1f069-382b-4bec-9299-0d0dd507101e

Logs:

Jenkins job URL Argus

fruch avatar Feb 19 '24 08:02 fruch

Reproduced

WARN  [cluster1-nio-worker-5] 2024-08-17 16:52:18,060 DefaultPromise.java:593 - An exception was thrown by com.datastax.driver.core.Connection$ChannelCloseListener.operationComplete()
java.lang.AssertionError: null
        at com.datastax.driver.core.ConvictionPolicy$DefaultConvictionPolicy.signalConnectionFailure(ConvictionPolicy.java:101)
        at com.datastax.driver.core.Connection.defunct(Connection.java:812)
        at com.datastax.driver.core.Connection$ChannelCloseListener.operationComplete(Connection.java:1667)
        at com.datastax.driver.core.Connection$ChannelCloseListener.operationComplete(Connection.java:1657)
        at com.datastax.shaded.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:590)
        at com.datastax.shaded.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:583)
        at com.datastax.shaded.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:559)
        at com.datastax.shaded.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:492)
        at com.datastax.shaded.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:636)
        at com.datastax.shaded.netty.util.concurrent.DefaultPromise.setSuccess0(DefaultPromise.java:625)
        at com.datastax.shaded.netty.util.concurrent.DefaultPromise.trySuccess(DefaultPromise.java:105)
        at com.datastax.shaded.netty.channel.DefaultChannelPromise.trySuccess(DefaultChannelPromise.java:84)
        at com.datastax.shaded.netty.channel.AbstractChannel$CloseFuture.setClosed(AbstractChannel.java:1164)
        at com.datastax.shaded.netty.channel.AbstractChannel$AbstractUnsafe.doClose0(AbstractChannel.java:755)
        at com.datastax.shaded.netty.channel.AbstractChannel$AbstractUnsafe.close(AbstractChannel.java:731)
        at com.datastax.shaded.netty.channel.AbstractChannel$AbstractUnsafe.close(AbstractChannel.java:620)
        at com.datastax.shaded.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.closeOnRead(AbstractNioByteChannel.java:105)
        at com.datastax.shaded.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:174)
        at com.datastax.shaded.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788)
        at com.datastax.shaded.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724)
        at com.datastax.shaded.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650)
        at com.datastax.shaded.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)
        at com.datastax.shaded.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
        at com.datastax.shaded.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
        at com.datastax.shaded.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.base/java.lang.Thread.run(Thread.java:829)

Packages

Scylla version: 6.2.0~dev-20240816.afee3924b3dc with build-id c01d2a55a9631178e3fbad3869c20ef3c8dcf293

Kernel Version: 6.8.0-1013-aws

Installation details

Cluster size: 5 nodes (i4i.2xlarge)

Scylla Nodes used in this run:

  • longevity-parallel-topology-schema--db-node-38b8ee85-9 (34.243.172.21 | 10.4.10.40) (shards: 7)
  • longevity-parallel-topology-schema--db-node-38b8ee85-8 (34.241.87.177 | 10.4.9.153) (shards: 7)
  • longevity-parallel-topology-schema--db-node-38b8ee85-7 (3.248.146.218 | 10.4.11.3) (shards: 7)
  • longevity-parallel-topology-schema--db-node-38b8ee85-65 (52.30.194.155 | 10.4.10.20) (shards: 7)
  • longevity-parallel-topology-schema--db-node-38b8ee85-64 (54.75.203.188 | 10.4.10.126) (shards: 7)
  • longevity-parallel-topology-schema--db-node-38b8ee85-63 (52.50.49.24 | 10.4.8.120) (shards: 7)
  • longevity-parallel-topology-schema--db-node-38b8ee85-62 (18.203.32.147 | 10.4.9.93) (shards: 7)
  • longevity-parallel-topology-schema--db-node-38b8ee85-61 (52.31.140.46 | 10.4.10.138) (shards: 7)
  • longevity-parallel-topology-schema--db-node-38b8ee85-60 (54.73.163.60 | 10.4.11.187) (shards: 7)
  • longevity-parallel-topology-schema--db-node-38b8ee85-6 (176.34.188.246 | 10.4.9.179) (shards: 7)
  • longevity-parallel-topology-schema--db-node-38b8ee85-59 (63.33.142.5 | 10.4.11.147) (shards: -1)
  • longevity-parallel-topology-schema--db-node-38b8ee85-58 (99.80.53.93 | 10.4.9.97) (shards: 7)
  • longevity-parallel-topology-schema--db-node-38b8ee85-57 (54.246.102.229 | 10.4.11.18) (shards: 7)
  • longevity-parallel-topology-schema--db-node-38b8ee85-56 (54.246.114.254 | 10.4.9.231) (shards: 7)
  • longevity-parallel-topology-schema--db-node-38b8ee85-55 (54.228.229.218 | 10.4.11.135) (shards: 7)
  • longevity-parallel-topology-schema--db-node-38b8ee85-54 (54.228.128.54 | 10.4.8.54) (shards: 7)
  • longevity-parallel-topology-schema--db-node-38b8ee85-53 (54.220.76.170 | 10.4.8.112) (shards: -1)
  • longevity-parallel-topology-schema--db-node-38b8ee85-52 (54.228.155.196 | 10.4.10.62) (shards: 7)
  • longevity-parallel-topology-schema--db-node-38b8ee85-51 (54.155.42.214 | 10.4.10.130) (shards: 7)
  • longevity-parallel-topology-schema--db-node-38b8ee85-50 (18.202.172.255 | 10.4.9.84) (shards: 7)
  • longevity-parallel-topology-schema--db-node-38b8ee85-5 (54.194.167.67 | 10.4.9.98) (shards: 7)
  • longevity-parallel-topology-schema--db-node-38b8ee85-49 (34.253.77.130 | 10.4.8.112) (shards: 7)
  • longevity-parallel-topology-schema--db-node-38b8ee85-48 (52.50.119.131 | 10.4.10.10) (shards: 7)
  • longevity-parallel-topology-schema--db-node-38b8ee85-47 (54.247.107.226 | 10.4.10.233) (shards: 7)
  • longevity-parallel-topology-schema--db-node-38b8ee85-46 (63.34.140.249 | 10.4.10.89) (shards: 7)
  • longevity-parallel-topology-schema--db-node-38b8ee85-45 (52.208.115.57 | 10.4.8.139) (shards: 7)
  • longevity-parallel-topology-schema--db-node-38b8ee85-44 (108.129.36.151 | 10.4.9.85) (shards: 7)
  • longevity-parallel-topology-schema--db-node-38b8ee85-43 (108.129.54.0 | 10.4.9.8) (shards: 7)
  • longevity-parallel-topology-schema--db-node-38b8ee85-42 (46.51.181.179 | 10.4.11.99) (shards: -1)
  • longevity-parallel-topology-schema--db-node-38b8ee85-41 (52.49.169.211 | 10.4.10.106) (shards: -1)
  • longevity-parallel-topology-schema--db-node-38b8ee85-40 (52.51.11.85 | 10.4.8.32) (shards: 7)
  • longevity-parallel-topology-schema--db-node-38b8ee85-4 (79.125.120.229 | 10.4.10.190) (shards: 7)
  • longevity-parallel-topology-schema--db-node-38b8ee85-39 (54.73.81.149 | 10.4.10.154) (shards: 7)
  • longevity-parallel-topology-schema--db-node-38b8ee85-38 (52.212.235.107 | 10.4.8.34) (shards: 7)
  • longevity-parallel-topology-schema--db-node-38b8ee85-37 (54.78.206.207 | 10.4.10.23) (shards: 7)
  • longevity-parallel-topology-schema--db-node-38b8ee85-36 (34.249.82.167 | 10.4.10.253) (shards: -1)
  • longevity-parallel-topology-schema--db-node-38b8ee85-35 (54.74.152.7 | 10.4.9.250) (shards: 7)
  • longevity-parallel-topology-schema--db-node-38b8ee85-34 (52.16.207.166 | 10.4.8.26) (shards: 7)
  • longevity-parallel-topology-schema--db-node-38b8ee85-33 (108.128.171.193 | 10.4.8.255) (shards: 7)
  • longevity-parallel-topology-schema--db-node-38b8ee85-32 (54.217.75.3 | 10.4.9.97) (shards: 7)
  • longevity-parallel-topology-schema--db-node-38b8ee85-31 (52.51.9.38 | 10.4.10.234) (shards: 7)
  • longevity-parallel-topology-schema--db-node-38b8ee85-30 (54.74.38.195 | 10.4.8.201) (shards: 7)
  • longevity-parallel-topology-schema--db-node-38b8ee85-3 (176.34.236.69 | 10.4.11.232) (shards: 7)
  • longevity-parallel-topology-schema--db-node-38b8ee85-29 (34.243.65.171 | 10.4.9.35) (shards: 7)
  • longevity-parallel-topology-schema--db-node-38b8ee85-28 (18.200.83.15 | 10.4.8.114) (shards: 7)
  • longevity-parallel-topology-schema--db-node-38b8ee85-27 (54.217.206.13 | 10.4.8.32) (shards: 7)
  • longevity-parallel-topology-schema--db-node-38b8ee85-26 (54.194.3.157 | 10.4.9.254) (shards: 7)
  • longevity-parallel-topology-schema--db-node-38b8ee85-25 (46.137.121.193 | 10.4.8.33) (shards: 7)
  • longevity-parallel-topology-schema--db-node-38b8ee85-24 (63.34.81.137 | 10.4.10.200) (shards: 7)
  • longevity-parallel-topology-schema--db-node-38b8ee85-23 (52.48.243.97 | 10.4.10.153) (shards: -1)
  • longevity-parallel-topology-schema--db-node-38b8ee85-22 (79.125.122.56 | 10.4.8.30) (shards: 7)
  • longevity-parallel-topology-schema--db-node-38b8ee85-21 (54.220.98.27 | 10.4.10.169) (shards: 7)
  • longevity-parallel-topology-schema--db-node-38b8ee85-20 (52.19.173.79 | 10.4.10.146) (shards: 7)
  • longevity-parallel-topology-schema--db-node-38b8ee85-2 (34.240.255.213 | 10.4.11.26) (shards: 7)
  • longevity-parallel-topology-schema--db-node-38b8ee85-19 (34.240.190.131 | 10.4.9.22) (shards: 7)
  • longevity-parallel-topology-schema--db-node-38b8ee85-18 (54.247.95.169 | 10.4.9.82) (shards: 7)
  • longevity-parallel-topology-schema--db-node-38b8ee85-17 (34.251.173.31 | 10.4.9.17) (shards: 7)
  • longevity-parallel-topology-schema--db-node-38b8ee85-16 (34.252.201.75 | 10.4.10.157) (shards: 7)
  • longevity-parallel-topology-schema--db-node-38b8ee85-15 (52.211.65.141 | 10.4.9.175) (shards: 7)
  • longevity-parallel-topology-schema--db-node-38b8ee85-14 (34.241.10.58 | 10.4.8.146) (shards: 7)
  • longevity-parallel-topology-schema--db-node-38b8ee85-13 (52.30.1.91 | 10.4.11.144) (shards: 7)
  • longevity-parallel-topology-schema--db-node-38b8ee85-12 (52.16.190.194 | 10.4.9.176) (shards: 7)
  • longevity-parallel-topology-schema--db-node-38b8ee85-11 (52.211.210.182 | 10.4.11.78) (shards: -1)
  • longevity-parallel-topology-schema--db-node-38b8ee85-10 (52.211.160.16 | 10.4.10.201) (shards: 7)
  • longevity-parallel-topology-schema--db-node-38b8ee85-1 (46.137.180.83 | 10.4.9.26) (shards: 7)

OS / Image: ami-0f440d7175113787f (aws: undefined_region)

Test: longevity-schema-topology-changes-12h-test Test id: 38b8ee85-76c8-46ba-8f22-7964bec999fa Test name: scylla-master/tier1/longevity-schema-topology-changes-12h-test Test method: longevity_test.LongevityTest.test_custom_time Test config file(s):

Logs and commands
  • Restore Monitor Stack command: $ hydra investigate show-monitor 38b8ee85-76c8-46ba-8f22-7964bec999fa
  • Restore monitor on AWS instance using Jenkins job
  • Show all stored logs command: $ hydra investigate show-logs 38b8ee85-76c8-46ba-8f22-7964bec999fa

Logs:

Jenkins job URL Argus

juliayakovlev avatar Aug 19 '24 13:08 juliayakovlev

I was also able to reproduce it. Since @juliayakovlev wrote an extensive comment with all the details about this issue, I won't repeat them.

grzywin avatar Aug 21 '24 12:08 grzywin