Driver reported "[Errno 9] Bad file descriptor"
Scylla version: 2026.1.0~dev-20251205.866c96f536b0 with build-id 2c38506085b888e1baa43f81d05dab12df5132c1
During latest master runs driver reported following error:
< t:2025-12-06 04:21:39,017 f:cluster.py l:3723 c:cassandra.cluster p:WARNING > [control connection] Error connecting to 10.12.33.86:9042:
< t:2025-12-06 04:21:39,017 f:cluster.py l:3723 c:cassandra.cluster p:WARNING > Traceback (most recent call last):
< t:2025-12-06 04:21:39,017 f:cluster.py l:3723 c:cassandra.cluster p:WARNING > File "cassandra/cluster.py", line 3546, in cassandra.cluster.ControlConnection._connect_host_in_lbp
< t:2025-12-06 04:21:39,017 f:cluster.py l:3723 c:cassandra.cluster p:WARNING > File "cassandra/cluster.py", line 3662, in cassandra.cluster.ControlConnection._try_connect
< t:2025-12-06 04:21:39,017 f:cluster.py l:3723 c:cassandra.cluster p:WARNING > File "cassandra/cluster.py", line 3646, in cassandra.cluster.ControlConnection._try_connect
< t:2025-12-06 04:21:39,017 f:cluster.py l:3723 c:cassandra.cluster p:WARNING > cassandra.connection.ConnectionShutdown: [Errno 9] Bad file descriptor
< t:2025-12-06 04:21:39,017 f:cluster.py l:3723 c:cassandra.cluster p:WARNING > Host 10.12.33.86:9042 has been marked down
It seems that such errors appeared each time one of the nodes was down.
Also spotted here: https://argus.scylladb.com/tests/scylla-cluster-tests/a8cd6873-19c1-49c1-ab5a-dca25655ed6c
Kernel Version: 6.14.0-1017-aws
Extra information
Installation details
Cluster size: 6 nodes (i7i.4xlarge)
Scylla Nodes used in this run:
- longevity-tls-50gb-3d-master-db-node-38f90182-1 (3.228.203.95 | 10.12.35.225) (shards: 14)
- longevity-tls-50gb-3d-master-db-node-38f90182-2 (44.213.201.240 | 10.12.32.118) (shards: 14)
- longevity-tls-50gb-3d-master-db-node-38f90182-3 (100.30.78.169 | 10.12.33.86) (shards: 14)
- longevity-tls-50gb-3d-master-db-node-38f90182-4 (44.207.141.103 | 10.12.35.59) (shards: -1)
- longevity-tls-50gb-3d-master-db-node-38f90182-5 (3.219.68.68 | 10.12.35.232) (shards: 14)
- longevity-tls-50gb-3d-master-db-node-38f90182-6 (34.199.164.159 | 10.12.33.159) (shards: 14)
- longevity-tls-50gb-3d-master-db-node-38f90182-7 (98.82.213.102 | 10.12.33.203) (shards: 14)
- longevity-tls-50gb-3d-master-db-node-38f90182-8 (98.83.182.28 | 10.12.34.200) (shards: 14)
OS / Image: ami-0810c73586fe68036 (aws: N/A)
Test: longevity-50gb-3days-test
Test id: 38f90182-547d-4b60-973c-7e826b926708
Test name: scylla-master/tier1/longevity-50gb-3days-test
Test method: longevity_test.LongevityTest.test_custom_time
Test config file(s):
Logs:
Hi @dkropachev , could you please take a look at this issue?
Reproduced again:
Scylla version: 2026.1.0~dev-20251211.f7ffa395a8fd with build-id 6ed9dbb170d6894329ed88a93e118dd68cbd62a9
Kernel Version: 6.14.0-1018-aws
Extra information
Installation details
Cluster size: 6 nodes (i7i.2xlarge)
Scylla Nodes used in this run:
- longevity-50gb-12h-master-db-node-70283809-1 (13.218.127.161 | 10.12.9.87) (shards: 4)
- longevity-50gb-12h-master-db-node-70283809-2 (54.91.187.43 | 10.12.8.66) (shards: 4)
- longevity-50gb-12h-master-db-node-70283809-3 (18.212.86.250 | 10.12.8.228) (shards: 6)
- longevity-50gb-12h-master-db-node-70283809-4 (54.92.211.214 | 10.12.10.198) (shards: 6)
- longevity-50gb-12h-master-db-node-70283809-5 (98.84.134.241 | 10.12.8.104) (shards: 5)
- longevity-50gb-12h-master-db-node-70283809-6 (54.160.211.56 | 10.12.9.171) (shards: 4)
OS / Image: ami-02ad235f4c4336f6c (aws: N/A)
Test: longevity-150gb-asymmetric-cluster-12h-test
Test id: 70283809-37aa-4be5-9ebc-d891e1a2d6aa
Test name: scylla-master/tier1/longevity-150gb-asymmetric-cluster-12h-test
Test method: longevity_test.LongevityTest.test_custom_time
Test config file(s):
Logs:
Scylla version: 2026.1.0~dev-20251219.f65db4e8eba5 with build-id 683ff5b7a4a313ea6094e72fd639c906693ece37
Kernel Version: 6.14.0-1018-aws
Extra information
Installation details
Cluster size: 6 nodes (i7i.4xlarge)
Scylla Nodes used in this run:
- longevity-tls-50gb-3d-master-db-node-c6beb17a-1 (98.87.193.30 | 10.12.35.220) (shards: 14)
- longevity-tls-50gb-3d-master-db-node-c6beb17a-2 (52.6.69.201 | 10.12.34.173) (shards: 14)
- longevity-tls-50gb-3d-master-db-node-c6beb17a-3 (100.49.143.61 | 10.12.32.22) (shards: 14)
- longevity-tls-50gb-3d-master-db-node-c6beb17a-4 (52.203.20.179 | 10.12.34.56) (shards: 14)
- longevity-tls-50gb-3d-master-db-node-c6beb17a-5 (44.193.182.223 | 10.12.32.49) (shards: 14)
- longevity-tls-50gb-3d-master-db-node-c6beb17a-6 (50.17.245.62 | 10.12.34.86) (shards: 14)
- longevity-tls-50gb-3d-master-db-node-c6beb17a-7 (44.209.62.120 | 10.12.32.166) (shards: 14)
- longevity-tls-50gb-3d-master-db-node-c6beb17a-8 (54.152.201.38 | 10.12.33.224) (shards: 14)
- longevity-tls-50gb-3d-master-db-node-c6beb17a-9 (98.95.22.145 | 10.12.33.230) (shards: 14)
- longevity-tls-50gb-3d-master-db-node-c6beb17a-10 (3.231.75.179 | 10.12.32.60) (shards: 14)
- longevity-tls-50gb-3d-master-db-node-c6beb17a-11 (100.49.20.125 | 10.12.33.136) (shards: -1)
- longevity-tls-50gb-3d-master-db-node-c6beb17a-12 (3.215.138.198 | 10.12.34.73) (shards: 14)
OS / Image: ami-048249cf3c5bfc84f (aws: N/A)
Test: longevity-50gb-3days-test
Test id: c6beb17a-d0b9-43b6-ad05-2fbd45c4201d
Test name: scylla-master/tier1/longevity-50gb-3days-test
Test method: longevity_test.LongevityTest.test_custom_time
Test config file(s):
Logs:
Scylla version: 2026.1.0~dev-20251219.f65db4e8eba5 with build-id 683ff5b7a4a313ea6094e72fd639c906693ece37
Kernel Version: 6.14.0-1018-aws
Extra information
Installation details
Cluster size: 6 nodes (i7i.2xlarge)
Scylla Nodes used in this run:
- longevity-50gb-12h-master-db-node-e2d8a05c-1 (34.201.94.66 | 10.12.8.210) (shards: 6)
- longevity-50gb-12h-master-db-node-e2d8a05c-2 (18.234.51.11 | 10.12.8.248) (shards: 6)
- longevity-50gb-12h-master-db-node-e2d8a05c-3 (52.54.112.48 | 10.12.10.136) (shards: 5)
- longevity-50gb-12h-master-db-node-e2d8a05c-4 (13.222.190.127 | 10.12.11.124) (shards: 7)
- longevity-50gb-12h-master-db-node-e2d8a05c-5 (54.145.225.121 | 10.12.10.222) (shards: 7)
- longevity-50gb-12h-master-db-node-e2d8a05c-6 (18.208.221.26 | 10.12.8.44) (shards: 4)
OS / Image: ami-048249cf3c5bfc84f (aws: N/A)
Test: longevity-150gb-asymmetric-cluster-12h-test
Test id: e2d8a05c-55b0-4025-b3bb-00712401b844
Test name: scylla-master/tier1/longevity-150gb-asymmetric-cluster-12h-test
Test method: longevity_test.LongevityTest.test_custom_time
Test config file(s):
Logs:
Reproduced again:
- https://argus.scylladb.com/tests/scylla-cluster-tests/ccf51876-4d31-48dd-b266-0b83cca6c8fb
- https://argus.scylladb.com/tests/scylla-cluster-tests/85780def-b7ac-406a-b801-6608dab8a5d3
The following is happening:
- The driver force-closes the connection for some reason.
- In parallel, other parts of the driver read from or write to that connection; since the socket is already closed, any operation on it ends up in `Bad file descriptor`.
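For illustration only (not driver code), the race boils down to using a socket after it was closed. A minimal standalone sketch of the symptom:

```python
import errno
import socket

# A connected socket pair stands in for the driver's connection to a node.
a, b = socket.socketpair()

# Step 1: one component force-closes the connection (e.g. host marked down).
a.close()

# Step 2: another component, unaware of the close, still tries to use it.
# In CPython any operation on a closed socket raises OSError with EBADF.
try:
    a.send(b"native-protocol frame")
except OSError as e:
    print(e.errno == errno.EBADF, e)  # True [Errno 9] Bad file descriptor

b.close()
```

Note the bare `OSError` carries no hint of *why* the socket was closed, which is exactly what makes the reported error hard to diagnose.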
Unfortunately, the way the driver handles this case makes the original reason for closing the connection perish by the time the error surfaces, so currently we can only pick it up from the logs. So, either we pick it up from the logs, or we add code that persists the reason the connection was closed and raises a proper message when a socket operation fails.
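A hypothetical sketch of the "persist the close reason" idea (the class, method names, and reason strings below are illustrative, not the actual driver API):

```python
import errno
import socket
import threading

class TrackedConnection:
    """Socket wrapper that remembers why it was deliberately closed."""

    def __init__(self, sock):
        self._sock = sock
        self._lock = threading.Lock()
        self.close_reason = None  # e.g. "host marked down"

    def close(self, reason):
        # Record the first close reason, then close the underlying socket.
        with self._lock:
            if self.close_reason is None:
                self.close_reason = reason
            self._sock.close()

    def send(self, data):
        try:
            return self._sock.send(data)
        except OSError as e:
            # Surface the recorded cause instead of the bare EBADF symptom.
            if e.errno == errno.EBADF and self.close_reason:
                raise ConnectionError(
                    f"connection was closed: {self.close_reason}") from e
            raise

a, b = socket.socketpair()
conn = TrackedConnection(a)
conn.close("host 10.12.33.86 marked down")
try:
    conn.send(b"frame")
except ConnectionError as e:
    print(e)  # connection was closed: host 10.12.33.86 marked down
b.close()
```

With something like this, the warning in the logs could say which event (host marked down, heartbeat timeout, shutdown) triggered the close instead of the opaque `[Errno 9] Bad file descriptor`.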
> The following is happening:
>
> - The driver force-closes the connection for some reason.
> - In parallel, other parts of the driver read from or write to that connection; since the socket is already closed, any operation on it ends up in `Bad file descriptor`.
>
> Unfortunately, the way the driver handles this case makes the original reason for closing the connection perish by the time the error surfaces, so currently we can only pick it up from the logs. So, either we pick it up from the logs, or we add code that persists the reason the connection was closed and raises a proper message when a socket operation fails.
What is surfacing it now? It doesn't sound like a new flow in the driver. Python 3.14?
> What is surfacing it now? It doesn't sound like a new flow in the driver. Python 3.14?
Absolutely not, I don't think it is a Python 3.14 issue; we need to dig into it to come up with decent clues.
Reproduced during `disrupt_serial_restart_elected_topology_coordinator` and `disrupt_kill_mv_building_coordinator`:
Scylla version: 2026.1.0~dev-20260101.6c8ddfc018df with build-id a6c13b1f1c32f12209df2d88746d46ad87d6a234
Kernel Version: 6.14.0-1018-aws
Extra information
Installation details
Cluster size: 6 nodes (i7i.4xlarge)
Scylla Nodes used in this run:
- longevity-tls-50gb-3d-master-db-node-ebcdbea0-1 (3.94.50.66 | 10.12.34.81) (shards: 14)
- longevity-tls-50gb-3d-master-db-node-ebcdbea0-2 (98.80.16.221 | 10.12.32.59) (shards: 14)
- longevity-tls-50gb-3d-master-db-node-ebcdbea0-3 (44.209.1.35 | 10.12.35.218) (shards: 14)
- longevity-tls-50gb-3d-master-db-node-ebcdbea0-4 (100.28.64.36 | 10.12.34.248) (shards: 14)
- longevity-tls-50gb-3d-master-db-node-ebcdbea0-5 (100.50.211.251 | 10.12.34.30) (shards: 14)
- longevity-tls-50gb-3d-master-db-node-ebcdbea0-6 (52.200.42.48 | 10.12.35.28) (shards: 14)
- longevity-tls-50gb-3d-master-db-node-ebcdbea0-7 (34.225.99.38 | 10.12.32.159) (shards: 14)
- longevity-tls-50gb-3d-master-db-node-ebcdbea0-8 (3.222.145.143 | 10.12.32.146) (shards: 14)
- longevity-tls-50gb-3d-master-db-node-ebcdbea0-9 (98.89.139.56 | 10.12.35.173) (shards: 14)
- longevity-tls-50gb-3d-master-db-node-ebcdbea0-10 (52.86.61.173 | 10.12.34.55) (shards: 14)
- longevity-tls-50gb-3d-master-db-node-ebcdbea0-11 (3.219.7.100 | 10.12.35.29) (shards: 14)
- longevity-tls-50gb-3d-master-db-node-ebcdbea0-12 (54.235.70.200 | 10.12.32.235) (shards: 14)
- longevity-tls-50gb-3d-master-db-node-ebcdbea0-13 (18.211.116.191 | 10.12.35.96) (shards: -1)
- longevity-tls-50gb-3d-master-db-node-ebcdbea0-14 (100.30.149.227 | 10.12.34.14) (shards: 14)
- longevity-tls-50gb-3d-master-db-node-ebcdbea0-15 (184.73.237.224 | 10.12.35.88) (shards: 14)
- longevity-tls-50gb-3d-master-db-node-ebcdbea0-16 (54.164.126.255 | 10.12.35.31) (shards: 14)
OS / Image: ami-06471fb71c6e86b19 (aws: N/A)
Test: longevity-50gb-3days-test
Test id: ebcdbea0-bc81-4521-b136-57391821385d
Test name: scylla-master/tier1/longevity-50gb-3days-test
Test method: longevity_test.LongevityTest.test_custom_time
Test config file(s):
Logs:
Scylla version: 2026.1.0~rc0-20260125.f94296e0ae43 with build-id 9680213fda6f301234c43da8ca27e47953987cd8
Kernel Version: 6.14.0-1018-aws
Extra information
Installation details
Cluster size: 6 nodes (i4i.4xlarge)
Scylla Nodes used in this run:
- longevity-100gb-4h-2026-1-db-node-ae55afa4-1 (18.214.100.191 | 10.12.8.254) (shards: 14)
- longevity-100gb-4h-2026-1-db-node-ae55afa4-2 (98.93.132.101 | 10.12.11.251) (shards: 14)
- longevity-100gb-4h-2026-1-db-node-ae55afa4-3 (100.31.91.66 | 10.12.9.119) (shards: 14)
- longevity-100gb-4h-2026-1-db-node-ae55afa4-4 (54.196.137.234 | 10.12.8.236) (shards: 14)
- longevity-100gb-4h-2026-1-db-node-ae55afa4-5 (54.90.78.63 | 10.12.10.173) (shards: 14)
- longevity-100gb-4h-2026-1-db-node-ae55afa4-6 (13.220.180.103 | 10.12.10.163) (shards: 14)
- longevity-100gb-4h-2026-1-db-node-ae55afa4-7 (34.224.86.8 | 10.12.8.121) (shards: 14)
- longevity-100gb-4h-2026-1-db-node-ae55afa4-8 (54.242.91.143 | 10.12.11.54) (shards: -1)
OS / Image: ami-041ecb6271ecc1499 (aws: N/A)
Test: longevity-100gb-4h-test
Test id: ae55afa4-98cc-434a-8cfb-5d7738aba978
Test name: scylla-2026.1/longevity/longevity-100gb-4h-test
Test method: longevity_test.LongevityTest.test_custom_time
Test config file(s):
Logs:
@roydahan / @dkropachev please assign