
test.py doesn't immediately close on SIGINT (ctrl+c)

Open nuivall opened this issue 1 year ago • 6 comments


^C
Shutdown requested... Aborting tests:
...done
...done
test_maintenance_socket test_raft_service_levels test_auth_no_quorum test_auth_raft_command_split test_auth_v2_migration logalloc_test ...done
database_test.test_safety_after_truncate ...done
database_test.test_querying_with_limits database_test.test_database_with_data_in_sstables_is_a_mutation_source_plain_basic_cg1 database_test.test_database_with_data_in_sstables_is_a_mutation_source_plain_basic_cg0 database_test.test_database_with_data_in_sstables_is_a_mutation_source_plain_fragments_monotonic_cg1 database_test.test_database_with_data_in_sstables_is_a_mutation_source_plain_reader_conversion_cg1 database_test.test_database_with_data_in_sstables_is_a_mutation_source_plain_read_back_cg1 database_test.test_truncate_without_snapshot_during_writes database_test.test_database_with_data_in_sstables_is_a_mutation_source_reverse_basic_cg0 database_test.test_database_with_data_in_sstables_is_a_mutation_source_reverse_reader_conversion_cg0 database_test.test_database_with_data_in_sstables_is_a_mutation_source_reverse_fragments_monotonic_cg0 database_test.test_database_with_data_in_sstables_is_a_mutation_source_reverse_read_back_cg0 database_test.test_database_with_data_in_sstables_is_a_mutation_source_plain_fragments_monotonic_cg0 database_test.test_database_with_data_in_sstables_is_a_mutation_source_plain_reader_conversion_cg0 database_test.test_database_with_data_in_sstables_is_a_mutation_source_plain_read_back_cg0 ...done
database_test.test_database_with_data_in_sstables_is_a_mutation_source_reverse_basic_cg1 ...done
database_test.snapshot_list_inexistent database_test.test_database_with_data_in_sstables_is_a_mut
(...)
flush_queue_test.test_queue_ordering_multi_ops ...done
^C^C^C^C^C^C^C^C^C

flush_queue_test.test_propagate_exception_in_op flush_queue_test.test_propagate_exception_in_post flush_queue_test.test_no_propagate_exception_in_op flush_queue_test.test_no_propagate_exception_in_post fragmented_temporary_buffer_test.test_read_to fragmented_temporary_buffer_test.test_empty_istream fragmented_temporary_buffer_test.test_read_view fragmented_temporary_buffer_test.test_view fragmented_temporary_buffer_test.test_view_equality fragmented_temporary_buffer_test.test_read_pod fragmented_temporary_buffer_test.test_read_bytes_view fragmented_temporary_buffer_test.test_skip fragmented_temporary_buffer_test.test_remove_suffix fragmented_temporary_buffer_test.test_read_fragmented_buffer ...done
^C^C^C^C

It looks like some work continues after the interrupt, so the "Shutdown requested... Aborting tests" handling is not fully working.

nuivall avatar Nov 19 '24 13:11 nuivall

In my opinion, it should be assumed that SIGINT (and also SIGTERM) is for interactive developer use, and should do what interactive users expect to happen: the tests should stop immediately (or nearly immediately). It shouldn't do much more than print the names of tests that were killed. It shouldn't bother to cleanly "shut down" tests; it should just kill them all with SIGKILL.
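To make the suggestion concrete, here is a minimal sketch of that behavior. It is not how test.py is implemented; the `running` dict (test name to `subprocess.Popen` handle) and both function names are hypothetical, and it assumes a POSIX system.

```python
import signal
import subprocess
import sys


def kill_all(running):
    """SIGKILL every still-running test process; return the names killed.

    `running` maps a test name to its subprocess.Popen handle. This is
    hypothetical bookkeeping, test.py's real data structures may differ.
    """
    killed = []
    for name, proc in running.items():
        if proc.poll() is None:   # process has not exited yet
            proc.kill()           # sends SIGKILL on POSIX
            killed.append(name)
    for proc in running.values():
        proc.wait()               # reap children so no zombies are left
    return killed


def install_interrupt_handler(running):
    """On SIGINT or SIGTERM: kill everything, print the names, exit."""
    def on_signal(signum, frame):
        names = kill_all(running)
        print("Killed:", " ".join(names))
        sys.exit(128 + signum)    # conventional "terminated by signal" status

    signal.signal(signal.SIGINT, on_signal)
    signal.signal(signal.SIGTERM, on_signal)
```

No graceful teardown is attempted here, so anything the tests left on disk stays there; that is the trade-off for an immediate exit.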

By the way, this isn't relevant to what test.py does (I don't know what test.py does), but I have to admit that cql/run.py also doesn't do exactly what I suggested above. It tries to kill with SIGTERM and waits up to 10 seconds before falling back to SIGKILL. It used to use SIGKILL, but then in d2ca600eec2769222a6ab16bad6ca74dd06faf2e it was changed to this SIGTERM-with-timeout behavior. While that made sense for ordinary shutdown, I admit it isn't a great idea for control-C, especially when a test hangs because of a Scylla bug, you want to interrupt it, and Scylla can't shut down cleanly because it's hung.
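The two behaviors can be sketched side by side as below. This is an illustration of the pattern, not the actual cql/run.py code; `stop_process` and its `interrupted` flag are hypothetical names, and the 10-second default only mirrors the timeout mentioned above.

```python
import subprocess
import sys


def stop_process(proc, grace_seconds=10.0, interrupted=False):
    """Stop `proc` and return "exited", "terminated", or "killed".

    Ordinary shutdown: send SIGTERM, wait up to `grace_seconds`, then
    fall back to SIGKILL. When `interrupted` is True (the control-C
    case), skip the grace period and send SIGKILL right away.
    """
    if proc.poll() is not None:
        return "exited"               # already gone, nothing to do
    if not interrupted:
        proc.terminate()              # SIGTERM on POSIX
        try:
            proc.wait(timeout=grace_seconds)
            return "terminated"
        except subprocess.TimeoutExpired:
            pass                      # hung process: fall through to SIGKILL
    proc.kill()                       # SIGKILL cannot be caught or ignored
    proc.wait()
    return "killed"
```

With `interrupted=True`, a hung process costs no waiting at all, whereas the default path can take up to `grace_seconds` per process.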

nyh avatar Nov 19 '24 13:11 nyh

It's a regression, it used to work.

kostja avatar Nov 22 '24 18:11 kostja

@nuivall please provide the information about how you are launching them.

xtrey avatar Nov 26 '24 20:11 xtrey

Via test.py

nuivall avatar Nov 27 '24 08:11 nuivall

Just saw the same problem. I ran test.py (I tried both from "ninja dev-test" and directly), pressed control-C, and saw:

Shutdown requested... Aborting tests:

But then started to get an endless list of test names ...done, with a few seconds between each one. I thought to myself maybe, at worst, it will still wait for a few last tests before exiting, but it really looked like it was going through all the tests and never finishing.

I expect control-C to work immediately, or at worst in a few seconds. I should not have to wait for many tests to finish gracefully, and test.py most certainly should not continue to start all the tests in the system.
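As a rough sketch of the expectation (not test.py's actual scheduler, which I haven't read), a runner can check a shutdown flag before starting each test, so nothing new is launched after control-C. The names `run_tests`, `run_one` and `shutdown` are hypothetical.

```python
import threading


def run_tests(tests, run_one, shutdown):
    """Run tests in order, but never start one after shutdown is requested.

    `shutdown` is a threading.Event that a signal handler would set.
    Returns the names of the tests that were actually started.
    """
    started = []
    for name in tests:
        if shutdown.is_set():
            break                 # shutdown requested: start nothing new
        started.append(name)
        run_one(name)
    return started


if __name__ == "__main__":
    stop = threading.Event()

    def fake_test(name):
        # Simulate control-C arriving while test "b" is running.
        if name == "b":
            stop.set()

    print(run_tests(["a", "b", "c", "d"], fake_test, stop))
```

In the demo, the interrupt arrives during "b", so "c" and "d" are never started.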

nyh avatar Dec 18 '24 22:12 nyh

@xtrey this hurts developers. Could you find some time to track down the root cause?

kostja avatar Dec 19 '24 13:12 kostja

The issue was fixed in https://github.com/scylladb/scylladb/pull/22069

temichus avatar Mar 03 '25 10:03 temichus