Botond Dénes

Results 853 comments of Botond Dénes

> [@xemul](https://github.com/xemul) worked for a long time to improve the initialization, and although it became better, it is not ideal and has degraded even more with time. The thing is, a lot of...

Also seen in https://jenkins.scylladb.com//job/scylla-master/job/scylla-ci/13778/testReport/junit/topology_random_failures/test_random_failures/Tests___Unit_Tests___test_random_failures_stop_before_streaming_restart_coordinator_node__debug_3.

Seen again in https://jenkins.scylladb.com/job/scylla-master/job/scylla-ci/16144/testReport/junit/cluster.random_failures/test_random_failures/Tests___Unit_Tests___test_random_failures_stop_after_sending_join_node_request_add_new_node__debug_1/

> < t:2025-02-25 10:49:27,531 f:tester.py l:3715 c:TombstoneGcLongevityTest p:DEBUG > Email data: {'backend': 'aws', 'build_id': '40', 'job_url': 'https://jenkins.scylladb.com/job/scylla-staging/job/yarongilor/job/byo-longevity-test-yg2/40/', 'end_time': '2025-02-25 10:47:15', 'events_summary': {'NORMAL': 49, 'WARNING': 19, 'ERROR': 1}, 'last_events': {'CRITICAL': [],...

@yarongilor we now have two different nodetool commands for repair:
* nodetool repair - for vnode keyspaces; cannot be used for tablet keyspaces
* nodetool cluster repair - for tablet...
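For test code that has to handle both kinds of keyspace, the split between the two commands can be sketched roughly as below. This is only an illustration: `repair_command` and the `uses_tablets` flag are hypothetical names, the sketch assumes the caller already knows whether the keyspace uses tablets, and it assumes both commands accept the keyspace name as a positional argument.

```python
from typing import List


def repair_command(keyspace: str, uses_tablets: bool) -> List[str]:
    """Build the nodetool argv for repairing one keyspace.

    Hypothetical helper: the caller is assumed to know whether the
    keyspace is tablet-based or vnode-based.
    """
    if uses_tablets:
        # Tablet keyspaces go through `nodetool cluster repair`.
        return ["nodetool", "cluster", "repair", keyspace]
    # Vnode keyspaces keep using plain `nodetool repair`.
    return ["nodetool", "repair", keyspace]


if __name__ == "__main__":
    # Example keyspace names are made up for illustration.
    print(" ".join(repair_command("ks_vnodes", uses_tablets=False)))
    print(" ".join(repair_command("ks_tablets", uses_tablets=True)))
```

The helper only builds the argument list; actually running it (for example via `subprocess.run`) and detecting the keyspace type are left to the caller.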

> Tested in: https://jenkins.scylladb.com/job/scylla-staging/job/yarongilor/job/disk-to-ram-ratio-minimal-memory-size-test/
>
> [Failed](https://jenkins.scylladb.com/job/scylla-staging/job/yarongilor/job/disk-to-ram-ratio-minimal-memory-size-test/4/) with:
>
> ```
> !ERR | scylla[6668]: [shard 0:main] init - Startup failed: std::runtime_error (configuration (memory per shard too low))
> ...
> ```

Thanks @yarongilor, I think this will serve as a good baseline for validating our improvements. Stopping the crash-loop and making sure ScyllaDB stays available will be a good first step.

> [@mykaul](https://github.com/mykaul) [@bhalevy](https://github.com/bhalevy) In addition to this coredump, [#17039](https://github.com/scylladb/scylladb/issues/17039) is now bottlenecked by streaming performance. Streaming takes an inordinate amount of time even for an empty cluster. Do you use...

The cores are in a weird spot where they appear to have TLS working but `seastar::local_engine = 0x0`. Debugging is very challenging this way. The vast majority of our helper commands...

To continue this investigation, I need a rerun which produces usable coredumps. Hopefully whatever caused the cores to be bad was transient.