Botond Dénes

Results: 853 comments by Botond Dénes

`stop_transport()`:

```
future<> storage_service::stop_transport() {
    if (!_transport_stopped.has_value()) {
        promise<> stopped;
        _transport_stopped = stopped.get_future();

        seastar::async([this] {
            slogger.info("Stop transport: starts");

            slogger.debug("shutting down migration...
```

The default stop timeout is 127 seconds; we can see this in action:

```
55 13:35:29,207 706 ccm WARNING cluster.py :760 | test_repair_kill_3: node1: node1 is still running after 127 seconds....
```

Finally stumbled on the stuck fiber:

```
(gdb) scylla fiber 0x600004365080
[shard 1] #-7 (task*) 0x000060100466cb40 0x0000000002dcbff0 service::migration_manager::drain() [clone .resume]
[shard 1] #-6 (task*) 0x000060100035cbc0 0x00000000006a9140 vtable for seastar::continuation +...
```

And here is the rest of the fiber:

```
(gdb) scylla fiber 0x601004cbc730
[shard 1] #-1 (task*) 0x0000601004d3a100 0x0000000000554f18 vtable for seastar::continuation + 16
[shard 1] #0 (task*) 0x0000601004cbc730 0x000000000059ced8...
```

Both `migration_manager` instances, on both shards, have a count of `0` on the `_group0_barrier`'s semaphore:

```
(gdb) p $14->_group0_barrier._sem._count
$21 = 0
(gdb) p $5->_group0_barrier._sem._count
$22 = 0
```

This means...

I do recall another instance where we've seen this in the past: shutdown stuck forever because group0 lost quorum (unfortunately I don't remember the exact issue). @kbr-scylla do you have...

> I think it _should_ be able to follow through coroutines. Is that really what happened here, that it wasn't able to?

It looks like so, see the trace at...

> Raft group operations have a timeout now. I see that the one in drain is supposed to run with the default timeout of one minute. After the timeout we check it...

> If it was [36b57f3](https://github.com/scylladb/scylladb/commit/36b57f3432cb34e20df27a7bc90ff16b4ecb043f) from this post: [#19244 (comment)](https://github.com/scylladb/scylladb/issues/19244#issuecomment-2241657247), then that's before the fix [c05e077](https://github.com/scylladb/scylladb/commit/c05e077a1365f9c20a1ab060ff65b000f4fe3176) was merged.
>
> I think we can close as duplicate of #19223, but...

> And the query we see here uses `LIMIT 1` so perhaps it's simply hitting an empty page, therefore not seeing the row which would show up at one of...