Petr Gusev
> I believe abort_source is a mistake. There end up too many sources of events: timeouts, aborts, and the original events needed by the application. We have to navigate the...
* reproduces steadily locally with `--count 25` if we comment out `time.sleep(0.01)`;
* `FlakyRetryPolicy` is not applied, since the timeout happens on the client side. Retry policies are only applied to handle...
`ERROR 2024-03-01 15:59:14,423 [shard 0:stmt] group0_raft_sm - group0_state_machine::transfer_snapshot, merge_topology_snapshot CRASH std::invalid_argument (Mutation of 17698243 bytes is too large for the maximum size of 16777216), topo_desc {1, 408961}, cdc_gen_desc {1, 17280322},...
> @gusev-p your assessment looks correct to me. Bug in topology snapshot transfer / merging. How should we fix that? `segment_manager::allocate_when_possible` requires that the total size of all mutations, passed...
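The log line above reports a 17698243-byte mutation against a 16777216-byte limit (2^24 bytes, 16 MiB), i.e. 921027 bytes over. One generic way to keep each write under such a cap is to group mutations into batches whose total size stays within the limit. What follows is a minimal sketch under stated assumptions: mutation sizes are known up front, and `pack_into_batches` and `max_mutation_size` are hypothetical names, not Scylla's API and not the fix adopted here. A single item larger than the limit cannot be fixed by batching and would have to be split itself, so the sketch reports that case instead of handling it.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical constant: the limit reported in the log (2^24 bytes = 16 MiB).
constexpr std::size_t max_mutation_size = std::size_t{1} << 24; // 16777216

// Hypothetical helper: greedily groups item sizes (in bytes) into batches
// whose total never exceeds `limit`, preserving the input order.
// Returns std::nullopt if any single item is larger than `limit`, because
// batching alone cannot make such an item fit.
std::optional<std::vector<std::vector<std::size_t>>>
pack_into_batches(const std::vector<std::size_t>& sizes, std::size_t limit) {
    std::vector<std::vector<std::size_t>> batches;
    std::size_t current_total = 0; // bytes already placed in batches.back()
    for (std::size_t s : sizes) {
        if (s > limit) {
            return std::nullopt; // oversized item: it must be split, not batched
        }
        // Start a new batch if there is none yet or this item would overflow it.
        if (batches.empty() || current_total + s > limit) {
            batches.emplace_back();
            current_total = 0;
        }
        batches.back().push_back(s);
        current_total += s;
    }
    return batches;
}
```

For example, sizes of 10000000, 5000000 and 3000000 bytes produce two batches (15000000 bytes fit under the limit; adding 3000000 more would not), while a single 17698243-byte item yields `std::nullopt`. Greedy in-order packing is chosen for simplicity; it does not minimize the number of batches.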
This is another problem; I will look at it.
> > > Isn't cdc_generations_v3 a raft table? Shouldn't it be using the schema commitlog?
> >
> > It is and it is
>
> Then why...
> We can fail write on one shard with error injection and then read the state back and see if it is the latest one.

fail write on one shard...
new version:
* restructured the commits: make `erm_handle` optional in `cas` in one commit, update call sites in separate ones, make it non-optional in the last commit
* added...
new version:
* rename `erm_handle` -> `token_metadata_guard`
* introduce `cas_shard` class
* pass `cas_shard` parameter through pagers, since LWT reads in select_statements sometimes go through `execute_without_checking_exception_message_aggregate_or_paged`