yugabyte-db
[DocDB] Fix race that allows duplicate snapshot schedules
Jira Link: DB-3151
Description
Commit 8393a7c81b89cb0c668315205517c68708d17870 introduced a check in `CreateSchedule` that fails a snapshot schedule creation request if a schedule already exists in the same keyspace. The check holds `mutex_` while reading `schedules_`, but it must release the lock before invoking `SynchronizedWrite` to avoid deadlock:
```cpp
{
  std::lock_guard<std::mutex> lock(mutex_);
  const auto& existing_schedule = FindSnapshotSchedule(namespace_name, namespace_type);
  ...
}
...
RETURN_NOT_OK(SynchronizedWrite(std::move(write_batch), leader_term, deadline, &context_));
```
This allows the following race when concurrent requests try to create a snapshot schedule in the same keyspace:
1. Thread-1: Takes `mutex_`, verifies that no schedule exists for `ysql.joe`, and releases `mutex_`.
2. Thread-2: Takes `mutex_`, verifies that no schedule exists for `ysql.joe`, and releases `mutex_`.
3. Thread-1: Invokes `SynchronizedWrite()`.
4. Thread-2: Invokes `SynchronizedWrite()`.

Both checks pass before either write is applied, so both writes go through and `ysql.joe` ends up with two snapshot schedules.
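To make the check-then-act gap concrete, here is a minimal single-threaded C++ model that replays the four steps listed above in order, so the outcome is deterministic. `ScheduleStore`, `CheckNoSchedule`, `Write`, and `ReplayInterleaving` are hypothetical stand-ins (for the `FindSnapshotSchedule` check under `mutex_` and for `SynchronizedWrite`), not YugabyteDB code; only the shape of the locking is modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical stand-in for the schedule state guarded by mutex_.
// Only the check-then-act shape of CreateSchedule is modeled.
class ScheduleStore {
 public:
  // Mirrors the existence check: true if no schedule exists for keyspace.
  // The lock is released as soon as this returns.
  bool CheckNoSchedule(const std::string& keyspace) {
    std::lock_guard<std::mutex> lock(mutex_);
    return CountLocked(keyspace) == 0;
  }

  // Stand-in for SynchronizedWrite: appends without re-checking.
  void Write(const std::string& keyspace) {
    std::lock_guard<std::mutex> lock(mutex_);
    schedules_.push_back(keyspace);
  }

  std::size_t Count(const std::string& keyspace) {
    std::lock_guard<std::mutex> lock(mutex_);
    return CountLocked(keyspace);
  }

 private:
  // Caller must hold mutex_.
  std::size_t CountLocked(const std::string& keyspace) const {
    std::size_t n = 0;
    for (const auto& s : schedules_) {
      if (s == keyspace) ++n;
    }
    return n;
  }

  std::mutex mutex_;
  std::vector<std::string> schedules_;
};

// Replays the interleaving step by step on one thread and returns the
// number of schedules recorded for ysql.joe.
std::size_t ReplayInterleaving() {
  ScheduleStore store;
  const std::string ks = "ysql.joe";
  const bool t1_ok = store.CheckNoSchedule(ks);  // Thread-1: check passes
  const bool t2_ok = store.CheckNoSchedule(ks);  // Thread-2: check also passes
  if (t1_ok) store.Write(ks);                    // Thread-1: SynchronizedWrite()
  if (t2_ok) store.Write(ks);                    // Thread-2: SynchronizedWrite()
  return store.Count(ks);
}
```

`ReplayInterleaving()` returns 2: each individual access is correctly locked, yet the check and the write sit in separate critical sections, so nothing stops the second request from passing its check before the first write lands.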
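One possible fix would be to introduce a `pending_keyspaces_` structure that tracks the keyspaces of snapshot schedules that are about to be written, so that the check performed under `mutex_` also rejects a keyspace whose schedule is still being written.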
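A minimal sketch of the `pending_keyspaces_` idea, assuming a `std::set` keyed by keyspace name with both `schedules_` and `pending_keyspaces_` guarded by `mutex_`. `ScheduleRegistry`, `Reservation`, and `Commit` are hypothetical names, and the real `SynchronizedWrite` call is represented by a caller-supplied `write` function that reports success as a `bool`. The key point is that the existence check and the reservation share one critical section, while the write itself still runs without the lock held.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <set>
#include <string>
#include <utility>

// Hypothetical sketch of the pending_keyspaces_ idea; names and types are
// illustrative, not the YugabyteDB master code.
class ScheduleRegistry {
 public:
  // Returns false if a schedule for `keyspace` exists or is being written.
  // `write` stands in for SynchronizedWrite and runs without mutex_ held.
  bool CreateSchedule(const std::string& keyspace,
                      const std::function<bool()>& write) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      if (schedules_.count(keyspace) > 0 ||
          pending_keyspaces_.count(keyspace) > 0) {
        return false;  // existing or in-flight schedule for this keyspace
      }
      // Check and reservation happen in the same critical section.
      pending_keyspaces_.insert(keyspace);
    }
    Reservation reservation(this, keyspace);
    if (!write()) {
      return false;  // ~Reservation drops the pending entry
    }
    reservation.Commit();
    return true;
  }

 private:
  // RAII guard: erases the pending entry on every exit path unless Commit()
  // ran, so an early return cannot leave the keyspace reserved forever.
  class Reservation {
   public:
    Reservation(ScheduleRegistry* registry, std::string keyspace)
        : registry_(registry), keyspace_(std::move(keyspace)) {}
    Reservation(const Reservation&) = delete;
    Reservation& operator=(const Reservation&) = delete;

    ~Reservation() {
      if (!committed_) {
        std::lock_guard<std::mutex> lock(registry_->mutex_);
        registry_->pending_keyspaces_.erase(keyspace_);
      }
    }

    // Moves the keyspace from pending to written in one critical section.
    void Commit() {
      std::lock_guard<std::mutex> lock(registry_->mutex_);
      registry_->pending_keyspaces_.erase(keyspace_);
      registry_->schedules_.insert(keyspace_);
      committed_ = true;
    }

   private:
    ScheduleRegistry* registry_;
    std::string keyspace_;
    bool committed_ = false;
  };

  std::mutex mutex_;
  std::set<std::string> schedules_;          // schedules already written
  std::set<std::string> pending_keyspaces_;  // schedules being written
};
```

If a second request for `ysql.joe` is issued from inside the first request's `write` callback (simulating Thread-2 arriving while Thread-1's write is in flight), it sees the pending entry and returns `false`. The RAII guard is a design choice aimed at code that bails out early, such as a failed `RETURN_NOT_OK`: without it, a failed write would leave the keyspace permanently reserved.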
cc @sanketkedia