yugabyte-db
yugabyte-db copied to clipboard
DocOperationTest.EarlyFilesFilteredBeforeBigFile failing occasionally on TSAN
Jira Link: DB-773 Test is failing every so often, only in TSAN environments. May be related to a race condition or timing issue. https://detective-gcp.dev.yugabyte.com/stability/test?branch=master&build_type=all&class=DocOperationTest&name=EarlyFilesFilteredBeforeBigFile&platform=centos
@jmeehan16 This is still occurring in 2.8.10.0-b2:
==================
WARNING: ThreadSanitizer: data race (pid=24457)
Write of size 1 at 0x7b5400010daa by thread T3 (mutexes: write M1004156980073336112):
#0 rocksdb::CompactionPicker::MarkL0FilesForDeletion(rocksdb::VersionStorageInfo const*, rocksdb::ImmutableCFOptions const*) /nfusr/centos-gcp-cloud/jenkins-worker-vfbr5n/jenkins/jenkins-github-yugabyte-db-centos-master-clang7-tsan-459/yugabyte-db/build/tsan-clang7-dynamic-ninja/../../src/yb/rocksdb/db/compaction_picker.cc:481:34 (librocksdb.so+0x2e37f8)
#1 rocksdb::UniversalCompactionPicker::CalculateSortedRuns(rocksdb::VersionStorageInfo const&, rocksdb::ImmutableCFOptions const&, unsigned long) /nfusr/centos-gcp-cloud/jenkins-worker-vfbr5n/jenkins/jenkins-github-yugabyte-db-centos-master-clang7-tsan-459/yugabyte-db/build/tsan-clang7-dynamic-ninja/../../src/yb/rocksdb/db/compaction_picker.cc:1268:3 (librocksdb.so+0x2e797c)
--
#26 RUN_ALL_TESTS() /opt/yb-build/thirdparty/yugabyte-db-thirdparty-v20210916160346-44ba371965-centos7-x86_64-clang7/installed/tsan/include/gtest/gtest.h:2233:46 (libyb_test_main.so+0x528b)
#27 main /nfusr/centos-gcp-cloud/jenkins-worker-vfbr5n/jenkins/jenkins-github-yugabyte-db-centos-master-clang7-tsan-459/yugabyte-db/build/tsan-clang7-dynamic-ninja/../../src/yb/util/test_main.cc:104:13 (libyb_test_main.so+0x4e6d)
SUMMARY: ThreadSanitizer: data race /nfusr/centos-gcp-cloud/jenkins-worker-vfbr5n/jenkins/jenkins-github-yugabyte-db-centos-master-clang7-tsan-459/yugabyte-db/build/tsan-clang7-dynamic-ninja/../../src/yb/rocksdb/db/compaction_picker.cc:481:34 in rocksdb::CompactionPicker::MarkL0FilesForDeletion(rocksdb::VersionStorageInfo const*, rocksdb::ImmutableCFOptions const*)
==================
Should the fix be backported? If the fix is low risk I'd argue for it so we have greener test runs and can better find actual new problems.
@jmeehan16 do you recall the fix here? maybe it was done in a different diff? would be good to backport, especially if it's potentially a product race
@bmatican It seems that this was fixed by @spolitov diff to Switch TSAN build to use clang12 in master. This is in 2.14, but has not been backported beyond that.
I agree that it might be good to backport, though it may be a messy one given that it touches a good deal of code. The feature/test in question exists as far back as 2.6.
Alternatively, the fix for this particular test failure is fairly small. If it's preferable, we could carve out a fix independent of the full diff.
@jmeehan16 agree, would be unlikely we'd port the clang12 change, but to clarify, is this a test-only issue, or a genuine product race?
@bmatican Unfortunately this is a genuine product race, so we should definitely do at least a partial backport here.
Thanks @jmeehan16 , then if you could cherrypick just the product fix, that'd be great!
@bmatican D19583 backports the relevant code to 2.12. I'll have @spolitov review it once he returns, and we can put it on 2.8 and 2.6 from there.