yugabyte-db icon indicating copy to clipboard operation
yugabyte-db copied to clipboard

DocOperationTest.EarlyFilesFilteredBeforeBigFile failing occasionally on TSAN

Open jmeehan16 opened this issue 3 years ago • 7 comments

Jira Link: DB-773 Test is failing every so often, only in TSAN environments. May be related to a race condition or timing issue. https://detective-gcp.dev.yugabyte.com/stability/test?branch=master&build_type=all&class=DocOperationTest&name=EarlyFilesFilteredBeforeBigFile&platform=centos

jmeehan16 avatar Feb 18 '22 00:02 jmeehan16

@jmeehan16 This is still occurring in 2.8.10.0-b2:

==================
WARNING: ThreadSanitizer: data race (pid=24457)
  Write of size 1 at 0x7b5400010daa by thread T3 (mutexes: write M1004156980073336112):
    #0 rocksdb::CompactionPicker::MarkL0FilesForDeletion(rocksdb::VersionStorageInfo const*, rocksdb::ImmutableCFOptions const*) /nfusr/centos-gcp-cloud/jenkins-worker-vfbr5n/jenkins/jenkins-github-yugabyte-db-centos-master-clang7-tsan-459/yugabyte-db/build/tsan-clang7-dynamic-ninja/../../src/yb/rocksdb/db/compaction_picker.cc:481:34 (librocksdb.so+0x2e37f8)
    #1 rocksdb::UniversalCompactionPicker::CalculateSortedRuns(rocksdb::VersionStorageInfo const&, rocksdb::ImmutableCFOptions const&, unsigned long) /nfusr/centos-gcp-cloud/jenkins-worker-vfbr5n/jenkins/jenkins-github-yugabyte-db-centos-master-clang7-tsan-459/yugabyte-db/build/tsan-clang7-dynamic-ninja/../../src/yb/rocksdb/db/compaction_picker.cc:1268:3 (librocksdb.so+0x2e797c)
--
    #26 RUN_ALL_TESTS() /opt/yb-build/thirdparty/yugabyte-db-thirdparty-v20210916160346-44ba371965-centos7-x86_64-clang7/installed/tsan/include/gtest/gtest.h:2233:46 (libyb_test_main.so+0x528b)
    #27 main /nfusr/centos-gcp-cloud/jenkins-worker-vfbr5n/jenkins/jenkins-github-yugabyte-db-centos-master-clang7-tsan-459/yugabyte-db/build/tsan-clang7-dynamic-ninja/../../src/yb/util/test_main.cc:104:13 (libyb_test_main.so+0x4e6d)

SUMMARY: ThreadSanitizer: data race /nfusr/centos-gcp-cloud/jenkins-worker-vfbr5n/jenkins/jenkins-github-yugabyte-db-centos-master-clang7-tsan-459/yugabyte-db/build/tsan-clang7-dynamic-ninja/../../src/yb/rocksdb/db/compaction_picker.cc:481:34 in rocksdb::CompactionPicker::MarkL0FilesForDeletion(rocksdb::VersionStorageInfo const*, rocksdb::ImmutableCFOptions const*)
==================

Should the fix be backported? If the fix is low risk I'd argue for it so we have greener test runs and can better find actual new problems.

def- avatar Sep 09 '22 11:09 def-

@jmeehan16 do you recall the fix here? maybe it was done in a different diff? would be good to backport, especially if it's potentially a product race

bmatican avatar Sep 09 '22 14:09 bmatican

@bmatican It seems that this was fixed by @spolitov diff to Switch TSAN build to use clang12 in master. This is in 2.14, but has not been backported beyond that.

I agree that it might be good to backport, though it may be a messy one given that it touches a good deal of code. The feature/test in question exists as far back as 2.6.

Alternatively, the fix for this particular test failure is fairly small. If it's preferable, we could carve out a fix independent of the full diff.

jmeehan16 avatar Sep 15 '22 02:09 jmeehan16

@jmeehan16 agree, would be unlikely we'd port the clang12 change, but to clarify, is this a test-only issue, or a genuine product race?

bmatican avatar Sep 15 '22 14:09 bmatican

@bmatican Unfortunately this is a genuine product race, so we should definitely do at least a partial backport here.

jmeehan16 avatar Sep 15 '22 16:09 jmeehan16

Thanks @jmeehan16 , then if you could cherrypick just the product fix, that'd be great!

bmatican avatar Sep 15 '22 16:09 bmatican

@bmatican D19583 backports the relevant code to 2.12. I'll have @spolitov review it once he returns, and we can put it on 2.8 and 2.6 from there.

jmeehan16 avatar Sep 15 '22 21:09 jmeehan16