peloton icon indicating copy to clipboard operation
peloton copied to clipboard

SerializableTransactionTests.ReadOnlyTransactionTest failing intermittently

Open tli2 opened this issue 7 years ago • 8 comments

Several complaints about this failing from time to time on Travis. ~~Might be a Mac issue~~, might be that the test is broken, might be that we broke something in CC.

Investigation would be appreciated.

tli2 avatar Jun 25 '18 16:06 tli2

Here's an example of how it fails, and on rerun it typically passes. Maybe some nondeterminism in how stuff is being scheduled:

        Start  46: serializable_transaction_test



46: Test command: /Users/travis/build/cmu-db/peloton/build/test/serializable_transaction_test "--gtest_color=yes" "--gtest_output=xml:/Users/travis/build/cmu-db/peloton/build/test/serializable_transaction_test.xml"

46: Environment variables: 

46:  LSAN_OPTIONS=suppressions=/Users/travis/build/cmu-db/peloton/test/leak_suppr.txt

46: Test timeout computed to be: 9.99988e+06

46: Running main() from gmock_main.cc

46: [0;32m[==========] [mRunning 6 tests from 1 test case.

46: [0;32m[----------] [mGlobal test environment set-up.

46: [0;32m[----------] [m6 tests from SerializableTransactionTests

46: [0;32m[ RUN      ] [mSerializableTransactionTests.TransactionTest

46: [0;32m[       OK ] [mSerializableTransactionTests.TransactionTest (36 ms)

46: [0;32m[ RUN      ] [mSerializableTransactionTests.ReadOnlyTransactionTest

46: 2018-06-25 20:34:00 [test/concurrency/testing_transaction_util.cpp:192:CreateTable] INFO  - the database_id is 16777216

46: 2018-06-25 20:34:00 [test/concurrency/serializable_transaction_test.cpp:330:ExecuteNext] INFO  - Txn 0 commits: Success

46: /Users/travis/build/cmu-db/peloton/test/concurrency/serializable_transaction_test.cpp:90: Failure

46: Value of: scheduler.schedules[0].results.size()

46:   Actual: 0

46: Expected: 10

46: [0;31m[  FAILED  ] [mSerializableTransactionTests.ReadOnlyTransactionTest (366 ms)

mbutrovich avatar Jun 25 '18 20:06 mbutrovich

Also worth noting, on both occasions observed, this happens on mac builds only

tli2 avatar Jun 26 '18 03:06 tli2

I’ll spend some time on it today.

mbutrovich avatar Jun 26 '18 12:06 mbutrovich

Haven't been able to reproduce it locally all day :(

mbutrovich avatar Jun 26 '18 17:06 mbutrovich

I just tried running it 10-20 times and got it to fail:

[build] ASAN_OPTIONS=detect_container_overflow=0 ./test/serializable_transaction_test
Running main() from gmock_main.cc
[==========] Running 6 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 6 tests from SerializableTransactionTests
[ RUN      ] SerializableTransactionTests.TransactionTest
[       OK ] SerializableTransactionTests.TransactionTest (28 ms)
[ RUN      ] SerializableTransactionTests.ReadOnlyTransactionTest
2018-06-26 13:46:19 [test/concurrency/testing_transaction_util.cpp:192:CreateTable] INFO  - the database_id is 16777216
2018-06-26 13:46:19 [test/concurrency/serializable_transaction_test.cpp:314:ExecuteNext] INFO  - Txn 0 commits: Success
/Users/wanshenl/Documents/pavlo/peloton/test/concurrency/serializable_transaction_test.cpp:90: Failure
Value of: scheduler.schedules[0].results.size()
  Actual: 0
Expected: 10
[  FAILED  ] SerializableTransactionTests.ReadOnlyTransactionTest (217 ms)

Same failure. Didn't have to rebuild between runs.

lmwnshn avatar Jun 26 '18 17:06 lmwnshn

Can you get a TRACE level log of it to get a better idea what's happening? I still can't get it to reproduce.

mbutrovich avatar Jun 26 '18 18:06 mbutrovich

@lmwnshn helpfully ran it 1000 times on this commit just before I did all the TransactionContext refactoring in the last couple of weeks. It still reproduced. My suspicion right now is that in rare cases the bootstrapping process (which seems slower in the last month) takes longer than we wait in the test (1 epoch) and it triggers an abort on the scan. We have 2 experiments to try today:

  1. increase the wait time in the test on master
  2. go back to a commit a month or so ago before we had all the bootstrapping changes. I need to identify the exact commit.

I should add that we've also seen it on the Ubuntu boxes, so this is no longer a Mac-exclusive issue.

mbutrovich avatar Jun 27 '18 14:06 mbutrovich

The test we occasionally fail was modified in #1322. I'll have to spend more time looking at it tomorrow to understand why we seem to have a race.

mbutrovich avatar Jun 27 '18 20:06 mbutrovich