noisepage icon indicating copy to clipboard operation
noisepage copied to clipboard

Query caching is not resilient to DDL changes, need to clear cache on DDL change

Open jkosh44 opened this issue 4 years ago • 5 comments

Bug Report

Summary

In our trace file test harness (https://github.com/cmu-db/noisepage/tree/master/script/testing/junit/traces), reusing the name of a previously dropped table will cause the test to fail in extended more. More specifically, in the test if you drop a table and then create a new table with the same name, then the test will fail when run in extended mode. The specific error is: "An I/O error occurred while sending to the backend."

Environment

OS: Ubuntu (LTS) 20.04

Compiler: GCC 7.0+

CMake Profile: Debug

Jenkins/CI: Discovered here http://jenkins.db.cs.cmu.edu:8080/blue/organizations/jenkins/terrier/detail/PR-1454/34/pipeline

Steps to Reproduce

  1. Create a new .test in script/testing/junit/traces and give it any name.
  2. Copy this into the contents
statement ok
CREATE TABLE foo(a int);

statement ok
INSERT INTO foo VALUES (1), (2), (3);

query I rowsort
SELECT * FROM foo;
----
1
2
3

statement ok
DROP TABLE foo;

statement ok
CREATE TABLE foo(a int);

statement ok
INSERT INTO foo VALUES (6), (5);

query I rowsort
SELECT * FROM foo;
----
5
6

statement ok
DROP TABLE foo;
  1. Compile with the following args: -DCMAKE_BUILD_TYPE=Debug -DNOISEPAGE_USE_ASAN=ON
  2. Run python -m script.testing.junit --build-type=debug --query-mode=extended from the root directory
  3. Observe the failed test

There's no good reason for this test to fail and actually if you run it with --query-mode=simple then it passes.

Expected Behavior

The test passes.

Actual Behavior

The test fails with the following error (ignore my machine specific file paths):

+++ ==================================================================================================== +++
+++ TEST TRACEFILE: /home/joe/CLionProjects/noisepage/script/testing/junit/traces/dup.test               +++

02-22-2021 13:27:57,845 [db_server.py:78] INFO : Running: /home/joe/CLionProjects/noisepage/cmake-build-debug/bin/noisepage -wal_file_path=wal.log
02-22-2021 13:27:57,846 [db_server.py:83] INFO : Ran: /home/joe/CLionProjects/noisepage/cmake-build-debug/bin/noisepage -wal_file_path=wal.log [PID=25257]
02-22-2021 13:27:57,960 [db_server.py:91] INFO : DB process is verified as running in 0.11 sec.
02-22-2021 13:27:57,960 [test_server.py:106] INFO : Logging output (overwrite) to: /tmp/junit_log.txt
02-22-2021 13:27:57,960 [common.py:48] INFO : Running: ant test-trace
02-22-2021 13:28:03,605 [common.py:150] INFO : Buildfile: /home/joe/CLionProjects/noisepage/script/testing/junit/build.xml
02-22-2021 13:28:03,605 [common.py:150] INFO : 
02-22-2021 13:28:03,605 [common.py:150] INFO : compile:
02-22-2021 13:28:03,605 [common.py:150] INFO : [javac] Compiling 1 source file to /home/joe/CLionProjects/noisepage/script/testing/junit/out
02-22-2021 13:28:03,605 [common.py:150] INFO : [javac] Note: /home/joe/CLionProjects/noisepage/script/testing/junit/src/moglib/CommandLine.java uses unchecked or unsafe operations.
02-22-2021 13:28:03,605 [common.py:150] INFO : [javac] Note: Recompile with -Xlint:unchecked for details.
02-22-2021 13:28:03,605 [common.py:150] INFO : 
02-22-2021 13:28:03,605 [common.py:150] INFO : test-trace:
02-22-2021 13:28:03,605 [common.py:150] INFO : [java] Working Directory = /home/joe/CLionProjects/noisepage/script/testing/junit
02-22-2021 13:28:03,606 [common.py:150] INFO : [java] File name: /home/joe/CLionProjects/noisepage/script/testing/junit/traces/dup.test
02-22-2021 13:28:03,606 [common.py:150] INFO : [java] Connection closed (DBMS segfault?), skipping the rest of the tests.
02-22-2021 13:28:03,606 [common.py:150] INFO : [java] ╷
02-22-2021 13:28:03,606 [common.py:150] INFO : [java] ├─ JUnit Jupiter ✔
02-22-2021 13:28:03,606 [common.py:150] INFO : [java] │  └─ TracefileTest ✔
02-22-2021 13:28:03,606 [common.py:150] INFO : [java] │     └─ generateTest() ✔
02-22-2021 13:28:03,606 [common.py:150] INFO : [java] │        ├─ Line:1 | Expected statement ok ✔
02-22-2021 13:28:03,606 [common.py:150] INFO : [java] │        ├─ Line:4 | Expected statement ok ✔
02-22-2021 13:28:03,606 [common.py:150] INFO : [java] │        ├─ Line:7 | Hash:c0710d6b4f15dfa88f600b0e6b624077 ✔
02-22-2021 13:28:03,606 [common.py:150] INFO : [java] │        ├─ Line:14 | Expected statement ok ✔
02-22-2021 13:28:03,606 [common.py:150] INFO : [java] │        ├─ Line:17 | Expected statement ok ✔
02-22-2021 13:28:03,606 [common.py:150] INFO : [java] │        ├─ Line:20 | Expected statement ok ✔
02-22-2021 13:28:03,606 [common.py:150] INFO : [java] │        ├─ Line:23 | Expected statement ok ✔
02-22-2021 13:28:03,606 [common.py:150] INFO : [java] │        └─ Line:26 | Hash:c3fc9da2a6676462da8451e5d08aad9f ✘ Failure at Line 26: An I/O error occurred while sending to the backend.
02-22-2021 13:28:03,606 [common.py:150] INFO : [java] │                 SELECT * FROM foo;
02-22-2021 13:28:03,606 [common.py:150] INFO : [java] └─ JUnit Vintage ✔
02-22-2021 13:28:03,606 [common.py:150] INFO : [java]
02-22-2021 13:28:03,606 [common.py:150] INFO : [java] Failures (1):
02-22-2021 13:28:03,606 [common.py:150] INFO : [java]   JUnit Jupiter:TracefileTest:generateTest():Line:26 | Hash:c3fc9da2a6676462da8451e5d08aad9f
02-22-2021 13:28:03,606 [common.py:150] INFO : [java]     MethodSource [className = 'TracefileTest', methodName = 'generateTest', methodParameterTypes = '']
02-22-2021 13:28:03,606 [common.py:150] INFO : [java]     => java.lang.Exception: Failure at Line 26: An I/O error occurred while sending to the backend.
02-22-2021 13:28:03,606 [common.py:150] INFO : [java] SELECT * FROM foo;
02-22-2021 13:28:03,606 [common.py:150] INFO : [java]        TracefileTest.checkAlwaysFail(TracefileTest.java:288)
02-22-2021 13:28:03,606 [common.py:150] INFO : [java]        TracefileTest.lambda$executeSelectQuery$2(TracefileTest.java:134)
02-22-2021 13:28:03,606 [common.py:150] INFO : [java]        org.junit.jupiter.engine.descriptor.JupiterTestDescriptor.executeAndMaskThrowable(JupiterTestDescriptor.java:141)
02-22-2021 13:28:03,606 [common.py:150] INFO : [java]        org.junit.jupiter.engine.descriptor.DynamicTestTestDescriptor.execute(DynamicTestTestDescriptor.java:41)
02-22-2021 13:28:03,606 [common.py:150] INFO : [java]        org.junit.jupiter.engine.descriptor.DynamicTestTestDescriptor.execute(DynamicTestTestDescriptor.java:24)
02-22-2021 13:28:03,606 [common.py:150] INFO : [java]        org.junit.platform.engine.support.hierarchical.HierarchicalTestExecutor$NodeExecutor.lambda$executeRecursively$3(HierarchicalTestExecutor.java:112)
02-22-2021 13:28:03,606 [common.py:150] INFO : [java]        org.junit.platform.engine.support.hierarchical.SingleTestExecutor.executeSafely(SingleTestExecutor.java:66)
02-22-2021 13:28:03,606 [common.py:150] INFO : [java]        org.junit.platform.engine.support.hierarchical.HierarchicalTestExecutor$NodeExecutor.executeRecursively(HierarchicalTestExecutor.java:108)
02-22-2021 13:28:03,606 [common.py:150] INFO : [java]        org.junit.platform.engine.support.hierarchical.HierarchicalTestExecutor$NodeExecutor.execute(HierarchicalTestExecutor.java:79)
02-22-2021 13:28:03,606 [common.py:150] INFO : [java]        org.junit.platform.engine.support.hierarchical.HierarchicalTestExecutor$NodeExecutor.lambda$executeRecursively$0(HierarchicalTestExecutor.java:114)
02-22-2021 13:28:03,606 [common.py:150] INFO : [java]        [...]
02-22-2021 13:28:03,606 [common.py:150] INFO : [java]
02-22-2021 13:28:03,606 [common.py:150] INFO : [java] Test run finished after 604 ms
02-22-2021 13:28:03,607 [common.py:150] INFO : [java] [         4 containers found      ]
02-22-2021 13:28:03,607 [common.py:150] INFO : [java] [         0 containers skipped    ]
02-22-2021 13:28:03,607 [common.py:150] INFO : [java] [         4 containers started    ]
02-22-2021 13:28:03,607 [common.py:150] INFO : [java] [         0 containers aborted    ]
02-22-2021 13:28:03,607 [common.py:150] INFO : [java] [         4 containers successful ]
02-22-2021 13:28:03,607 [common.py:150] INFO : [java] [         0 containers failed     ]
02-22-2021 13:28:03,607 [common.py:150] INFO : [java] [         8 tests found           ]
02-22-2021 13:28:03,607 [common.py:150] INFO : [java] [         0 tests skipped         ]
02-22-2021 13:28:03,607 [common.py:150] INFO : [java] [         8 tests started         ]
02-22-2021 13:28:03,607 [common.py:150] INFO : [java] [         0 tests aborted         ]
02-22-2021 13:28:03,607 [common.py:150] INFO : [java] [         7 tests successful      ]
02-22-2021 13:28:03,607 [common.py:150] INFO : [java] [         1 tests failed          ]
02-22-2021 13:28:03,607 [common.py:150] INFO : [java]
02-22-2021 13:28:03,607 [common.py:150] INFO : 
02-22-2021 13:28:03,607 [common.py:150] INFO : BUILD FAILED
02-22-2021 13:28:03,607 [common.py:150] INFO : /home/joe/CLionProjects/noisepage/script/testing/junit/build.xml:63: Java returned: 1
02-22-2021 13:28:03,607 [common.py:150] INFO : 
02-22-2021 13:28:03,607 [common.py:150] INFO : Total time: 5 seconds
02-22-2021 13:28:03,607 [db_server.py:145] INFO : DBMS already terminated, code: -6
02-22-2021 13:28:03,607 [db_server.py:164] INFO : ************ DB Logs Start ************
02-22-2021 13:28:03,607 [common.py:177] ERROR: 
02-22-2021 13:28:03,607 [db_server.py:166] INFO : ************* DB Logs End *************
Traceback (most recent call last):
  File "/home/joe/CLionProjects/noisepage/script/testing/util/test_server.py", line 137, in run
    exit_codes = self.run_test_suite(test_suite)
  File "/home/joe/CLionProjects/noisepage/script/testing/util/test_server.py", line 203, in run_test_suite
    self.db_instance.stop_db(self.is_dry_run)
  File "/home/joe/CLionProjects/noisepage/script/testing/util/db_server.py", line 147, in stop_db
    raise RuntimeError(msg)
RuntimeError: DBMS already terminated, code: -6
02-22-2021 13:28:03,608 [test_server.py:149] ERROR: The test suite failed
02-22-2021 13:28:03,608 [__main__.py:108] INFO : Final Status => FAIL

jkosh44 avatar Feb 22 '21 18:02 jkosh44

This is actually due to query caching, and only manifests because we only enabled it with extended query protocol. This is documented in the setting, and we've fixed bugs related to trace files for this in the past, but we really need flashing red lights and all caps documentation that basically says to never reuse relation or index names in trace file tests if you're not gonna disconnect the client between test cases.

https://github.com/cmu-db/noisepage/blob/a6ea18821c3f53bd0074f25f2c75bb67b5b7e8c9/src/include/settings/settings_defs.h#L316

mbutrovich avatar Feb 22 '21 18:02 mbutrovich

I should probably add that to confirm my theory of the duplicate table name being the cause, if you change the test to the following

statement ok
CREATE TABLE foo(a int);

statement ok
INSERT INTO foo VALUES (1), (2), (3);

query I rowsort
SELECT * FROM foo;
----
1
2
3

statement ok
DROP TABLE foo;

statement ok
CREATE TABLE bar(a int);

statement ok
INSERT INTO bar VALUES (6), (5);

query I rowsort
SELECT * FROM bar;
----
5
6

statement ok
DROP TABLE bar;

Then it will start passing. Notice that the only difference is the name of the second table changed.

jkosh44 avatar Feb 22 '21 18:02 jkosh44

@mbutrovich Oh, that makes sense. Should I just close this then?

jkosh44 avatar Feb 22 '21 18:02 jkosh44

I guess it's fine to leave open since it's still a real issue and we can document this conversation. I changed the title.

mbutrovich avatar Feb 22 '21 18:02 mbutrovich

I'm surprised we don't blow up elsewhere, I think some trace files violate this (select.test for example, search for create table t).

lmwnshn avatar Feb 22 '21 19:02 lmwnshn