cockroach
cockroach copied to clipboard
logictest: select_for_update is flaky
pkg/sql/logictest/tests/local/local_test.TestLogic_select_for_update failed with artifacts on master @ 6300c3c3367ad46ac48bf24915cf0d73cae446a0:
=== RUN TestLogic_select_for_update
test_log_scope.go:170: test logs captured to: /artifacts/tmp/_tmp/e30685d5351bb37f7931bdadd013af1b/logTestLogic_select_for_update2572762351
test_log_scope.go:81: use -show-logs to present logs inline
logic.go:2968:
/home/roach/.cache/bazel/_bazel_roach/c5a4e7d36696d9cd970af2045211a7df/sandbox/processwrapper-sandbox/11928/execroot/com_github_cockroachdb_cockroach/bazel-out/aarch64-fastbuild/bin/pkg/sql/logictest/tests/local/local_test_/local_test.runfiles/com_github_cockroachdb_cockroach/pkg/sql/logictest/testdata/logic_test/select_for_update:575: SELECT k, u FROM t3 WHERE u = 2 FOR UPDATE SKIP LOCKED
expected:
2 2
4 2
but found (query options: "rowsort" -> ignore the following ordering of rows) :
[04:00:43] --- done: /home/roach/.cache/bazel/_bazel_roach/c5a4e7d36696d9cd970af2045211a7df/sandbox/processwrapper-sandbox/11928/execroot/com_github_cockroachdb_cockroach/bazel-out/aarch64-fastbuild/bin/pkg/sql/logictest/tests/local/local_test_/local_test.runfiles/com_github_cockroachdb_cockroach/pkg/sql/logictest/testdata/logic_test/select_for_update with config local: 141 tests, 1 failures
logic.go:4135:
/home/roach/.cache/bazel/_bazel_roach/c5a4e7d36696d9cd970af2045211a7df/sandbox/processwrapper-sandbox/11928/execroot/com_github_cockroachdb_cockroach/bazel-out/aarch64-fastbuild/bin/pkg/sql/logictest/tests/local/local_test_/local_test.runfiles/com_github_cockroachdb_cockroach/pkg/sql/logictest/testdata/logic_test/select_for_update:581: error while processing
logic.go:4135: /home/roach/.cache/bazel/_bazel_roach/c5a4e7d36696d9cd970af2045211a7df/sandbox/processwrapper-sandbox/11928/execroot/com_github_cockroachdb_cockroach/bazel-out/aarch64-fastbuild/bin/pkg/sql/logictest/tests/local/local_test_/local_test.runfiles/com_github_cockroachdb_cockroach/pkg/sql/logictest/testdata/logic_test/select_for_update:581: too many errors encountered, skipping the rest of the input
panic.go:626: -- test log scope end --
test logs left over in: /artifacts/tmp/_tmp/e30685d5351bb37f7931bdadd013af1b/logTestLogic_select_for_update2572762351
--- FAIL: TestLogic_select_for_update (1.27s)
This test on roachdash | Improve this report!
Jira issue: CRDB-38757
This reproduces quickly with ./dev test pkg/sql/logictest/tests/local -f TestLogic_select_for_update --stress
. It looks like something in the last ~50 commits on master is causing this. I'm bisecting now.
I'm having trouble bisecting because I think I was overly optimistic on how quickly the failure occurs under stress. Sometimes it will fail in the first few seconds, but other times I'll make it through all 1000 iterations without a failure, on a commit that I believe contains the change.
pkg/sql/logictest/tests/local/local_test.TestLogic_select_for_update failed on master @ 992d4573fb8b71e28717b4d14c4c7ebccb38c412:
=== RUN TestLogic_select_for_update
test_log_scope.go:170: test logs captured to: outputs.zip/logTestLogic_select_for_update2654335675
test_log_scope.go:81: use -show-logs to present logs inline
[07:36:53] setting distsql_workmem='38107B';
logic.go:2968:
/var/lib/engflow/worker/work/2/exec/bazel-out/k8-fastbuild/bin/pkg/sql/logictest/tests/local/local_test_/local_test.runfiles/com_github_cockroachdb_cockroach/pkg/sql/logictest/testdata/logic_test/select_for_update:575: SELECT k, u FROM t3 WHERE u = 2 FOR UPDATE SKIP LOCKED
expected:
2 2
4 2
but found (query options: "rowsort" -> ignore the following ordering of rows) :
[07:36:53] --- done: /var/lib/engflow/worker/work/2/exec/bazel-out/k8-fastbuild/bin/pkg/sql/logictest/tests/local/local_test_/local_test.runfiles/com_github_cockroachdb_cockroach/pkg/sql/logictest/testdata/logic_test/select_for_update with config local: 141 tests, 1 failures
logic.go:4135:
/var/lib/engflow/worker/work/2/exec/bazel-out/k8-fastbuild/bin/pkg/sql/logictest/tests/local/local_test_/local_test.runfiles/com_github_cockroachdb_cockroach/pkg/sql/logictest/testdata/logic_test/select_for_update:581: error while processing
logic.go:4135: /var/lib/engflow/worker/work/2/exec/bazel-out/k8-fastbuild/bin/pkg/sql/logictest/tests/local/local_test_/local_test.runfiles/com_github_cockroachdb_cockroach/pkg/sql/logictest/testdata/logic_test/select_for_update:581: too many errors encountered, skipping the rest of the input
panic.go:626: -- test log scope end --
test logs left over in: outputs.zip/logTestLogic_select_for_update2654335675
--- FAIL: TestLogic_select_for_update (0.91s)
Parameters:
-
attempt=1
-
run=23
-
shard=15
pkg/sql/logictest/tests/local/local_test.TestLogic_select_for_update failed on master @ e6b993509e6ecdc0a0a88a8363cdd8a206ecf7f3:
=== RUN TestLogic_select_for_update
test_log_scope.go:170: test logs captured to: outputs.zip/logTestLogic_select_for_update3919272814
test_log_scope.go:81: use -show-logs to present logs inline
[19:45:39] setting distsql_workmem='36530B';
logic.go:2968:
/var/lib/engflow/worker/work/1/exec/bazel-out/k8-fastbuild/bin/pkg/sql/logictest/tests/local/local_test_/local_test.runfiles/com_github_cockroachdb_cockroach/pkg/sql/logictest/testdata/logic_test/select_for_update:575: SELECT k, u FROM t3 WHERE u = 2 FOR UPDATE SKIP LOCKED
expected:
2 2
4 2
but found (query options: "rowsort" -> ignore the following ordering of rows) :
[19:45:39] --- done: /var/lib/engflow/worker/work/1/exec/bazel-out/k8-fastbuild/bin/pkg/sql/logictest/tests/local/local_test_/local_test.runfiles/com_github_cockroachdb_cockroach/pkg/sql/logictest/testdata/logic_test/select_for_update with config local: 141 tests, 1 failures
logic.go:4135:
/var/lib/engflow/worker/work/1/exec/bazel-out/k8-fastbuild/bin/pkg/sql/logictest/tests/local/local_test_/local_test.runfiles/com_github_cockroachdb_cockroach/pkg/sql/logictest/testdata/logic_test/select_for_update:581: error while processing
logic.go:4135: /var/lib/engflow/worker/work/1/exec/bazel-out/k8-fastbuild/bin/pkg/sql/logictest/tests/local/local_test_/local_test.runfiles/com_github_cockroachdb_cockroach/pkg/sql/logictest/testdata/logic_test/select_for_update:581: too many errors encountered, skipping the rest of the input
panic.go:626: -- test log scope end --
test logs left over in: outputs.zip/logTestLogic_select_for_update3919272814
--- FAIL: TestLogic_select_for_update (0.77s)
Parameters:
-
attempt=1
-
run=1
-
shard=15
pkg/sql/logictest/tests/local/local_test.TestLogic_select_for_update failed with artifacts on master @ c511a3284cd821c40145f151537dd6b971dce1eb:
=== RUN TestLogic_select_for_update
test_log_scope.go:170: test logs captured to: /artifacts/tmp/_tmp/e30685d5351bb37f7931bdadd013af1b/logTestLogic_select_for_update1937186316
test_log_scope.go:81: use -show-logs to present logs inline
[10:25:50] setting distsql_workmem='75581B';
logic.go:2968:
/home/roach/.cache/bazel/_bazel_roach/c5a4e7d36696d9cd970af2045211a7df/sandbox/processwrapper-sandbox/15423/execroot/com_github_cockroachdb_cockroach/bazel-out/aarch64-fastbuild/bin/pkg/sql/logictest/tests/local/local_test_/local_test.runfiles/com_github_cockroachdb_cockroach/pkg/sql/logictest/testdata/logic_test/select_for_update:575: SELECT k, u FROM t3 WHERE u = 2 FOR UPDATE SKIP LOCKED
expected:
2 2
4 2
but found (query options: "rowsort" -> ignore the following ordering of rows) :
[10:25:51] --- done: /home/roach/.cache/bazel/_bazel_roach/c5a4e7d36696d9cd970af2045211a7df/sandbox/processwrapper-sandbox/15423/execroot/com_github_cockroachdb_cockroach/bazel-out/aarch64-fastbuild/bin/pkg/sql/logictest/tests/local/local_test_/local_test.runfiles/com_github_cockroachdb_cockroach/pkg/sql/logictest/testdata/logic_test/select_for_update with config local: 141 tests, 1 failures
logic.go:4135:
/home/roach/.cache/bazel/_bazel_roach/c5a4e7d36696d9cd970af2045211a7df/sandbox/processwrapper-sandbox/15423/execroot/com_github_cockroachdb_cockroach/bazel-out/aarch64-fastbuild/bin/pkg/sql/logictest/tests/local/local_test_/local_test.runfiles/com_github_cockroachdb_cockroach/pkg/sql/logictest/testdata/logic_test/select_for_update:581: error while processing
logic.go:4135: /home/roach/.cache/bazel/_bazel_roach/c5a4e7d36696d9cd970af2045211a7df/sandbox/processwrapper-sandbox/15423/execroot/com_github_cockroachdb_cockroach/bazel-out/aarch64-fastbuild/bin/pkg/sql/logictest/tests/local/local_test_/local_test.runfiles/com_github_cockroachdb_cockroach/pkg/sql/logictest/testdata/logic_test/select_for_update:581: too many errors encountered, skipping the rest of the input
panic.go:626: -- test log scope end --
test logs left over in: /artifacts/tmp/_tmp/e30685d5351bb37f7931bdadd013af1b/logTestLogic_select_for_update1937186316
--- FAIL: TestLogic_select_for_update (1.51s)
Can we skip the test for now?
I wonder if this is related to https://github.com/cockroachdb/cockroach/issues/121917?
I bisected this to 25edb96894f9c4654c4bb52477b9887cf8a13150, cc @Dedej-Bergin @rafiss @fqazi thoughts on how that change can affect this test? (I used ./dev test ./pkg/sql/logictest/tests/local -f=TestLogic_select_for_update --stress --cpus=48 --count=1000
on the gceworker for the bisect.)
I was able to repro with this smaller testdata file:
statement ok
CREATE TABLE t (k INT PRIMARY KEY, v int, FAMILY (k, v))
statement ok
GRANT admin TO testuser
# The SKIP LOCKED wait policy skip rows when a conflicting lock is encountered.
statement ok
INSERT INTO t VALUES (1,1), (2, 2), (3, 3), (4, 4)
statement ok
CREATE TABLE t3 (
k INT PRIMARY KEY,
v INT,
u INT,
INDEX (u),
FAMILY (k, v, u)
);
INSERT INTO t3 VALUES (1, 1, 1), (2, 2, 2), (3, 3, 3), (4, 4, 2);
statement ok
BEGIN; UPDATE t SET v = 3 WHERE k = 2; UPDATE t3 SET v = 3 WHERE k = 2
user testuser
statement ok
BEGIN
query II rowsort
SELECT * FROM t FOR UPDATE SKIP LOCKED
----
1 1
3 3
4 4
statement ok
UPDATE t SET v = 4 WHERE k = 3
query II rowsort
SELECT * FROM t FOR UPDATE SKIP LOCKED
----
1 1
3 4
4 4
# All columns are available from the secondary index on u, so no index join is
# needed. The secondary index can produce the row where k=2 since the row is
# only locked in the primary index.
query II rowsort
SELECT k, u FROM t3 WHERE u = 2 FOR UPDATE SKIP LOCKED
----
2 2
4 2
That's pretty surprising to me -- this example doesn't use any GRANT <privilege> ...
statements, which were the ones that 25edb96 modified.
I was able to reproduce this on 9203985b8ee0092cc9ba1f517cb5f1306f07e2ff as well, which is the parent commit of 25edb96894f9c4654c4bb52477b9887cf8a13150. I'm starting another bisect to find the first failing commit.
@rafiss is this entirely serializable? If it's read committed, it might be worth waiting until https://github.com/cockroachdb/cockroach/issues/121917 is fixed.
This fails all the way back to d93923b15ecccbbf6975a9b5cc4c2ed050e324a6 from over a year ago (I still haven't found the first failing commit), so it does seem likely to be https://github.com/cockroachdb/cockroach/issues/121917, and unrelated to the change we made in https://github.com/cockroachdb/cockroach/commit/25edb96894f9c4654c4bb52477b9887cf8a13150. I will add this back to the SQL Queries board.
Ack, thanks Rafi. I'm guessing that 25edb96894f9c4654c4bb52477b9887cf8a13150 somehow made the flake more likely.
My repro with the example above in pkg/sql/logictest/testdata/logic_test/_tmp
running with ./dev test ./pkg/sql/logictest/tests/local -f=TestLogic_tmp --stress --count=1000
fails just as quickly before or after 25edb96.
Here's an even smaller pkg/sql/logictest/testdata/logic_test/_tmp
testdata file that fails fairly quickly with ./dev testlogic base --config=local --files=tmp --stress
:
statement ok
GRANT admin TO testuser
statement ok
CREATE TABLE t3 (
k INT PRIMARY KEY,
v INT,
u INT,
INDEX (u),
FAMILY (k, v, u)
);
INSERT INTO t3 VALUES (1, 1, 1), (2, 2, 2), (3, 3, 3), (4, 4, 2);
user testuser
query III rowsort
SELECT * FROM t3 FOR UPDATE SKIP LOCKED
----
1 1 1
2 2 2
3 3 3
4 4 2
So I think the SKIP LOCKED is sometimes skipping over the inserted rows. @nvanbenschoten is it possible the intents from the INSERT look like locks that should be skipped?
Interestingly, one small change seems to fix this: moving the INSERT statement into its own command.
So, if the INSERT is in the same command as CREATE TABLE, do they run in the same transaction? Why would that cause SKIP LOCKED to skip over the inserted rows? Maybe it takes longer to resolve the transaction record when it includes DDL, so the intents look more like locks?
Based on the specified backports for linked PR #128762, I applied the following new label(s) to this issue: branch-release-24.2. Please adjust the labels as needed to match the branches actually affected by this issue, including adding any known older branches.
:owl: Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.