cockroach icon indicating copy to clipboard operation
cockroach copied to clipboard

logictest: select_for_update is flaky

Open cockroach-teamcity opened this issue 9 months ago • 14 comments

pkg/sql/logictest/tests/local/local_test.TestLogic_select_for_update failed with artifacts on master @ 6300c3c3367ad46ac48bf24915cf0d73cae446a0:

=== RUN   TestLogic_select_for_update
    test_log_scope.go:170: test logs captured to: /artifacts/tmp/_tmp/e30685d5351bb37f7931bdadd013af1b/logTestLogic_select_for_update2572762351
    test_log_scope.go:81: use -show-logs to present logs inline
    logic.go:2968: 
         
        /home/roach/.cache/bazel/_bazel_roach/c5a4e7d36696d9cd970af2045211a7df/sandbox/processwrapper-sandbox/11928/execroot/com_github_cockroachdb_cockroach/bazel-out/aarch64-fastbuild/bin/pkg/sql/logictest/tests/local/local_test_/local_test.runfiles/com_github_cockroachdb_cockroach/pkg/sql/logictest/testdata/logic_test/select_for_update:575: SELECT k, u FROM t3 WHERE u = 2 FOR UPDATE SKIP LOCKED
        expected:
            2  2
            4  2
            
        but found (query options: "rowsort" -> ignore the following ordering of rows) :
            
[04:00:43] --- done: /home/roach/.cache/bazel/_bazel_roach/c5a4e7d36696d9cd970af2045211a7df/sandbox/processwrapper-sandbox/11928/execroot/com_github_cockroachdb_cockroach/bazel-out/aarch64-fastbuild/bin/pkg/sql/logictest/tests/local/local_test_/local_test.runfiles/com_github_cockroachdb_cockroach/pkg/sql/logictest/testdata/logic_test/select_for_update with config local: 141 tests, 1 failures
    logic.go:4135: 
        /home/roach/.cache/bazel/_bazel_roach/c5a4e7d36696d9cd970af2045211a7df/sandbox/processwrapper-sandbox/11928/execroot/com_github_cockroachdb_cockroach/bazel-out/aarch64-fastbuild/bin/pkg/sql/logictest/tests/local/local_test_/local_test.runfiles/com_github_cockroachdb_cockroach/pkg/sql/logictest/testdata/logic_test/select_for_update:581: error while processing
    logic.go:4135: /home/roach/.cache/bazel/_bazel_roach/c5a4e7d36696d9cd970af2045211a7df/sandbox/processwrapper-sandbox/11928/execroot/com_github_cockroachdb_cockroach/bazel-out/aarch64-fastbuild/bin/pkg/sql/logictest/tests/local/local_test_/local_test.runfiles/com_github_cockroachdb_cockroach/pkg/sql/logictest/testdata/logic_test/select_for_update:581: too many errors encountered, skipping the rest of the input
    panic.go:626: -- test log scope end --
test logs left over in: /artifacts/tmp/_tmp/e30685d5351bb37f7931bdadd013af1b/logTestLogic_select_for_update2572762351
--- FAIL: TestLogic_select_for_update (1.27s)
Help

See also: How To Investigate a Go Test Failure (internal)

This test on roachdash | Improve this report!

Jira issue: CRDB-38757

cockroach-teamcity avatar May 15 '24 04:05 cockroach-teamcity

This reproduces quickly with ./dev test pkg/sql/logictest/tests/local -f TestLogic_select_for_update --stress. It looks like something in the last ~50 commits on master is causing this. I'm bisecting now.

mgartner avatar May 15 '24 14:05 mgartner

I'm having trouble bisecting because I think I was overly optimistic on how quickly the failure occurs under stress. Sometimes it will fail in the first few seconds, but other times I'll make it through all 1000 iterations without a failure, on a commit that I believe contains the change.

mgartner avatar May 15 '24 15:05 mgartner

pkg/sql/logictest/tests/local/local_test.TestLogic_select_for_update failed on master @ 992d4573fb8b71e28717b4d14c4c7ebccb38c412:

=== RUN   TestLogic_select_for_update
    test_log_scope.go:170: test logs captured to: outputs.zip/logTestLogic_select_for_update2654335675
    test_log_scope.go:81: use -show-logs to present logs inline
[07:36:53] setting distsql_workmem='38107B';
    logic.go:2968: 
         
        /var/lib/engflow/worker/work/2/exec/bazel-out/k8-fastbuild/bin/pkg/sql/logictest/tests/local/local_test_/local_test.runfiles/com_github_cockroachdb_cockroach/pkg/sql/logictest/testdata/logic_test/select_for_update:575: SELECT k, u FROM t3 WHERE u = 2 FOR UPDATE SKIP LOCKED
        expected:
            2  2
            4  2
            
        but found (query options: "rowsort" -> ignore the following ordering of rows) :
            
[07:36:53] --- done: /var/lib/engflow/worker/work/2/exec/bazel-out/k8-fastbuild/bin/pkg/sql/logictest/tests/local/local_test_/local_test.runfiles/com_github_cockroachdb_cockroach/pkg/sql/logictest/testdata/logic_test/select_for_update with config local: 141 tests, 1 failures
    logic.go:4135: 
        /var/lib/engflow/worker/work/2/exec/bazel-out/k8-fastbuild/bin/pkg/sql/logictest/tests/local/local_test_/local_test.runfiles/com_github_cockroachdb_cockroach/pkg/sql/logictest/testdata/logic_test/select_for_update:581: error while processing
    logic.go:4135: /var/lib/engflow/worker/work/2/exec/bazel-out/k8-fastbuild/bin/pkg/sql/logictest/tests/local/local_test_/local_test.runfiles/com_github_cockroachdb_cockroach/pkg/sql/logictest/testdata/logic_test/select_for_update:581: too many errors encountered, skipping the rest of the input
    panic.go:626: -- test log scope end --
test logs left over in: outputs.zip/logTestLogic_select_for_update2654335675
--- FAIL: TestLogic_select_for_update (0.91s)

Parameters:

  • attempt=1
  • run=23
  • shard=15
Help

See also: How To Investigate a Go Test Failure (internal)

This test on roachdash | Improve this report!

cockroach-teamcity avatar May 16 '24 14:05 cockroach-teamcity

pkg/sql/logictest/tests/local/local_test.TestLogic_select_for_update failed on master @ e6b993509e6ecdc0a0a88a8363cdd8a206ecf7f3:

=== RUN   TestLogic_select_for_update
    test_log_scope.go:170: test logs captured to: outputs.zip/logTestLogic_select_for_update3919272814
    test_log_scope.go:81: use -show-logs to present logs inline
[19:45:39] setting distsql_workmem='36530B';
    logic.go:2968: 
         
        /var/lib/engflow/worker/work/1/exec/bazel-out/k8-fastbuild/bin/pkg/sql/logictest/tests/local/local_test_/local_test.runfiles/com_github_cockroachdb_cockroach/pkg/sql/logictest/testdata/logic_test/select_for_update:575: SELECT k, u FROM t3 WHERE u = 2 FOR UPDATE SKIP LOCKED
        expected:
            2  2
            4  2
            
        but found (query options: "rowsort" -> ignore the following ordering of rows) :
            
[19:45:39] --- done: /var/lib/engflow/worker/work/1/exec/bazel-out/k8-fastbuild/bin/pkg/sql/logictest/tests/local/local_test_/local_test.runfiles/com_github_cockroachdb_cockroach/pkg/sql/logictest/testdata/logic_test/select_for_update with config local: 141 tests, 1 failures
    logic.go:4135: 
        /var/lib/engflow/worker/work/1/exec/bazel-out/k8-fastbuild/bin/pkg/sql/logictest/tests/local/local_test_/local_test.runfiles/com_github_cockroachdb_cockroach/pkg/sql/logictest/testdata/logic_test/select_for_update:581: error while processing
    logic.go:4135: /var/lib/engflow/worker/work/1/exec/bazel-out/k8-fastbuild/bin/pkg/sql/logictest/tests/local/local_test_/local_test.runfiles/com_github_cockroachdb_cockroach/pkg/sql/logictest/testdata/logic_test/select_for_update:581: too many errors encountered, skipping the rest of the input
    panic.go:626: -- test log scope end --
test logs left over in: outputs.zip/logTestLogic_select_for_update3919272814
--- FAIL: TestLogic_select_for_update (0.77s)

Parameters:

  • attempt=1
  • run=1
  • shard=15
Help

See also: How To Investigate a Go Test Failure (internal)

This test on roachdash | Improve this report!

github-actions[bot] avatar May 16 '24 19:05 github-actions[bot]

pkg/sql/logictest/tests/local/local_test.TestLogic_select_for_update failed with artifacts on master @ c511a3284cd821c40145f151537dd6b971dce1eb:

=== RUN   TestLogic_select_for_update
    test_log_scope.go:170: test logs captured to: /artifacts/tmp/_tmp/e30685d5351bb37f7931bdadd013af1b/logTestLogic_select_for_update1937186316
    test_log_scope.go:81: use -show-logs to present logs inline
[10:25:50] setting distsql_workmem='75581B';
    logic.go:2968: 
         
        /home/roach/.cache/bazel/_bazel_roach/c5a4e7d36696d9cd970af2045211a7df/sandbox/processwrapper-sandbox/15423/execroot/com_github_cockroachdb_cockroach/bazel-out/aarch64-fastbuild/bin/pkg/sql/logictest/tests/local/local_test_/local_test.runfiles/com_github_cockroachdb_cockroach/pkg/sql/logictest/testdata/logic_test/select_for_update:575: SELECT k, u FROM t3 WHERE u = 2 FOR UPDATE SKIP LOCKED
        expected:
            2  2
            4  2
            
        but found (query options: "rowsort" -> ignore the following ordering of rows) :
            
[10:25:51] --- done: /home/roach/.cache/bazel/_bazel_roach/c5a4e7d36696d9cd970af2045211a7df/sandbox/processwrapper-sandbox/15423/execroot/com_github_cockroachdb_cockroach/bazel-out/aarch64-fastbuild/bin/pkg/sql/logictest/tests/local/local_test_/local_test.runfiles/com_github_cockroachdb_cockroach/pkg/sql/logictest/testdata/logic_test/select_for_update with config local: 141 tests, 1 failures
    logic.go:4135: 
        /home/roach/.cache/bazel/_bazel_roach/c5a4e7d36696d9cd970af2045211a7df/sandbox/processwrapper-sandbox/15423/execroot/com_github_cockroachdb_cockroach/bazel-out/aarch64-fastbuild/bin/pkg/sql/logictest/tests/local/local_test_/local_test.runfiles/com_github_cockroachdb_cockroach/pkg/sql/logictest/testdata/logic_test/select_for_update:581: error while processing
    logic.go:4135: /home/roach/.cache/bazel/_bazel_roach/c5a4e7d36696d9cd970af2045211a7df/sandbox/processwrapper-sandbox/15423/execroot/com_github_cockroachdb_cockroach/bazel-out/aarch64-fastbuild/bin/pkg/sql/logictest/tests/local/local_test_/local_test.runfiles/com_github_cockroachdb_cockroach/pkg/sql/logictest/testdata/logic_test/select_for_update:581: too many errors encountered, skipping the rest of the input
    panic.go:626: -- test log scope end --
test logs left over in: /artifacts/tmp/_tmp/e30685d5351bb37f7931bdadd013af1b/logTestLogic_select_for_update1937186316
--- FAIL: TestLogic_select_for_update (1.51s)
Help

See also: How To Investigate a Go Test Failure (internal)

This test on roachdash | Improve this report!

cockroach-teamcity avatar May 17 '24 10:05 cockroach-teamcity

Can we skip the test for now?

RaduBerinde avatar May 17 '24 13:05 RaduBerinde

I wonder if this is related to https://github.com/cockroachdb/cockroach/issues/121917?

michae2 avatar May 17 '24 17:05 michae2

I bisected this to 25edb96894f9c4654c4bb52477b9887cf8a13150, cc @Dedej-Bergin @rafiss @fqazi thoughts on how that change can affect this test? (I used ./dev test ./pkg/sql/logictest/tests/local -f=TestLogic_select_for_update --stress --cpus=48 --count=1000 on the gceworker for the bisect.)

yuzefovich avatar May 18 '24 21:05 yuzefovich

I was able to repro with this smaller testdata file:

statement ok
CREATE TABLE t (k INT PRIMARY KEY, v int, FAMILY (k, v))

statement ok
GRANT admin TO testuser

# The SKIP LOCKED wait policy skip rows when a conflicting lock is encountered.

statement ok
INSERT INTO t VALUES (1,1), (2, 2), (3, 3), (4, 4)

statement ok
CREATE TABLE t3 (
  k INT PRIMARY KEY,
  v INT,
  u INT,
  INDEX (u),
  FAMILY (k, v, u)
);
INSERT INTO t3 VALUES (1, 1, 1), (2, 2, 2), (3, 3, 3), (4, 4, 2);

statement ok
BEGIN; UPDATE t SET v = 3 WHERE k = 2; UPDATE t3 SET v = 3 WHERE k = 2

user testuser

statement ok
BEGIN

query II rowsort
SELECT * FROM t FOR UPDATE SKIP LOCKED
----
1  1
3  3
4  4

statement ok
UPDATE t SET v = 4 WHERE k = 3

query II rowsort
SELECT * FROM t FOR UPDATE SKIP LOCKED
----
1  1
3  4
4  4

# All columns are available from the secondary index on u, so no index join is
# needed. The secondary index can produce the row where k=2 since the row is
# only locked in the primary index.
query II rowsort
SELECT k, u FROM t3 WHERE u = 2 FOR UPDATE SKIP LOCKED
----
2  2
4  2

That's pretty surprising to me -- this example doesn't use any GRANT <privilege> ... statements, which were the ones that 25edb96 modified.

rafiss avatar May 23 '24 04:05 rafiss

I was able to reproduce this on 9203985b8ee0092cc9ba1f517cb5f1306f07e2ff as well, which is the parent commit of 25edb96894f9c4654c4bb52477b9887cf8a13150. I'm starting another bisect to find the first failing commit.

rafiss avatar May 24 '24 15:05 rafiss

@rafiss is this entirely serializable? If it's read committed, it might be worth waiting until https://github.com/cockroachdb/cockroach/issues/121917 is fixed.

michae2 avatar May 24 '24 17:05 michae2

This fails all the way back to d93923b15ecccbbf6975a9b5cc4c2ed050e324a6 from over a year ago (I still haven't found the first failing commit), so it does seem likely to be https://github.com/cockroachdb/cockroach/issues/121917, and unrelated to the change we made in https://github.com/cockroachdb/cockroach/commit/25edb96894f9c4654c4bb52477b9887cf8a13150. I will add this back to the SQL Queries board.

rafiss avatar May 24 '24 18:05 rafiss

Ack, thanks Rafi. I'm guessing that 25edb96894f9c4654c4bb52477b9887cf8a13150 somehow made the flake more likely.

yuzefovich avatar May 24 '24 18:05 yuzefovich

My repro with the example above in pkg/sql/logictest/testdata/logic_test/_tmp running with ./dev test ./pkg/sql/logictest/tests/local -f=TestLogic_tmp --stress --count=1000 fails just as quickly before or after 25edb96.

rafiss avatar May 24 '24 18:05 rafiss

Here's an even smaller pkg/sql/logictest/testdata/logic_test/_tmp testdata file that fails fairly quickly with ./dev testlogic base --config=local --files=tmp --stress:

statement ok
GRANT admin TO testuser

statement ok
CREATE TABLE t3 (
  k INT PRIMARY KEY,
  v INT,
  u INT,
  INDEX (u),
  FAMILY (k, v, u)
);
INSERT INTO t3 VALUES (1, 1, 1), (2, 2, 2), (3, 3, 3), (4, 4, 2);

user testuser

query III rowsort
SELECT * FROM t3 FOR UPDATE SKIP LOCKED
----
1  1  1
2  2  2
3  3  3
4  4  2

So I think the SKIP LOCKED is sometimes skipping over the inserted rows. @nvanbenschoten is it possible the intents from the INSERT look like locks that should be skipped?

michae2 avatar Aug 10 '24 00:08 michae2

Interestingly, one small change seems to fix this: moving the INSERT statement into its own command.

So, if the INSERT is in the same command as CREATE TABLE, do they run in the same transaction? Why would that cause SKIP LOCKED to skip over the inserted rows? Maybe it takes longer to resolve the transaction record when it includes DDL, so the intents look more like locks?

michae2 avatar Aug 10 '24 00:08 michae2

Based on the specified backports for linked PR #128762, I applied the following new label(s) to this issue: branch-release-24.2. Please adjust the labels as needed to match the branches actually affected by this issue, including adding any known older branches.

:owl: Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

blathers-crl[bot] avatar Aug 12 '24 16:08 blathers-crl[bot]