cockroach icon indicating copy to clipboard operation
cockroach copied to clipboard

kv/kvserver/batcheval: TestLeaseTransferWithPipelinedWrite failed

Open cockroach-teamcity opened this issue 3 years ago • 4 comments

kv/kvserver/batcheval.TestLeaseTransferWithPipelinedWrite failed with artifacts on master @ 5f854ce368440e2289af5db321f7330dbef87639:

=== RUN   TestLeaseTransferWithPipelinedWrite
    test_log_scope.go:162: test logs captured to: /artifacts/tmp/_tmp/8b7adba2afa4d2ee9e34e55bcb54994d/logTestLeaseTransferWithPipelinedWrite3490123133
    test_log_scope.go:80: use -show-logs to present logs inline
    cmd_lease_test.go:112: worker failed: pq: restart transaction: TransactionRetryWithProtoRefreshError: TransactionAbortedError(ABORT_REASON_NEW_LEASE_PREVENTS_TXN): "sql txn" meta={id=e6c9bcc5 key=/Table/132/1/0/1/0 pri=0.06112298 epo=0 ts=1659598480.500067056,1 min=1659598480.416811047,0 seq=3} lock=true stat=PENDING rts=1659598480.500067056,1 wto=false gul=1659598480.916811047,0
    panic.go:500: -- test log scope end --
--- FAIL: TestLeaseTransferWithPipelinedWrite (56.76s)

Parameters: TAGS=bazel,gss

Help

See also: How To Investigate a Go Test Failure (internal)

/cc @cockroachdb/kv

This test on roachdash | Improve this report!

Jira issue: CRDB-18331

cockroach-teamcity avatar Aug 04 '22 07:08 cockroach-teamcity

kv/kvserver/batcheval.TestLeaseTransferWithPipelinedWrite failed with artifacts on master @ d687b14e513a49f94e1862c78a7685b78f4fbad9:

=== RUN   TestLeaseTransferWithPipelinedWrite
    test_log_scope.go:162: test logs captured to: /artifacts/tmp/_tmp/8b7adba2afa4d2ee9e34e55bcb54994d/logTestLeaseTransferWithPipelinedWrite2936323110
    test_log_scope.go:80: use -show-logs to present logs inline
    cmd_lease_test.go:112: worker failed: pq: restart transaction: TransactionRetryWithProtoRefreshError: TransactionAbortedError(ABORT_REASON_NEW_LEASE_PREVENTS_TXN): "sql txn" meta={id=1b827ad8 key=/Table/115/1/0/1/0 pri=0.02346137 epo=0 ts=1659683477.271226262,1 min=1659683477.125156301,0 seq=3} lock=true stat=PENDING rts=1659683477.271226262,1 wto=false gul=1659683477.625156301,0
    panic.go:500: -- test log scope end --
--- FAIL: TestLeaseTransferWithPipelinedWrite (39.44s)

Parameters: TAGS=bazel,gss

Help

See also: How To Investigate a Go Test Failure (internal)

This test on roachdash | Improve this report!

cockroach-teamcity avatar Aug 05 '22 07:08 cockroach-teamcity

kv/kvserver/batcheval.TestLeaseTransferWithPipelinedWrite failed with artifacts on master @ db95c2484a9f15c2de01466af0aae19c8ab57618:

=== RUN   TestLeaseTransferWithPipelinedWrite
    test_log_scope.go:162: test logs captured to: /artifacts/tmp/_tmp/8b7adba2afa4d2ee9e34e55bcb54994d/logTestLeaseTransferWithPipelinedWrite2679577506
    test_log_scope.go:80: use -show-logs to present logs inline
    cmd_lease_test.go:112: worker failed: pq: restart transaction: TransactionRetryWithProtoRefreshError: TransactionAbortedError(ABORT_REASON_NEW_LEASE_PREVENTS_TXN): "sql txn" meta={id=636d0007 key=/Table/130/1/0/1/0 pri=0.00852398 epo=0 ts=1659771960.580418844,0 min=1659771960.580418844,0 seq=3} lock=true stat=PENDING rts=1659771960.580418844,0 wto=false gul=1659771961.080418844,0
    panic.go:500: -- test log scope end --
--- FAIL: TestLeaseTransferWithPipelinedWrite (55.10s)

Parameters: TAGS=bazel,gss

Help

See also: How To Investigate a Go Test Failure (internal)

This test on roachdash | Improve this report!

cockroach-teamcity avatar Aug 06 '22 07:08 cockroach-teamcity

kv/kvserver/batcheval.TestLeaseTransferWithPipelinedWrite failed with artifacts on master @ 6c24f3564e213371bbe800e9646386cdefeb1546:

=== RUN   TestLeaseTransferWithPipelinedWrite
    test_log_scope.go:162: test logs captured to: /artifacts/tmp/_tmp/8b7adba2afa4d2ee9e34e55bcb54994d/logTestLeaseTransferWithPipelinedWrite3061496584
    test_log_scope.go:80: use -show-logs to present logs inline
    cmd_lease_test.go:112: worker failed: pq: restart transaction: TransactionRetryWithProtoRefreshError: TransactionAbortedError(ABORT_REASON_NEW_LEASE_PREVENTS_TXN): "sql txn" meta={id=6c1c5ff4 key=/Table/124/1/0/1/0 pri=0.05493523 epo=0 ts=1660118565.842213510,1 min=1660118565.629899239,0 seq=3} lock=true stat=PENDING rts=1660118565.842213510,1 wto=false gul=1660118566.129899239,0
    panic.go:500: -- test log scope end --
--- FAIL: TestLeaseTransferWithPipelinedWrite (45.65s)

Parameters: TAGS=bazel,gss

Help

See also: How To Investigate a Go Test Failure (internal)

This test on roachdash | Improve this report!

cockroach-teamcity avatar Aug 10 '22 08:08 cockroach-teamcity

kv/kvserver/batcheval.TestLeaseTransferWithPipelinedWrite failed with artifacts on master @ 9bd865dadb0601059ee0f8419d1ad47fb74db80a:

=== RUN   TestLeaseTransferWithPipelinedWrite
    test_log_scope.go:162: test logs captured to: /artifacts/tmp/_tmp/8b7adba2afa4d2ee9e34e55bcb54994d/logTestLeaseTransferWithPipelinedWrite1002527624
    test_log_scope.go:80: use -show-logs to present logs inline
    cmd_lease_test.go:112: worker failed: pq: restart transaction: TransactionRetryWithProtoRefreshError: TransactionAbortedError(ABORT_REASON_NEW_LEASE_PREVENTS_TXN): "sql txn" meta={id=71e65089 key=/Table/131/1/0/1/0 pri=0.01318320 epo=0 ts=1660203131.418290275,1 min=1660203131.251080030,0 seq=3} lock=true stat=PENDING rts=1660203131.418290275,1 wto=false gul=1660203131.751080030,0
    panic.go:500: -- test log scope end --
--- FAIL: TestLeaseTransferWithPipelinedWrite (56.69s)

Parameters: TAGS=bazel,gss

Help

See also: How To Investigate a Go Test Failure (internal)

This test on roachdash | Improve this report!

cockroach-teamcity avatar Aug 11 '22 07:08 cockroach-teamcity

kv/kvserver/batcheval.TestLeaseTransferWithPipelinedWrite failed with artifacts on master @ 6b7f0058325de8b8b321a08137d462c418c3047f:

=== RUN   TestLeaseTransferWithPipelinedWrite
    test_log_scope.go:162: test logs captured to: /artifacts/tmp/_tmp/8b7adba2afa4d2ee9e34e55bcb54994d/logTestLeaseTransferWithPipelinedWrite2716334196
    test_log_scope.go:80: use -show-logs to present logs inline
    cmd_lease_test.go:112: worker failed: pq: restart transaction: TransactionRetryWithProtoRefreshError: TransactionAbortedError(ABORT_REASON_NEW_LEASE_PREVENTS_TXN): "sql txn" meta={id=30cd482c key=/Table/133/1/0/1/0 pri=0.01267302 epo=0 ts=1660548155.641867712,1 min=1660548155.504547373,0 seq=3} lock=true stat=PENDING rts=1660548155.641867712,1 wto=false gul=1660548156.004547373,0
    panic.go:500: -- test log scope end --
--- FAIL: TestLeaseTransferWithPipelinedWrite (67.94s)

Parameters: TAGS=bazel,gss

Help

See also: How To Investigate a Go Test Failure (internal)

This test on roachdash | Improve this report!

cockroach-teamcity avatar Aug 15 '22 07:08 cockroach-teamcity

This error looks benign enough but this test isn't equipped to handle it. I'm not sure why this started firing recently however.

https://github.com/cockroachdb/cockroach/blob/06cc644939f84fcd42914cf7e0f16f7e7076fcee/pkg/kv/kvserver/replica_tscache.go#L561-L565

Quick stress runs locally, running latest master, for a few minutes didn't trip things up. Perhaps worth trying more seriously on gceworker/etc, but only to figure out what's changed recently. This error seems fine, and perhaps the test should be adjusted to handle it. @nvanbenschoten, leaving it on your plate in case if you have any ideas. Feel free to re-assign etc.

irfansharif avatar Aug 18 '22 19:08 irfansharif

I've also stressed this for about an hour on a 20-node roachprod cluster (make roachprod-stress) and locally with bazel (dev test --stress) without a failure. The failure certainly seems new, but I'm not aware of any change that would have caused it.

However, it is strange to me that this transaction would fail to refresh. It hasn't performed any reads, so why is the retry error getting all the way back to the client?

nvb avatar Aug 22 '22 13:08 nvb

However, it is strange to me that this transaction would fail to refresh. It hasn't performed any reads, so why is the retry error getting all the way back to the client?

This isn't the right question. The ABORT_REASON_NEW_LEASE_PREVENTS_TXN error path will result in a TransactionAbortedError, which can not be refreshed and must result in a transaction retry.

This test is not using a txn retry loop, so the error is expected.

The question then again becomes: why are we running into a new lease? This is a 3 node cluster, so perhaps there was a lease transfer? Did something recently change in this area?

nvb avatar Aug 22 '22 14:08 nvb

I can reproduce this if I bump up the internal iteration count to 300: https://github.com/cockroachdb/cockroach/blob/9c7db33c1ee21145fe372d61ad26b45a9f151ee3/pkg/kv/kvserver/batcheval/cmd_lease_test.go#L52-L53

Reading through the test's history and comments, it's all about lease transfers, so I guess it's not surprising that we're seeing this.

nvb avatar Aug 22 '22 14:08 nvb

kv/kvserver/batcheval.TestLeaseTransferWithPipelinedWrite failed with artifacts on master @ 003c0360de8b64319b5f0f127b99be91dbdca8a3:

=== RUN   TestLeaseTransferWithPipelinedWrite
    test_log_scope.go:162: test logs captured to: /artifacts/tmp/_tmp/8b7adba2afa4d2ee9e34e55bcb54994d/logTestLeaseTransferWithPipelinedWrite555953228
    test_log_scope.go:80: use -show-logs to present logs inline
    cmd_lease_test.go:112: worker failed: pq: restart transaction: TransactionRetryWithProtoRefreshError: TransactionAbortedError(ABORT_REASON_NEW_LEASE_PREVENTS_TXN): "sql txn" meta={id=66015018 key=/Table/111/1/0/1/0 pri=0.06724410 epo=0 ts=1661324742.116263602,0 min=1661324742.116263602,0 seq=3} lock=true stat=PENDING rts=1661324742.116263602,0 wto=false gul=1661324742.616263602,0
    panic.go:500: -- test log scope end --
--- FAIL: TestLeaseTransferWithPipelinedWrite (37.24s)

Parameters: TAGS=bazel,gss

Help

See also: How To Investigate a Go Test Failure (internal)

This test on roachdash | Improve this report!

cockroach-teamcity avatar Aug 24 '22 07:08 cockroach-teamcity