cockroach
cockroach copied to clipboard
kv/kvserver/batcheval: TestLeaseTransferWithPipelinedWrite failed
kv/kvserver/batcheval.TestLeaseTransferWithPipelinedWrite failed with artifacts on master @ 5f854ce368440e2289af5db321f7330dbef87639:
=== RUN TestLeaseTransferWithPipelinedWrite
test_log_scope.go:162: test logs captured to: /artifacts/tmp/_tmp/8b7adba2afa4d2ee9e34e55bcb54994d/logTestLeaseTransferWithPipelinedWrite3490123133
test_log_scope.go:80: use -show-logs to present logs inline
cmd_lease_test.go:112: worker failed: pq: restart transaction: TransactionRetryWithProtoRefreshError: TransactionAbortedError(ABORT_REASON_NEW_LEASE_PREVENTS_TXN): "sql txn" meta={id=e6c9bcc5 key=/Table/132/1/0/1/0 pri=0.06112298 epo=0 ts=1659598480.500067056,1 min=1659598480.416811047,0 seq=3} lock=true stat=PENDING rts=1659598480.500067056,1 wto=false gul=1659598480.916811047,0
panic.go:500: -- test log scope end --
--- FAIL: TestLeaseTransferWithPipelinedWrite (56.76s)
Parameters: TAGS=bazel,gss
This test on roachdash | Improve this report!
Jira issue: CRDB-18331
kv/kvserver/batcheval.TestLeaseTransferWithPipelinedWrite failed with artifacts on master @ d687b14e513a49f94e1862c78a7685b78f4fbad9:
=== RUN TestLeaseTransferWithPipelinedWrite
test_log_scope.go:162: test logs captured to: /artifacts/tmp/_tmp/8b7adba2afa4d2ee9e34e55bcb54994d/logTestLeaseTransferWithPipelinedWrite2936323110
test_log_scope.go:80: use -show-logs to present logs inline
cmd_lease_test.go:112: worker failed: pq: restart transaction: TransactionRetryWithProtoRefreshError: TransactionAbortedError(ABORT_REASON_NEW_LEASE_PREVENTS_TXN): "sql txn" meta={id=1b827ad8 key=/Table/115/1/0/1/0 pri=0.02346137 epo=0 ts=1659683477.271226262,1 min=1659683477.125156301,0 seq=3} lock=true stat=PENDING rts=1659683477.271226262,1 wto=false gul=1659683477.625156301,0
panic.go:500: -- test log scope end --
--- FAIL: TestLeaseTransferWithPipelinedWrite (39.44s)
Parameters: TAGS=bazel,gss
kv/kvserver/batcheval.TestLeaseTransferWithPipelinedWrite failed with artifacts on master @ db95c2484a9f15c2de01466af0aae19c8ab57618:
=== RUN TestLeaseTransferWithPipelinedWrite
test_log_scope.go:162: test logs captured to: /artifacts/tmp/_tmp/8b7adba2afa4d2ee9e34e55bcb54994d/logTestLeaseTransferWithPipelinedWrite2679577506
test_log_scope.go:80: use -show-logs to present logs inline
cmd_lease_test.go:112: worker failed: pq: restart transaction: TransactionRetryWithProtoRefreshError: TransactionAbortedError(ABORT_REASON_NEW_LEASE_PREVENTS_TXN): "sql txn" meta={id=636d0007 key=/Table/130/1/0/1/0 pri=0.00852398 epo=0 ts=1659771960.580418844,0 min=1659771960.580418844,0 seq=3} lock=true stat=PENDING rts=1659771960.580418844,0 wto=false gul=1659771961.080418844,0
panic.go:500: -- test log scope end --
--- FAIL: TestLeaseTransferWithPipelinedWrite (55.10s)
Parameters: TAGS=bazel,gss
kv/kvserver/batcheval.TestLeaseTransferWithPipelinedWrite failed with artifacts on master @ 6c24f3564e213371bbe800e9646386cdefeb1546:
=== RUN TestLeaseTransferWithPipelinedWrite
test_log_scope.go:162: test logs captured to: /artifacts/tmp/_tmp/8b7adba2afa4d2ee9e34e55bcb54994d/logTestLeaseTransferWithPipelinedWrite3061496584
test_log_scope.go:80: use -show-logs to present logs inline
cmd_lease_test.go:112: worker failed: pq: restart transaction: TransactionRetryWithProtoRefreshError: TransactionAbortedError(ABORT_REASON_NEW_LEASE_PREVENTS_TXN): "sql txn" meta={id=6c1c5ff4 key=/Table/124/1/0/1/0 pri=0.05493523 epo=0 ts=1660118565.842213510,1 min=1660118565.629899239,0 seq=3} lock=true stat=PENDING rts=1660118565.842213510,1 wto=false gul=1660118566.129899239,0
panic.go:500: -- test log scope end --
--- FAIL: TestLeaseTransferWithPipelinedWrite (45.65s)
Parameters: TAGS=bazel,gss
kv/kvserver/batcheval.TestLeaseTransferWithPipelinedWrite failed with artifacts on master @ 9bd865dadb0601059ee0f8419d1ad47fb74db80a:
=== RUN TestLeaseTransferWithPipelinedWrite
test_log_scope.go:162: test logs captured to: /artifacts/tmp/_tmp/8b7adba2afa4d2ee9e34e55bcb54994d/logTestLeaseTransferWithPipelinedWrite1002527624
test_log_scope.go:80: use -show-logs to present logs inline
cmd_lease_test.go:112: worker failed: pq: restart transaction: TransactionRetryWithProtoRefreshError: TransactionAbortedError(ABORT_REASON_NEW_LEASE_PREVENTS_TXN): "sql txn" meta={id=71e65089 key=/Table/131/1/0/1/0 pri=0.01318320 epo=0 ts=1660203131.418290275,1 min=1660203131.251080030,0 seq=3} lock=true stat=PENDING rts=1660203131.418290275,1 wto=false gul=1660203131.751080030,0
panic.go:500: -- test log scope end --
--- FAIL: TestLeaseTransferWithPipelinedWrite (56.69s)
Parameters: TAGS=bazel,gss
kv/kvserver/batcheval.TestLeaseTransferWithPipelinedWrite failed with artifacts on master @ 6b7f0058325de8b8b321a08137d462c418c3047f:
=== RUN TestLeaseTransferWithPipelinedWrite
test_log_scope.go:162: test logs captured to: /artifacts/tmp/_tmp/8b7adba2afa4d2ee9e34e55bcb54994d/logTestLeaseTransferWithPipelinedWrite2716334196
test_log_scope.go:80: use -show-logs to present logs inline
cmd_lease_test.go:112: worker failed: pq: restart transaction: TransactionRetryWithProtoRefreshError: TransactionAbortedError(ABORT_REASON_NEW_LEASE_PREVENTS_TXN): "sql txn" meta={id=30cd482c key=/Table/133/1/0/1/0 pri=0.01267302 epo=0 ts=1660548155.641867712,1 min=1660548155.504547373,0 seq=3} lock=true stat=PENDING rts=1660548155.641867712,1 wto=false gul=1660548156.004547373,0
panic.go:500: -- test log scope end --
--- FAIL: TestLeaseTransferWithPipelinedWrite (67.94s)
Parameters: TAGS=bazel,gss
This error looks benign enough but this test isn't equipped to handle it. I'm not sure why this started firing recently however.
https://github.com/cockroachdb/cockroach/blob/06cc644939f84fcd42914cf7e0f16f7e7076fcee/pkg/kv/kvserver/replica_tscache.go#L561-L565
Quick stress runs locally, running latest master, for a few minutes didn't trip things up. Perhaps worth trying more seriously on gceworker/etc, but only to figure out what's changed recently. This error seems fine, and perhaps the test should be adjusted to handle it. @nvanbenschoten, leaving it on your plate in case if you have any ideas. Feel free to re-assign etc.
I've also stressed this for about an hour on a 20-node roachprod cluster (make roachprod-stress) and locally with bazel (dev test --stress) without a failure. The failure certainly seems new, but I'm not aware of any change that would have caused it.
However, it is strange to me that this transaction would fail to refresh. It hasn't performed any reads, so why is the retry error getting all the way back to the client?
However, it is strange to me that this transaction would fail to refresh. It hasn't performed any reads, so why is the retry error getting all the way back to the client?
This isn't the right question. The ABORT_REASON_NEW_LEASE_PREVENTS_TXN error path will result in a TransactionAbortedError, which can not be refreshed and must result in a transaction retry.
This test is not using a txn retry loop, so the error is expected.
The question then again becomes: why are we running into a new lease? This is a 3 node cluster, so perhaps there was a lease transfer? Did something recently change in this area?
I can reproduce this if I bump up the internal iteration count to 300: https://github.com/cockroachdb/cockroach/blob/9c7db33c1ee21145fe372d61ad26b45a9f151ee3/pkg/kv/kvserver/batcheval/cmd_lease_test.go#L52-L53
Reading through the test's history and comments, it's all about lease transfers, so I guess it's not surprising that we're seeing this.
kv/kvserver/batcheval.TestLeaseTransferWithPipelinedWrite failed with artifacts on master @ 003c0360de8b64319b5f0f127b99be91dbdca8a3:
=== RUN TestLeaseTransferWithPipelinedWrite
test_log_scope.go:162: test logs captured to: /artifacts/tmp/_tmp/8b7adba2afa4d2ee9e34e55bcb54994d/logTestLeaseTransferWithPipelinedWrite555953228
test_log_scope.go:80: use -show-logs to present logs inline
cmd_lease_test.go:112: worker failed: pq: restart transaction: TransactionRetryWithProtoRefreshError: TransactionAbortedError(ABORT_REASON_NEW_LEASE_PREVENTS_TXN): "sql txn" meta={id=66015018 key=/Table/111/1/0/1/0 pri=0.06724410 epo=0 ts=1661324742.116263602,0 min=1661324742.116263602,0 seq=3} lock=true stat=PENDING rts=1661324742.116263602,0 wto=false gul=1661324742.616263602,0
panic.go:500: -- test log scope end --
--- FAIL: TestLeaseTransferWithPipelinedWrite (37.24s)
Parameters: TAGS=bazel,gss