Throw errors in getConsistentReadVersion [release-7.3]

Open jzhou77 opened this issue 9 months ago • 5 comments

Cherry-pick of https://github.com/apple/foundationdb/pull/11311

In the current code, errors are retried inside getConsistentReadVersion, so a client may have cancelled its GRV request while readVersionBatcher continues retrying it. With many clients doing this, the retries can effectively DDoS the GRV proxies, especially when the database has been unavailable for a while and clients are issuing many GRV requests. This change throws the errors instead of retrying them.
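The difference in load can be illustrated with a small simulation. This is only a sketch of the idea, not FoundationDB code: `UnavailableProxy`, `fetch_with_internal_retry`, `fetch_and_throw`, and `client_with_cancellation` are hypothetical names, and the retry loop is capped at 50 attempts only so the example terminates.

```python
class UnavailableProxy:
    """Hypothetical stand-in for a GRV proxy while the database is unavailable."""

    def __init__(self):
        self.requests_received = 0

    def get_read_version(self):
        # Every request is counted and then fails.
        self.requests_received += 1
        raise ConnectionError("database unavailable")


def fetch_with_internal_retry(proxy, max_attempts):
    """Retry-inside-the-batcher behaviour: errors are swallowed and retried.

    The loop cannot see whether the caller still wants the result, so it
    keeps sending requests even if the client has already given up.
    """
    for _ in range(max_attempts):
        try:
            return proxy.get_read_version()
        except ConnectionError:
            continue
    return None


def fetch_and_throw(proxy):
    """Throwing behaviour: one attempt, and any error propagates to the caller."""
    return proxy.get_read_version()


def client_with_cancellation(proxy, attempts_before_cancel):
    """Caller-side retry: the client stops as soon as it cancels."""
    for _ in range(attempts_before_cancel):
        try:
            return fetch_and_throw(proxy)
        except ConnectionError:
            continue
    return None


# Internal retry: the client's cancellation is invisible to the loop.
old_proxy = UnavailableProxy()
fetch_with_internal_retry(old_proxy, max_attempts=50)

# Errors thrown: the client cancels after its first failed attempt.
new_proxy = UnavailableProxy()
client_with_cancellation(new_proxy, attempts_before_cancel=1)

print(old_proxy.requests_received, new_proxy.requests_received)  # prints "50 1"
```

In the sketch, moving the retry decision to the caller is what lets a cancelled request stop generating traffic; multiplied across many clients, that is the load the description is concerned about.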

Code-Reviewer Section

The general pull request guidelines can be found here.

Please check each of the following things and check all boxes before accepting a PR.

  • [ ] The PR has a description, explaining both the problem and the solution.
  • [ ] The description mentions which forms of testing were done and the testing seems reasonable.
  • [ ] Every function/class/actor that was touched is reasonably well documented.

For Release-Branches

If this PR is made against a release-branch, please also check the following:

  • [ ] This change/bugfix is a cherry-pick from the next younger branch (younger release-branch or main if this is the youngest branch)
  • [ ] There is a good reason why this PR needs to go into a release branch and this reason is documented (either in the description above or in a linked GitHub issue)

jzhou77 • May 01 '24 22:05

Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x

  • Commit ID: 105062a4bef3937789d326ff4f79a0e9fb338d09
  • Duration 0:32:51
  • Result: :white_check_mark: SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

foundationdb-ci • May 01 '24 23:05

Result of foundationdb-pr-macos on macOS Ventura 13.x

  • Commit ID: 105062a4bef3937789d326ff4f79a0e9fb338d09
  • Duration 0:44:49
  • Result: :white_check_mark: SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

foundationdb-ci • May 01 '24 23:05

Result of foundationdb-pr-cluster-tests on Linux CentOS 7

  • Commit ID: 105062a4bef3937789d326ff4f79a0e9fb338d09
  • Duration 0:52:58
  • Result: :white_check_mark: SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)
  • Cluster Test Logs zip file of the test logs (available for 30 days)

foundationdb-ci • May 01 '24 23:05

Result of foundationdb-pr on Linux CentOS 7

  • Commit ID: 105062a4bef3937789d326ff4f79a0e9fb338d09
  • Duration 1:20:06
  • Result: :x: FAILED
  • Error: Error while executing command: if python3 -m joshua.joshua list --stopped | grep ${ENSEMBLE_ID} | grep -q 'pass=10[0-9][0-9][0-9]'; then echo PASS; else echo FAIL && exit 1; fi. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)
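The failing step greps the Joshua ensemble listing for `pass=10` followed by exactly three more digits. A minimal sketch of what that pattern accepts, using Python's `re` module in place of `grep -q`; the sample listing lines are hypothetical, since the real Joshua output format is not shown in this thread:

```python
import re

# Same pattern as the CI command: "pass=10" followed by three digits.
PASS_PATTERN = re.compile(r"pass=10[0-9][0-9][0-9]")


def ensemble_passed(listing_line):
    """Return True if the line would satisfy the grep -q check (substring match)."""
    return PASS_PATTERN.search(listing_line) is not None


# Hypothetical listing lines, for illustration only.
print(ensemble_passed("ensemble-a pass=10000 fail=0"))  # True
print(ensemble_passed("ensemble-b pass=10999 fail=0"))  # True
print(ensemble_passed("ensemble-c pass=9874 fail=1"))   # False
print(ensemble_passed("ensemble-d pass=11000 fail=0"))  # False
```

Under that reading, the check fails whenever the matching line is absent or its pass count falls outside the 10000 to 10999 range, which by itself does not say which test failed.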

foundationdb-ci • May 01 '24 23:05

Result of foundationdb-pr-clang on Linux CentOS 7

  • Commit ID: 105062a4bef3937789d326ff4f79a0e9fb338d09
  • Duration 1:26:29
  • Result: :white_check_mark: SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

foundationdb-ci • May 01 '24 23:05

Result of foundationdb-pr on Linux CentOS 7

  • Commit ID: 105062a
  • Duration 1:20:06
  • Result: ❌ FAILED
  • Error: Error while executing command: if python3 -m joshua.joshua list --stopped | grep ${ENSEMBLE_ID} | grep -q 'pass=10[0-9][0-9][0-9]'; then echo PASS; else echo FAIL && exit 1; fi. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

The CI failure is not reproducible, probably because the CI test timeout is shorter.

jzhou77 • May 02 '24 22:05