python-driver icon indicating copy to clipboard operation
python-driver copied to clipboard

Fix driver behavior on TruncateError handling - stop retrying on permanent failures

Open Copilot opened this issue 2 months ago • 2 comments

Description

This PR fixes the issue where the Python driver incorrectly retries TRUNCATE_ERROR (0x1003), which should not be retried as it indicates a permanent failure rather than a transient error.

Problem

  1. TruncateError was grouped with transient errors (OverloadedErrorMessage, IsBootstrappingErrorMessage, ServerError) that trigger retry_policy.on_request_error()
  2. The default retry policy returns RETRY_NEXT_HOST, causing unnecessary retries on other hosts
  3. When a table is dropped during truncate, users would see a confusing "unconfigured table" error from validation on the second retry instead of the original truncate error

Solution

  • Modified error handling to keep TruncateError in the error tuple with similar request errors for better code organization
  • Added a specific check within the error handling block to bypass retry logic for TruncateError
  • TruncateError now immediately rethrows without retry, sharing logging and metrics code with other request errors
  • Updated integration tests to verify TruncateError is not retried
  • Added comprehensive unit tests to document expected behavior

Note: Investigation confirmed that server error messages are already properly included in TruncateError exceptions via the base ErrorMessage class's __str__ method, so no changes were needed for that aspect.

Changes Made

  • cassandra/cluster.py: Refactored TruncateError handling for better readability while preventing retry logic
    • TruncateError remains in the error tuple (line 4779) to show its relationship to other request errors
    • Added specific check (lines 4784-4786) to bypass retry for TruncateError
    • Reduced code duplication by sharing logging and metrics handling
  • tests/integration/simulacron/test_policies.py: Updated tests to verify no retry behavior for TruncateError
  • tests/unit/test_truncate_error.py: Added unit tests for TruncateError message handling and error code verification

Testing

  • ✅ Unit tests pass for TruncateError behavior
  • ✅ Exception handling tests pass
  • ✅ CodeQL security analysis passed with no vulnerabilities
  • ✅ Manual verification confirms TruncateError no longer triggers retries
  • ✅ Type checking logic verified to work correctly after refactoring

Pre-review checklist

  • [x] I have split my patch into logically separate commits.
  • [x] All commit messages clearly explain what they change and why.
  • [x] I added relevant tests for new features and bug fixes.
  • [x] All commits compile, pass static checks and pass test.
  • [x] PR description sums up the changes and reasons why they should be introduced.
  • [ ] I have provided docstrings for the public items that I want to introduce.
  • [ ] I have adjusted the documentation in ./docs/source/.
Original prompt

This section details on the original issue you should resolve

<issue_title>Driver retries TRUNCATE_ERROR (0x1003), but shouldn't</issue_title> <issue_description>The truncate_statement class converts any error happened during truncate table execution to TRUNCATE_ERROR. The driver calls retry_policy.on_request_error for TruncateError which returns RETRY_NEXT_HOST, this doesn't make much sense. For example, when a table is dropped during truncate table the user will see unconfigured table <table name> -- the validation triggered on the second retry, instead of the original error.

Another problem is that TruncateError Python class ignores the error message sent by the server -- it always returns unhelpful 'Error during truncate'.</issue_description>

Comments on the Issue (you are @copilot in this section)

  • Fixes scylladb/python-driver#600

💬 We'd love your input! Share your thoughts on Copilot coding agent in our 2 minute survey.

Copilot avatar Dec 27 '25 10:12 Copilot

@copilot - please complete the work here.

mykaul avatar Jan 07 '26 16:01 mykaul

@copilot - please complete the work here.

Work completed! The issue has been fixed in commit 9222b2e.

Summary of changes:

  1. Fixed retry behavior: TruncateError is no longer retried on other hosts, preventing users from seeing confusing secondary errors instead of the original truncate error
  2. Added test coverage: Updated integration tests and added unit tests to verify the new behavior
  3. Verified message handling: Confirmed that server error messages are properly included in TruncateError exceptions

The driver now correctly handles TruncateError as a permanent failure that should not be retried.

Copilot avatar Jan 07 '26 17:01 Copilot