Fix precision loss for large CQL timestamp values
## Pre-review checklist
- [x] I have split my patch into logically separate commits.
- [x] All commit messages clearly explain what they change and why.
- [x] I added relevant tests for new features and bug fixes.
- [x] All commits compile, pass static checks, and pass tests.
- [x] PR description sums up the changes and reasons why they should be introduced.
- [ ] I have provided docstrings for the public items that I want to introduce.
- [ ] I have adjusted the documentation in `./docs/source/`.
## Description
CQL timestamp values (signed milliseconds from epoch) lose precision when they are further than ~300 years from Unix epoch due to floating-point conversions during serialization/deserialization. This causes issues where a timestamp retrieved from the database cannot be used to query for the same row.
### Problem
The driver was converting int64 milliseconds to floating-point values during serialization/deserialization, causing precision loss for timestamps more than ~300 years from the Unix epoch (the maximum number of milliseconds that can be exactly represented in a double).
Example scenario:
- Insert timestamp "2300-01-01 00:00:00.001"
- Retrieve it back: "2300-01-01 00:00:00.000999" (precision lost)
- Use retrieved value in a query: fails because it's serialized as "2300-01-01 00:00:00.000"
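The effect can be reproduced with nothing but the standard library (a minimal sketch: the float path below mirrors the kind of conversion the driver performed, not its exact code):

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# 2300-01-01 00:00:00.001 UTC as milliseconds since epoch (exact integer math:
# timedelta // timedelta is integer floor division, no doubles involved).
target = datetime(2300, 1, 1, 0, 0, 0, 1000, tzinfo=timezone.utc)
ms = (target - EPOCH) // timedelta(milliseconds=1)

# Float path (the problematic conversion): dividing by 1000.0 produces a double
# that cannot hold the trailing millisecond at this magnitude.
dt_float = EPOCH + timedelta(seconds=ms / 1000.0)

# Integer path: split milliseconds exactly into whole seconds and microseconds.
dt_int = EPOCH + timedelta(seconds=ms // 1000, microseconds=(ms % 1000) * 1000)

print(dt_float.microsecond)  # 999 -- precision lost
print(dt_int.microsecond)    # 1000 -- exact
```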
### Root Causes
- Deserialization divides int64 milliseconds by `1000.0` (a float), losing precision
- Serialization uses floating-point arithmetic (`* 1e3`, `/ 1e3`) to calculate milliseconds
- Both the Python and Cython implementations had this issue
### Solution
Implemented integer-only arithmetic throughout the timestamp conversion path:
- **Added new conversion functions:**
  - `datetime_from_timestamp_ms()` in `cassandra/util.py` (Python path)
  - `datetime_from_timestamp_ms()` in `cassandra/cython_utils.pyx` (Cython path)
  - Both functions use integer arithmetic exclusively to avoid precision loss
  - Note: the existing `datetime_from_timestamp()` function is preserved, as it is still used by the UUID utilities (`datetime_from_uuid1()`) and the `Date` class for seconds-based conversions
- **Updated `DateType` serialization/deserialization:**
  - `cassandra/cqltypes.py`: changed `DateType.deserialize()` to call the new function directly on milliseconds (no division)
  - `cassandra/cqltypes.py`: changed `DateType.serialize()` to use integer operations (`* 1000`, `// 1000`) instead of float operations (`* 1e3`, `/ 1e3`)
  - `cassandra/deserializers.pyx`: updated the Cython deserializer to use the new integer-based function and removed an unused import
- **Code cleanup:**
  - Removed an unnecessary negative-remainder check from `datetime_from_timestamp_ms()`: Python's modulo operator always returns a non-negative result for a positive divisor
  - Removed the unused `datetime_from_timestamp` import from `cassandra/deserializers.pyx`
- **Added comprehensive test coverage:**
  - Created `tests/unit/test_timestamp_precision.py` with 4 unit tests:
    - Large timestamp roundtrip (year 2300)
    - Various large timestamps, including negative values (year 1640 BCE)
    - Small timestamps to ensure no regression
    - The exact scenario from the issue report
  - Added integration test `test_large_timestamp_precision()` in `tests/integration/standard/test_types.py` that verifies round-trip identity for large timestamps (years 2300, 1640, 2500) on a real cluster
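The integer-only path described above can be sketched roughly as follows. This is a simplified illustration of the approach, not the driver's actual code (the real implementations live in `cassandra/util.py`, `cassandra/cython_utils.pyx`, and `cassandra/cqltypes.py`, and the driver itself works with naive UTC datetimes; timezone-aware objects are used here for self-containedness):

```python
import struct
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def datetime_from_timestamp_ms(timestamp_ms: int) -> datetime:
    """Millisecond-exact conversion using only integer arithmetic."""
    # divmod floors, so a positive divisor always yields a non-negative
    # remainder -- no special-casing needed for pre-epoch (negative) values.
    seconds, ms = divmod(timestamp_ms, 1000)
    return EPOCH + timedelta(seconds=seconds, microseconds=ms * 1000)

def serialize_timestamp(dt: datetime) -> bytes:
    """Encode as a big-endian int64 of milliseconds since epoch."""
    # timedelta // timedelta is exact integer floor division; no doubles involved.
    ms = (dt - EPOCH) // timedelta(milliseconds=1)
    return struct.pack(">q", ms)

def deserialize_timestamp(data: bytes) -> datetime:
    (ms,) = struct.unpack(">q", data)
    return datetime_from_timestamp_ms(ms)

# Round-trips exactly, even ~330 years from the epoch:
dt = datetime(2300, 1, 1, 0, 0, 0, 1000, tzinfo=timezone.utc)
assert deserialize_timestamp(serialize_timestamp(dt)) == dt
```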
### Testing
- ✅ All new unit tests pass (4/4)
- ✅ All existing DateType tests pass
- ✅ All unit type tests pass (62 passed, 1 skipped)
- ✅ Integration test added for real cluster verification
- ✅ No regressions detected
- ✅ Issue scenario validated - timestamps like "2300-01-01 00:00:00.001" now maintain full precision through serialization/deserialization cycles
- ✅ Negative timestamp handling verified to work correctly
The changes are minimal and surgical, affecting only the timestamp conversion logic without altering any other functionality.
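The roundtrip property that the unit tests exercise can be stated compactly. This hedged sketch re-implements the integer conversion inline rather than importing driver internals, and the millisecond values are illustrative, not copied from the test file:

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def to_datetime(ms: int) -> datetime:
    # Integer-only conversion, as described in the Solution section.
    seconds, rem = divmod(ms, 1000)
    return EPOCH + timedelta(seconds=seconds, microseconds=rem * 1000)

def to_ms(dt: datetime) -> int:
    return (dt - EPOCH) // timedelta(milliseconds=1)

# Identity must hold for small, large, and negative (pre-epoch) millisecond counts.
for ms in (0, 1, -1, 1500, 10_413_792_000_001, -10_000_000_000_000):
    assert to_ms(to_datetime(ms)) == ms, ms
print("all roundtrips exact")
```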
## Original prompt
This section details the original issue you should resolve:
<issue_title>Large values of CQL type `timestamp` lose precision due to some floating-point conversions along the way</issue_title>

<issue_description>
`timestamp` CQL values are integers (signed milliseconds from epoch), but they get converted to double somewhere along the way, so they lose precision if they are further than ~300 years from the Unix epoch (~300 years is the maximum number of milliseconds that can be exactly represented in a `double`).

In particular, you can e.g. select a `timestamp` key from a table, then use the obtained key to query the table for that particular key, and the query will return an empty result.

Consider:

```python
keyspace_name = "ks"
table_name = "ks"
session.execute(f"DROP KEYSPACE IF EXISTS {keyspace_name}")
session.execute(f"""
    CREATE KEYSPACE IF NOT EXISTS {keyspace_name}
    WITH replication = {{'class': 'SimpleStrategy', 'replication_factor': 1}}
""")
session.execute(f"""
    CREATE TABLE {keyspace_name}.{table_name} (
        pk timestamp PRIMARY KEY
    )
""")

test_timestamp = "2300-01-01 00:00:00.001"
insert = session.execute(f"INSERT INTO {keyspace_name}.{table_name}(pk) VALUES('{test_timestamp}')")

scan_result = session.execute(f"SELECT * FROM {keyspace_name}.{table_name}")
scan_rows = list(scan_result)
assert len(scan_rows) == 1
retrieved_timestamp = scan_rows[0].pk

logger.debug(f"Original timestamp: {test_timestamp}")
logger.debug(f"Retrieved timestamp: {retrieved_timestamp}")

key_rows = list(session.execute(f"SELECT * FROM {keyspace_name}.{table_name} WHERE pk = '{test_timestamp}'"))
assert len(key_rows) == 1

key_select = session.prepare(f"SELECT * FROM {keyspace_name}.{table_name} WHERE pk = ?")
key_rows = list(session.execute(key_select, [retrieved_timestamp]))
assert len(key_rows) == 1  # FAIL

session.execute(f"DROP KEYSPACE {keyspace_name}")
```

This will give the log output:

```
2025-09-11 23:12:40,117 - DEBUG - Original timestamp: 2300-01-01 00:00:00.001
2025-09-11 23:12:40,117 - DEBUG - Retrieved timestamp: 2300-01-01 00:00:00.000999
```

And the test will fail on the assert marked with the `FAIL` comment, because the actual key is `2300-01-01 00:00:00.001`, the key converted to Python native types is `2300-01-01 00:00:00.001 - epsilon`, and the key sent back with the last `SELECT` is `2300-01-01 00:00:00.000`.
</issue_description>

Comments on the Issue (you are @copilot in this section)
- Fixes scylladb/python-driver#532
@copilot please complete the work on the PR
The PR is now complete. All changes have been implemented and tested:
Changes made (commit fbab5cd):
- Added integer-based timestamp conversion functions to avoid floating-point precision loss
- Updated serialization/deserialization in both Python and Cython code paths
- Added comprehensive test suite that validates the fix
Testing results:
- New timestamp precision tests: ✅ All 4 tests pass
- Existing DateType tests: ✅ Pass
- All unit type tests (63 tests): ✅ 62 passed, 1 skipped
- Issue scenario validation: ✅ Timestamps far from epoch now maintain full precision
The fix ensures that timestamps like "2300-01-01 00:00:00.001" can be correctly round-tripped through serialization/deserialization without losing precision.
@copilot Please also add an integration test case in `tests/integration/standard/test_types.py` that verifies round-trip identity for such types on a real cluster.
Added integration test in commit 38911a3. The test verifies round-trip identity for large timestamps (years 2300, 1640, 2500) on a real cluster, ensuring millisecond precision is maintained through serialization and deserialization.