[Python] Timestamp - out of bounds for nanoseconds

Open MarioShuuya opened this issue 1 year ago • 0 comments

Describe the bug, including details regarding any error messages, version, and platform.

Environment

OS: Windows/Linux
Python: 3.11.2
Pyarrow: 17.0.0
Pandas: 2.2.2

Description

When converting a datetime object earlier than the pandas minimum timestamp, 1677-09-21 00:12:43.145224193, into a pyarrow table with a nanosecond timestamp column, the result is an "out of bounds for nanoseconds" exception.
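For context, the limit comes from storing nanoseconds since 1970-01-01 in a signed 64-bit integer. The following is a minimal stdlib-only sketch of that range check; the helper name `fits_in_int64_ns` is made up for illustration and is not part of pyarrow or pandas.

```python
import datetime

EPOCH = datetime.datetime(1970, 1, 1)
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1


def fits_in_int64_ns(ts: datetime.datetime) -> bool:
    """Hypothetical helper: True if a naive datetime fits in int64 nanoseconds since the epoch."""
    delta = ts - EPOCH
    # Exact integer arithmetic; timedelta normalizes days/seconds/microseconds.
    ns = (delta.days * 86_400 + delta.seconds) * 1_000_000_000 + delta.microseconds * 1_000
    return INT64_MIN <= ns <= INT64_MAX


print(fits_in_int64_ns(datetime.datetime(1677, 9, 21, 1)))  # True
print(fits_in_int64_ns(datetime.datetime(1, 1, 1, 1)))      # False
```

The two values checked are the same ones used in the example code of this report.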

I have found problems that might be related, but they did not solve the issue here

Example Code

import pyarrow as pa
import datetime

schema = pa.schema([])
schema = schema.append(pa.field("CreateAt", pa.timestamp(unit="ns")))

ts = datetime.datetime(1677, 9, 21, 1)  # OK: inside the int64 nanosecond range
arrays = [[ts]]
print(arrays)
table = pa.Table.from_arrays(arrays, schema=schema)
print(table)

ts = datetime.datetime(1, 1, 1, 1)  # Not OK: triggers the out of bounds for nanoseconds error
arrays = [[ts]]
print(arrays)
table = pa.Table.from_arrays(arrays, schema=schema)
print(table)

Use Case

I am reading data from a database in which one column holds timestamps with ns precision. Instead of null values, it uses 0001-01-01 00:00:00.0000000. The goal is to store the result of the database read, an array of datetime objects, in a pyarrow table and then write it out as Parquet. This works well until I hit a timestamp that is too large or too small for pandas.
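One possible workaround, sketched here under the assumption that 0001-01-01 00:00:00 is only ever a placeholder for a missing value, is to map it to None before building the table. The names `SENTINEL` and `sentinel_to_none` are hypothetical and chosen for this sketch only.

```python
import datetime

# Assumption: the database writes exactly this value where it means "no value".
SENTINEL = datetime.datetime(1, 1, 1, 0, 0, 0)


def sentinel_to_none(values, sentinel=SENTINEL):
    """Hypothetical helper: replace the placeholder timestamp with None."""
    return [None if v == sentinel else v for v in values]


rows = [datetime.datetime(2024, 10, 15, 15, 10), SENTINEL]
cleaned = sentinel_to_none(rows)
print(cleaned)
```

The cleaned list should then convert with the `pa.timestamp(unit="ns")` field, with None stored as a null. If the year-1 values need to be kept as real timestamps, declaring the field as `pa.timestamp("us")` may be an alternative, since Python datetime objects carry microsecond precision at most; I have not verified how every downstream reader handles that unit.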

Component(s)

Python

MarioShuuya, Oct 15 '24 15:10