python-cx_Oracle icon indicating copy to clipboard operation
python-cx_Oracle copied to clipboard

cx_Oracle does not correctly convert TIMESTAMP WITH TIMEZONE

Open dlenski opened this issue 7 years ago • 15 comments

cx_Oracle 5.3 does not correctly handle TIMESTAMP WITH TIMEZONE

When a TIMESTAMP WITH TIMEZONE is returned from a query and converted to a Python datetime object, its time zone information is simply truncated.

Simple demonstration, where current_timestamp and systimestamp are in different time zones, and the Python objects returned for them are totally lacking in time zone information:

>>> import cx_Oracle
>>> conn = cx_Oracle.connect( user, pass, dns )
>>> print(conn.version)
12.1.0.2.0
>>> cur = conn.cursor()
>>> print( cur.execute(''' select to_char(current_timestamp), current_timestamp,
                                  to_char(systimestamp), systimestamp from dual  ''').fetchone() )
('19-APR-17 04.38.04.466727 PM -07:00',
 datetime.datetime(2017, 4, 19, 16, 38, 4, 466727),
 '20-APR-17 01.38.04.466724 AM +02:00',
 datetime.datetime(2017, 4, 20, 1, 38, 4, 466724))

This bug is observed by others as well: http://stackoverflow.com/questions/6125698/what-is-the-correct-way-to-retrieve-timezone-aware-datetime-objects-from-a-cx-or#comment7109898_6125840

The conversion in the other direction is broken as well.

When marshalling a Python timezone-aware datetime object into a bind variable… the bind variable ends up as a naïve Oracle DATE rather than a TIMESTAMP WITH TIME ZONE:

>>> from datetime import datetime, timezone, timedelta
>>> tzdate = datetime.now(tz=timezone(timedelta(hours=-1)))
>>> print(tzdate.isoformat())
'2017-04-19T22:43:53.189561-01:00'

>>> print( cur.execute("select to_char(:tzdate,'YYYY-MM-DD\"T\"HH24:MM:SS') from dual",
           {'tzdate':tzdate}).fetchone() )
('2017-04-19T22:04:53',)

>>> cur.execute("select extract(timezone_hour from :tzdate) from dual", {'tzdate':tzdate}).fetchone()
DatabaseError: ORA-30076: invalid extract field for extract source

dlenski avatar Apr 19 '17 23:04 dlenski

No attempt has currently been made to transfer timezone information either to or from the database. This is primarily due to the fact that there is still no method in the C API to acquire or set the tzinfo attribute on datetime instances. That would imply the need to acquire the attribute using the generic C API as well as create datetime instances with the generic C API which will eliminate the benefits of using the datetime C API in the first place. I'll do some checking to see if there is a solution to this that doesn't involve a significant reduction in performance.

anthony-tuininga avatar Apr 20 '17 17:04 anthony-tuininga

No attempt has currently been made to transfer timezone information either to or from the database. This is primarily due to the fact that there is still no method in the C API to acquire or set the tzinfo attribute on datetime instances. That would imply the need to acquire the attribute using the generic C API as well as create datetime instances with the generic C API which will eliminate the benefits of using the datetime C API in the first place.

Argghh, I see what you mean. Python datetime C API has no mechanism to manipulate tzinfo. :frowning_face:. I guess that's because the tzinfo objects are typically "heavyweight" Python objects, rather than something that can be easily manipulated from C.

I'll do some checking to see if there is a solution to this that doesn't involve a significant reduction in performance.

Perhaps you could create datetime instances using the C API and then use the generic API to call dt.replace(tzinfo=...) to add the time zone? I'm not sure if this is actually any faster.

dlenski avatar Apr 20 '17 18:04 dlenski

Drat. A quick test suggests that using .replace on a pre-existing datetime object is actually slower than creating a new datetime object using the generic API.

In [20]: %%timeit -n1000000 tz=timezone(timedelta(hours=-1)) 
    ...: # Create a new datetime object using Python API
    ...: now=datetime(2017,4,20,12,0,0,0,tz)
    ...: 
1000000 loops, best of 3: 237 ns per loop

In [21]: %%timeit -n1000000 now=datetime.now(); tz=timezone(timedelta(hours=-1)) 
    ...: # Replace the tzinfo attribute (and maybe garbage-collect the old 'now' object)
    ...: now=now.replace(tzinfo=tz)
    ...: 
1000000 loops, best of 3: 1.33 µs per loop

dlenski avatar Apr 20 '17 18:04 dlenski

I just ran my own set of tests using the existing C API and then a second set calling the datetime.datetime() constructor instead (using PyObject_Call). This is for calling var.getvalue() on a datetime variable created used the cursor.var() method. The results are not good. The cost of creating the tuple and invoking the call is 4.5x greater than using the existing C API (124 ns for one and 558 ns for the other). It looks like the only viable approach is to request that the tzinfo attribute be made available in the Python core, or to find some way to work around the deficiency. Thoughts?

anthony-tuininga avatar Apr 20 '17 21:04 anthony-tuininga

I just ran my own set of tests using the existing C API and then a second set calling the datetime.datetime() constructor instead (using PyObject_Call). This is for calling var.getvalue() on a datetime variable created used the cursor.var() method. The results are not good. The cost of creating the tuple and invoking the call is 4.5x greater than using the existing C API (124 ns for one and 558 ns for the other).

First off, thank you for digging into this, I really appreciate it!

It looks like the only viable approach is to request that the tzinfo attribute be made available in the Python core…

Yeah, that seems like a good idea to request it.

Even if it were possible to get/set the tzinfo from the C API, I can think of a couple further issues:

  • You would still have to create tzinfo objects using the Python API. (Presumably, most result sets don't include an infinite number of time zones, so you could probably cache one Python tzinfo object for each Oracle TIMEZONE and get a significant speedup that way.)

  • There are (at least) 2 kinds of Oracle time zones that I am aware of, along with 2 corresponding kinds of Python time zone objects:

    1. "Naive" time zones with a fixed hour/minute offset. I think it should be straightforward to add a C API to create one of these, similar to PyDelta_FromDSU.
    with naive_timestamp as (select current_timestamp ts from dual)
    select extract(timezone_hour from ts), extract(timezone_minute from ts), extract(timezone_region from ts), extract(timezone_abbr from ts) from naive_timestamp;
    → returns -7, 0, 'UNKNOWN', 'UNK' for me
    
    from datetime import datetime, timedelta, timezone
    tzinfo = timezone(timedelta(hours=-7, minutes=0))
    print( repr( datetime.now().replace(tzinfo=tzinfo)) )
    → datetime.datetime(2017, 4, 20, 15, 40, 13, 336254, tzinfo=datetime.timezone(datetime.timedelta(-1, 61200)))
    
    1. "Fancy" time zone based on the TZ database, which keep track of daylight savings changes and all that mess. In Python, these are implemented in pytz, which isn't part of the standard library… so they'd need to be created with the generic API.
    with fancy_timestamp as (select current_timestamp at time zone 'America/Los_Angeles' ts from dual)
    select extract(timezone_hour from ts), extract(timezone_minute from ts), extract(timezone_region from ts), extract(timezone_abbr from ts) from naive_timestamp;
    → returns -7, 0, 'America/Los_Angeles', 'PDT' for me
    
    from datetime import datetime
    from pytz import timezone
    tzinfo = timezone('America/Los_Angeles')
    print( repr(datetime.now().replace(tzinfo=tzinfo)) )
    → datetime.datetime(2017, 4, 20, 15, 40, 51, 283723, tzinfo=<DstTzInfo 'America/Los_Angeles' ...>)
    

Fortunately, both Oracle and Python use the same standard TZ database naming for the "fancy" timezones, so I think it should be relatively straightforward to convert between them.

dlenski avatar Apr 20 '17 22:04 dlenski

Yeah, that seems like a good idea to request it.

I've added a Python enhancement to track it.

You would still have to create tzinfo objects using the Python API. (Presumably, most result sets don't include an infinite number of time zones, so you could probably cache one Python tzinfo object for each Oracle TIMEZONE and get a significant speedup that way.)

Yes, a cache of tzinfo objects would definitely work so the lack of a C API for creating tzinfo objects is not as much of a concern.

"Naive" time zones with a fixed hour/minute offset. I think it should be straightforward to add a C API to create one of these, similar to PyDelta_FromDSU.

These are straightforward as the hour and minute offset are readily available.

"Fancy" time zone based on the TZ database, which keep track of daylight savings changes and all that mess. In Python, these are implemented in pytz, which isn't part of the standard library… so they'd need to be created with the generic API.

I'd hesitate to use these since they aren't part of the standard library. Perhaps a tzinfo class could be created by cx_Oracle which uses the time zone name and offsets available from OCI?

anthony-tuininga avatar Apr 24 '17 15:04 anthony-tuininga

I've added a Python enhancement to track it.

Excellent.

"Fancy" time zone based on the TZ database, which keep track of daylight savings changes and all that mess. In Python, these are implemented in pytz, which isn't part of the standard library… so they'd need to be created with the generic API.

I'd hesitate to use these since they aren't part of the standard library. Perhaps a tzinfo class could be created by cx_Oracle which uses the time zone name and offsets available from OCI?

Yeah, that seems like a good idea. Actually, the "naive" time zone (datetime.timezone) has an optional name parameter (timezone(timedelta(hours=X, minutes=Y), name='ABC')) so I think it could be used as-is.

dlenski avatar Apr 24 '17 16:04 dlenski

"Fancy" time zone based on the TZ database, which keep track of daylight savings changes and all that mess. In Python, these are implemented in pytz, which isn't part of the standard library… so they'd need to be created with the generic API.

I'd hesitate to use these since they aren't part of the standard library. Perhaps a tzinfo class could be created by cx_Oracle which uses the time zone name and offsets available from OCI?

Yeah, that seems like a good idea. Actually, the "naive" time zone (datetime.timezone) has an optional name parameter (timezone(timedelta(hours=X, minutes=Y), name='ABC')) so I think it could be used as-is.

That seems reasonable. Thanks for pointing that out.

anthony-tuininga avatar Apr 24 '17 16:04 anthony-tuininga

Any progress here, My solution was to convert everything into UTC and add a timezone column for converting back. Not very elegant.

fenchu avatar Jun 25 '19 12:06 fenchu

No progress as I am waiting on this Python enhancement and nothing has been happening on that front, unfortunately. It would be possible to use the straightforward Python API, but there would be significant overhead in doing so, so I have avoided doing so for now.

anthony-tuininga avatar Jun 25 '19 14:06 anthony-tuininga

The Python enhancement has been merged to master. Note that there is one part of the fix that can be made in cx_Oracle even without the change:

An official constructor for a timezone-aware datetime, which I think basically exists in the form of PyDatetime_CAPI->PyDateTimeAPI->DateTime_FromDateAndTime / ->DateTime_FromDateAndTimeAndFold, and we just need to document it. I think this is basically a separate issue, and I have opened #39604 to track it.

sebukoleth avatar Sep 23 '20 20:09 sebukoleth

Yes, I saw that. That's great! The interesting part is that I need to support older versions of Python as well for some time to come. Once I get a copy of 3.10 with the enhancement in place I'll see about implementing the necessary changes in cx_Oracle.

anthony-tuininga avatar Sep 23 '20 20:09 anthony-tuininga

Glad to hear there's some progress on the Python API side now. :+1: :+1:

dlenski avatar Sep 23 '20 21:09 dlenski

Indeed!

anthony-tuininga avatar Sep 23 '20 21:09 anthony-tuininga

In the meantime…

Any progress here, My solution was to convert everything into UTC and add a timezone column for converting back. Not very elegant.

The best solutions I've found (combining human-readable, standards-based, and "no corner cases") are to convert timestamps to ISO8601-format string representations before transferring across the Python-to-Oracle boundary. I've basically used versions of these formulas in all of my cx_Oracle code.

Extracting a TIMESTAMP WITH TIMEZONE column in a cursor

from datetime import datetime

# datetime.strptime("2020-01-01T23:01:01.001-0100", "%Y-%m-%dT%H:%M:%S.%f%z")
# returns => datetime.datetime(2020, 1, 1, 23, 1, 1, 1000, tzinfo=datetime.timezone(datetime.timedelta(-1, 82800)))

cur.execute(''' select to_char(tz_column,'YYYY-MM-DD"T"HH24:MI:SS.FFTZHTZM')) as tz_column_ ''')
for row in cur:
  tz_column = datetime.strptime(row[0], "%Y-%m-%d %H:%M:%S.%f %z")

Injecting a TIMESTAMP WITH TIMEZONE as a bind variable

bindvar = datetime_object.isoformat()
cur.execute(''' select * from table
                where tz_column_ > TO_TIMESTAMP_TZ(:0', 'YYYY-MM-DD"T"HH24:MI:SS.FFTZH:TZM') ''',
            ( bindvar, ))

dlenski avatar Sep 23 '20 21:09 dlenski