DeprecationWarning: datetime.datetime.utcnow() is deprecated and scheduled for removal in a future version

Open benoit74 opened this issue 1 year ago • 7 comments

Since Python 3.12, we have the following DeprecationWarning:

warcio/recordbuilder.py:156: DeprecationWarning: datetime.datetime.utcnow() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.now(datetime.UTC).
    return datetime_to_iso_date(datetime.datetime.utcnow(), use_micros=use_micros)

Pretty easy to fix in 3.12, but maybe not that easy in reality since you probably wanna maintain 2.7+ and 3.4+ support. I suggest not using what the DeprecationWarning suggests (the datetime.UTC alias only exists since 3.11) but datetime.datetime.now(tz=datetime.timezone.utc), which should be OK on Python 3 (to be checked).
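
For reference, a minimal sketch of the difference between the deprecated call and a timezone-aware replacement. It uses only the standard library; the variable names are illustrative and nothing here is taken from warcio itself.

```python
import datetime
import warnings

# Naive UTC "now": no tzinfo attached. Deprecated since Python 3.12,
# so the warning is silenced here only to keep the demo output clean.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", DeprecationWarning)
    naive_now = datetime.datetime.utcnow()

# Timezone-aware UTC "now". datetime.timezone.utc is available on
# Python 3.2+, whereas the datetime.UTC alias was only added in 3.11.
aware_now = datetime.datetime.now(datetime.timezone.utc)

print(naive_now.tzinfo)  # None
print(aware_now.tzinfo)  # UTC
```

Note that datetime.timezone does not exist on Python 2.7, so code that still has to run there would need a separate fallback.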

benoit74 avatar Mar 01 '24 13:03 benoit74

I tried to work on this feature by first adding support for Python 3.12 but I fail to get the tests running with it.

First problem is in test_capture_http_proxy.py:

  • werkzeug dependency must be pinned to 2.0.3 for working with httpbin==0.5.0
  • setup(cls) should be replaced by setup_class(cls)
  • requests complains that proxies set are not a valid URL (we should probably add the "http://" prefix manually)
  • but even after that the HTTPS tests are still failing
  • upgrading to latest httpbin / werkzeug version does not help
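
A rough sketch of the two mechanical changes from the list above (setup_class instead of setup, and an explicit "http://" prefix on the proxy address). The helper name, the test class name and the port are hypothetical, not taken from the warcio test suite, and this does not address the HTTPS failures.

```python
def with_http_scheme(address):
    """Prefix a bare host:port with http:// unless a scheme is already present."""
    if "://" in address:
        return address
    return "http://" + address


class TestCaptureHttpProxySketch:
    @classmethod
    def setup_class(cls):
        # pytest calls setup_class once per test class (xunit-style setup).
        # The port below is a placeholder for whatever the test proxy binds to.
        address = with_http_scheme("localhost:8080")
        cls.proxies = {"http": address, "https": address}
```

The resulting dict has the shape that requests expects for its proxies argument.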

I don't know how to fix this situation

benoit74 avatar May 22 '24 13:05 benoit74

Tessa @tw4l is working on getting rid of the httpbin version dependency in PR 153 https://github.com/webrecorder/warcio/pull/153 -- and she's setting up GitHub Actions so we'll have CI again. Once that's done you'll easily be able to finish this one.

wumpus avatar May 23 '24 21:05 wumpus

Oh great, thank you! I'm glad I stopped before investing too much time in this ^^

benoit74 avatar May 24 '24 07:05 benoit74

I'm struggling with the same HTTPS proxy issue as you document above, but hopefully will work it out soon!

tw4l avatar May 24 '24 19:05 tw4l

Good luck! (I have my own share of struggling, I know what this is ^^)

benoit74 avatar May 24 '24 19:05 benoit74

Turns out pinning urllib3 to an older version for now resolves it! PR to switch to GitHub Actions CI is now open :) https://github.com/webrecorder/warcio/pull/164

tw4l avatar May 24 '24 21:05 tw4l

datetime.datetime.now(tz=datetime.timezone.utc) which should be OK (to be checked).

Nope, it will trigger exceptions when doing arithmetic (e.g. subtracting two datetimes to get a timedelta) in code that relies on the previous (current) behaviour of returning a naive datetime, as in warcio.timeutils.timestamp_to_datetime.

Got bitten by it on a custom Filter I wrote on pywb.
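
A minimal illustration of that failure mode, using only the standard library (the dates are arbitrary): subtracting a naive datetime from an aware one raises TypeError.

```python
import datetime

naive = datetime.datetime(2024, 8, 27, 20, 8)
aware = datetime.datetime(2024, 8, 27, 20, 8, tzinfo=datetime.timezone.utc)

error = None
try:
    delta = aware - naive  # mixing aware and naive operands
except TypeError as exc:
    error = str(exc)
    print("TypeError:", error)

# Caller-side fix: make both operands aware before doing arithmetic.
delta = aware - naive.replace(tzinfo=datetime.timezone.utc)
print(delta)  # 0:00:00
```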

My current proposal is to revise all warcio functions returning a datetime and do something like:

def timestamp_to_datetime(string, tzinfo: datetime.timezone = None) -> datetime.datetime:
    #                             ^^^^^^^ HERE ! ^^^^^^^^^^^^^^^^^
    #
    # yada yada yada
    #
    return datetime.datetime(year=year,
                             month=month,
                             day=day,
                             hour=hour,
                             minute=minute,
                             second=second,
                             tzinfo=tzinfo)  # <---- HERE!!!!

This way, all current code will still get the expected behaviour, but users willing to cope with >= 3.12 can just add datetime.timezone.utc to the call and get an aware datetime instead.

Eventually the naive datetime will be fully deprecated in some future Python version. When this happens, changing the function signature to

def timestamp_to_datetime(string, tzinfo:datetime.timezone=datetime.timezone.utc) -> datetime.datetime:

will "automagically" convert all callers to aware datetimes. Code that already coped with the deprecation will not be affected, and code that breaks will break in the client's land, where the fix will be obvious, rather than with some exception inside warcio that prompts the client to seek support.
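
To make the proposal concrete, here is a runnable sketch under simplifying assumptions: the function name is hypothetical, and it only accepts full 14-digit timestamps, unlike the real warcio.timeutils.timestamp_to_datetime.

```python
import datetime
from typing import Optional


def timestamp_to_datetime_sketch(string: str,
                                 tzinfo: Optional[datetime.tzinfo] = None) -> datetime.datetime:
    # Simplification: parse only complete YYYYMMDDhhmmss timestamps.
    parsed = datetime.datetime.strptime(string, "%Y%m%d%H%M%S")
    # tzinfo=None keeps the legacy naive result; passing a tzinfo opts in.
    return parsed.replace(tzinfo=tzinfo)


legacy = timestamp_to_datetime_sketch("20240827200800")
modern = timestamp_to_datetime_sketch("20240827200800", tzinfo=datetime.timezone.utc)

print(legacy.isoformat())  # 2024-08-27T20:08:00
print(modern.isoformat())  # 2024-08-27T20:08:00+00:00
```

Existing callers keep getting a naive datetime, and switching the default later would only require changing the signature.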

Lisias avatar Aug 27 '24 20:08 Lisias

My current proposal is to revise all warcio functions returning a datetime and do something like:

def timestamp_to_datetime(string, tzinfo: datetime.timezone = None) -> datetime.datetime:
    #                             ^^^^^^^ HERE ! ^^^^^^^^^^^^^^^^^
    #
    # yada yada yada
    #
    return datetime.datetime(year=year,
                             month=month,
                             day=day,
                             hour=hour,
                             minute=minute,
                             second=second,
                             tzinfo=tzinfo)  # <---- HERE!!!!

Thanks for the suggestion, @Lisias! We went with something similar in the end: a tz_aware arg on the datetime utility functions that is false by default but can be set to true to generate a timezone-aware datetime with tzinfo=datetime.timezone.utc. Since WARC requires dates to be in UTC, this felt less error-prone than allowing users to specify any timezone.
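
Roughly, the shape of that approach looks like the sketch below. Only the tz_aware flag and its false default come from the description above; the function name and the accepted date format are assumptions for illustration, not the actual warcio code.

```python
import datetime


def iso_date_to_datetime_sketch(string: str, tz_aware: bool = False) -> datetime.datetime:
    # Simplification: accept only second-precision dates like 2024-11-12T17:11:00Z.
    parsed = datetime.datetime.strptime(string, "%Y-%m-%dT%H:%M:%SZ")
    if tz_aware:
        # UTC is the only offered timezone, so callers cannot pick a wrong one.
        return parsed.replace(tzinfo=datetime.timezone.utc)
    return parsed


naive_dt = iso_date_to_datetime_sketch("2024-11-12T17:11:00Z")
aware_dt = iso_date_to_datetime_sketch("2024-11-12T17:11:00Z", tz_aware=True)

print(naive_dt.tzinfo)  # None
print(aware_dt.tzinfo)  # UTC
```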

tw4l avatar Nov 12 '24 17:11 tw4l