cpython icon indicating copy to clipboard operation
cpython copied to clipboard

[3.13] copyreg._reconstructor failures for protocol 0 and 1 for `dateutil.tz.tzutc`

Open moorchegue opened this issue 7 months ago • 9 comments

Bug description:

In [25]: pickle.load(open('taskwarrior/f91b9e98-7586-4317-ae52-b516e97209e5', 'rb'))
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[25], line 1
----> 1 pickle.load(open('taskwarrior/f91b9e98-7586-4317-ae52-b516e97209e5', 'rb'))

File /usr/lib/python3.13/copyreg.py:44, in _reconstructor(cls, base, state)
     41 def _reconstructor(cls, base, state):
     42     if base is object:
     43         #obj = cls.__new__(cls)
---> 44         obj = object.__new__(cls)
     45     else:
     46         obj = base.__new__(cls, state)

TypeError: object.__new__(tzutc) is not safe, use tzutc.__new__()

It seems that cls.__new__(cls) is the correct way to achieve the desired effect for any type of object?

This issue was not present in 3.12, or at least same code on same pickles used to work.

I was not able to reproduce this with arbitrary data serialization/deserialization (e.g. pickle.dump(tzutc(), f); pickle.load(f) works), which suggests this is dependent on how exactly data is structured, and so I'm including the culprit pickle here: 74b9da0a-ab54-437f-9ab5-549e886d3c83.txt

CPython versions tested on:

3.13

Operating systems tested on:

Linux

moorchegue avatar Jun 06 '25 10:06 moorchegue

Hi, I suspect this is related to a library you're using. The repro fails with this:

ModuleNotFoundError: No module named 'taskw_ng'

I doubt copyreg._reconstructor is the culprit, because that hasn't been touched in over two decades.

ZeroIntensity avatar Jun 06 '25 10:06 ZeroIntensity

Yeah, it is: taskw-ng==0.2.7. The context here is the https://github.com/bergercookie/syncall (data sync between Asana and taskwarrior) after upgrading from 3.12 to 3.13.

I fixed it at the level of copyreg, finding information that this way of using the __new__ method was deprecated long time ago. Which is the reason I'm suggesting to change this upstream regardless. But perhaps it's the serialization that's the issue (as well).

With the above fix the result of a pickle.load() is this:

{'id': 0,
 'description': 'Set up a demo environment',
 'end': datetime.datetime(2024, 5, 28, 6, 15, 10, tzinfo=tzutc()),
 'entry': datetime.datetime(2024, 5, 28, 6, 13, 7, tzinfo=tzutc()),
 'modified': datetime.datetime(2024, 5, 28, 6, 15, 11, tzinfo=tzutc()),
 'project': 'integral/vxbos',
 'status': 'completed',
 'uuid': UUID('74b9da0a-ab54-437f-9ab5-549e886d3c83'),
 'urgency': 1.25205}

Strangely enough it doesn't contain anything outside of the standard library.

I'm looking at a diff between the culprit file and a dump of the above data, and what immediately stands out is that _reconstructor is mentioned right on top of the file:

diff -u 74b9da0a-ab54-437f-9ab5-549e886d3c83.txt tmp.pickle | head
--- 74b9da0a-ab54-437f-9ab5-549e886d3c83.txt    2025-06-06 18:06:37.516240292 +0800
+++ tmp.pickle  2025-06-06 18:44:29.165722062 +0800
@@ -1,445 +1,85 @@
-ccopy_reg
-_reconstructor
-p0
-(ctaskw_ng.task
-Task
+(dp0
+Vdescription

Update: scratch that, _reconstructor is mentioned later in the second file as well, so that's not the issue, probably the exact chain of calls is.

I don't fully understand what's going on, but it seems that some intermediate function calls are included into the pickle, and that's why it fails with this file, and doesn't with resulting data structure.

moorchegue avatar Jun 06 '25 11:06 moorchegue

I fixed it at the level of copyreg, finding information that this way of using the new method was deprecated long time ago.

It's a Python 2 compatibility shield, as far as I can tell. But I can't see why that would suddenly break things in 3.13.

ZeroIntensity avatar Jun 06 '25 11:06 ZeroIntensity

@ZeroIntensity, do you think pickle should load the attached file without error?

moorchegue avatar Jun 06 '25 12:06 moorchegue

No idea, it's impossible to tell without seeing the code. It's most likely an issue with taskw_ng. I'd report it to them first, and then if they determine it's an issue on our end, they'll file a new report with a pure-stdlib repro.

ZeroIntensity avatar Jun 06 '25 13:06 ZeroIntensity

Problem with that is taskw_ng doesn't cause this error. Furthermore it's not even in the stacktrace:

Exception was raised during program execution.

  File "~/.virtualenvs/tw/bin/tw_asana_sync", line 8, in <module>
    sys.exit(main())
             ~~~~^^
  File "~/.virtualenvs/tw/lib/python3.13/site-packages/click/core.py", line 1442, in __call__
    return self.main(*args, **kwargs)
           ~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "~/.virtualenvs/tw/lib/python3.13/site-packages/click/core.py", line 1363, in main
    rv = self.invoke(ctx)
  File "~/.virtualenvs/tw/lib/python3.13/site-packages/click/core.py", line 1226, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/.virtualenvs/tw/lib/python3.13/site-packages/click/core.py", line 794, in invoke
    return callback(*args, **kwargs)
  File "~/.virtualenvs/tw/lib/python3.13/site-packages/syncall/scripts/tw_asana_sync.py", line 233, in main
    aggregator.sync()
    ~~~~~~~~~~~~~~~^^
  File "~/.virtualenvs/tw/lib/python3.13/site-packages/syncall/aggregator.py", line 178, in sync
    changes_A = self.detect_changes(self._helper_A, items_A)
  File "~/.virtualenvs/tw/lib/python3.13/site-packages/syncall/aggregator.py", line 159, in detect_changes
    cached_item = pickle_load(serdes_dir / item_id)
  File "~/.virtualenvs/tw/lib/python3.13/site-packages/bubop/serial.py", line 20, in pickle_load
    return pickle.load(f)
           ~~~~~~~~~~~^^^
  File "/usr/lib/python3.13/copyreg.py", line 43, in _reconstructor
    obj = object.__new__(cls)


object.__new__(tzutc) is not safe, use tzutc.__new__()

And I'm not sure at what point and with what version of that library was this pickle file created, but I am sure that it wasn't causing an error until the upgrade. I could report it to syncall or bubop, but looking at their code I don't see any misuse either. What such misuse could possibly even look like?

At the same time calling object.__new__(tzutc) does seem to cause an issue in both 3.13 and 2.7. Admittedly I don't understand why that should throw an error while tzutc.__new__(tzutc) shouldn't.

moorchegue avatar Jun 06 '25 13:06 moorchegue

Can you send the code that is creating the file?

ZeroIntensity avatar Jun 06 '25 13:06 ZeroIntensity

What is tzutc?

It looks like in version that created the pickle data, tzutc was implemented in Python, so its __new__ was inherited from object, and object.__new__(tzutc) created a "raw" instance of tzutc, with empty __dict__. In version that loads the pickle data, tzutc is an extension type, so object.__new__(tzutc) no longer works, because it doesn't know how to initialize internal state of tzutc.

Look what versions of the library that provides tzutc was at pickle and unpickle time.

This is an interesting problem. It cannot be solved at the user side (implementing __reduce__ or __reduce_ex__ for tzutc can help with future pickles, but not with unpickling old pickle data). Fixing it at Python side requires additional investigation if we don't want to break other user code.

This all happens only with historical pickle protocols 0 and 1. The best chance to recover your data is to use an older version of the library to unpickle it and then pickle it using a newer pickle protocol.

serhiy-storchaka avatar Jun 06 '25 17:06 serhiy-storchaka

Can you send the code that is creating the file?

I think it's the code where pickle_dump is called from here: https://github.com/bergercookie/syncall/blob/master/syncall/aggregator.py

What is tzutc?

It's dateutil.tz.tzutc, which inherits datetime.tzinfo.

Thanks for breaking it down, @serhiy-storchaka, this roughly matches my timeline with these files and Python upgrades. Let me know if I can do more to investigate it!

moorchegue avatar Jun 07 '25 18:06 moorchegue