Isolate the _datetime extension module
Feature or enhancement
Proposal:
I hope this issue will complete _datetime isolation.
My concerns (and answers)
-
Py_MOD_PER_INTERPRETER_GIL_SUPPORTEDshould be applied in sync with_zoninfo?- YES: Possible.
-
Can a module state have a C-API structure, keeping a capsule just for comatibility?
- NO: Possible, but a user should not touch the module state.
-
C-API supports only the main interpreter? Otherwise,
PyInterpreterStateis acceptable to point each structure?- NO:
PyDateTimeAPIcannot emit an error. Also, noPyInterpreterStatemember is accessible fromdatetime.h. UPDATE: Seems to be possible by using a global function pointer instead of a function.
- NO:
Specific issue:
- #118608
Links to previous discussion of this feature:
- #103092
- #102995
- #110475
Linked PRs
- gh-117399
- gh-117413
- gh-117498
- gh-118115
- gh-118337
- gh-118357
- gh-118606
PR #117399 fails except Windows. Is there an easy way to access a PyInterpreterState member? I'll try later allowing C-API only for the main-interpreter with a global variable.
I briefly discussed the C API capsule with @pganssle on the 2023 (or was it 2022?) language summit. I also asked the Steering Council to comment about backwards compatibility concerns regarding datetime.h (which is not included by Python.h):
- https://github.com/python/steering-council/issues/208
Putting it in the PyInterpreterState struct is an option, but I'm not sure how much we want to bloat that struct. We probably want to deprecate the current capsulated API and introduce a new capsulated API.
Converting the _datetime extension to multiphase initialization and/or converting static types to heap types is complicated because:
<Include/datetime.h>C API- datetime capsule (C API)
- If the _datetime extension is reloaded, the C API expects to meet the same types. Otherwise,
PyDateTime_Check()fails and everything else fails.
It reminds me the very complicated case of the PyAST C API. We managed to convert the _ast extension to heap types and multiphase init by moving state to the interpreter state. Other "trade offs" attemps for the _ast extension ended by introducing crashes which were hard to trigger and hard to fix.
So I propose this plan to isolate the datetime module:
- Move references to types to datetime_state structure: I wrote a minimum PR gh-118606 for that
- Convert static types to heap types, but create them only once, and don't delete them (on purpose, for now).
- Move datetime_state to PyInterpreterState.
- At the point, discuss how to deal with remaining issues: multiphase init, C API, capsule, etc.
Where can we see your rationale for the inflating PyInterpreterState with PyAST C-API? IIUC, the practice is supposed to be bad: https://github.com/python/cpython/issues/103092#issuecomment-2026952543.
How do you associate the PyAST issue with PyDateTime in terms of the mechanizm and impact?
@encukou, @ericsnowcurrently
How do you associate the PyAST issue with PyDateTime in terms of the mechanizm and impact?
Both provide a C API and we wanted to isolate their C extension.
Where can we see your rationale for the inflating PyInterpreterState with PyAST C-API?
I will try to dig into the bug tracker later. In short, there were 2-3 crashes related to the isolation of the _ast extension. Crashes related to the C API.
Python-ast.c (3.10.0 alpha 0: b1cc6ba73a51d5cc3aeb113b5e7378fb50a0e20a)
int PyAST_Check(PyObject* obj)
{
astmodulestate *state = get_global_ast_state();
if (state == NULL) {
return -1;
}
return PyObject_IsInstance(obj, state->AST_type);
}
Previously, PyAST_Check() imported the _ast module through get_global_ast_state(). The implicit import could get an unsafe pseudo module by replacing __import__() with a lazy import (strongly discouraged since Python 3.3).
- https://bugzilla.redhat.com/show_bug.cgi?id=1871992#c1
- https://github.com/python/cpython/pull/21961
- https://github.com/python/cpython/pull/23024
datetime.h
#define PyDate_Check(op) PyObject_TypeCheck((op), PyDateTimeAPI->DateType)
#define PyDate_CheckExact(op) Py_IS_TYPE((op), PyDateTimeAPI->DateType)
PyDate_Check() requires the real module to be imported in advance by calling PyCapsule_Import() explicitly.
At least, the regression case in PyAST C-API above cannot be applied to PyDateTime C-API, I think.
The purpose of this issue is to convert the _datetime extension to the multiphase C API. The problem is to make sure that we are dealing with the same types if multiple instances of the extension are created.
Example:
import sys
import _datetime
now = _datetime.datetime.now()
del _datetime
del sys.modules['_datetime']
import _datetime
print(isinstance(now, _datetime.datetime))
In general, we don't require this. For example, it's false with the _random extension:
import sys
import _random
rng = _random.Random()
del _random
del sys.modules['_random']
import _random
print(isinstance(rng, _random.Random))
The problem for _datetime is the C API and the capsule. Currently, the C API is based on PyDateTimeAPI variable (one variable is created every time the datetime.h header is included.) I would prefer PyDateTime_Check() to return true even if there are two instances of the datetime module.
I would care less if the C API would be stateful: pass explicitly a module/state/something. But changing the C API is out of the scope of this issue.
But changing the C API is out of the scope of this issue.
The SC said that the datetime C API is protected by PEP-387, even though datetime.h is not explicitly included in Python.h. IMO, we should deprecate the current C API and introduce a new one. That is the easiest solution technically, and the easiest to relate to for users. I agree with Victor that it deserves its own issue (and perhaps also a Discourse topic).
I merged my first change "Move types to datetime state".
@neonene: Do you want to propose changes to convert static types to heap types? Maybe start with a PR to convert only 4 types to make the PR short enough so it's easier to review.
Types:
PyTypeObject *date_type;
PyTypeObject *datetime_type;
PyTypeObject *delta_type;
PyTypeObject *isocalendar_date_type;
PyTypeObject *time_type;
PyTypeObject *tzinfo_type;
PyTypeObject *timezone_type;
IMO, we should deprecate the current C API and introduce a new one.
If we manage to isolate the _datetime extension without breaking the C API, I'm not sure that it's needed.
I'll wait https://github.com/python/steering-council/issues/243, which is a necessary procedure.
A key functional test case in relation to https://github.com/python/steering-council/issues/243 is ensuring that previous references to the capsule API still work after a module reload. Specifically, this sequence:
- capsule reference is acquired
- module is reloaded
- capsule reference is used
This will have historically worked due to the implicit module caching in the single-phase init support for extension modules, but I'm concerned that a full migration to true reload support will break it (and likely crash the interpreter in the process).
I'd also be genuinely surprised if such a test case already exists, hence the green CI on https://github.com/python/cpython/pull/118337 may not be enough to show that the migration to multi-phase init hasn't broken anything.
Any reasonable producer is welcome. I think the concern about the capsule is equivalent to that about the module state, because the capsule reference is in sync with the new module. If the behavior were problematic, not only _datetime but also all modules should rely on PyInterpreterState.
FYI, I'm hoping to wrap this up (or get really close) during the PyCon US sprints. I'll be here through Thursday. @pganssle and @erlend-aasland are here too, so I'll get as much help as I can.
CC @1st1
@neonene the missing test coverage I was referring to on the SC issue is described in this comment: https://github.com/python/cpython/issues/117398#issuecomment-2106112547
the missing test coverage I was referring to on the SC issue is described in this comment: #117398 (comment)
Can you explain what needs to be tested, using _datetime API?
Concerning the new generic C-API design, I doubt at this point the need to keep pointers unchanged across module-reloads. The new API can be offered exclusively for third-party modules with new usage, and the migration will not be required if the module-authors care about backwards compatibility.
FYI, I chatted with @pganssle and @erlend-aasland (and @encukou) at the PyCon US sprints and we resolved on the following plan for beta 2:
- basic multi-phase init
- convert to heap types
- convert to module state (mostly)
- disable the C-API for subinterpreters (but preserve status quo for main interp)
Then we can separately tackle getting the C-API working for subinterpreters.
@erlend-aasland is planning on tackling the first three, since he's mostly done it all gh-102995. I'm making a plan for step 4. The timeline is short (beta 2 is in ~10 days, per Thomas), but given all the prior work I'm confident we can get it done in time.
FWIW, various previously merged PRs will help make some of these steps easier, so thanks!
Here's my tentative plan for the datetime C-API:
- all 5 types will stay static and we'll use the trick we use for builtin types to address cross-interpreter issues
- the "TimeZone_UTC" will become a static global singleton, instead of allocating it from the heap
- the zero-offset delta object (needed for UTC) will also become a static global singleton
- we'll make sure the functions in the datetime C-API operate relative to the current interpreter when called
I'll need to double-check that the difference between per-interpreter and per-module is benign.
@neonene does that plan make sense based on the work you have done?
I think the plans do not conflict with _datetimemodule.c PoC, which is not in sync with main after c2627d6eea924daf80f374c18a5fd73ef61283fa, though. Feel free to separate or take over.
- basic multi-phase init
Now that #119472 is applied, maybe the sequence could be reconsidered?
@ericsnowcurrently
- all 5 types will stay static and we'll use the trick we use for builtin types to address cross-interpreter issues
Does the tp_dict of them keep valid after the interned strings cleanup? (Issue: #118608)
tp_dict, tp_sublasses, and the weakrefs are all a problem. I'd like to see if we can use the same solution as for the builtin static types.
- disable the C-API for subinterpreters (but preserve status quo for main interp)
Then we can separately tackle getting the C-API working for subinterpreters.
At least the zoneinfo module will not work on subinterpreters (e.g. #112100) with this separation, right?
(Sorry for the close by mistake)
That step 4 is mostly obsolete. With the solution I merged for the datetime C-API, the problems of isolation for subinterpreters there are mostly resolved. We only have to make sure the exposed types are isolated (e.g. by applying the same solution as for builtin types).
We only have to make sure the exposed types are isolated (e.g. by applying the same solution as for builtin types).
I think PyDate*_Check(op) API macros are allowed to successfully compare a C-API type with a normal _datetime (heap) type.
@ncoghlan
- capsule reference is acquired
- module is reloaded
- capsule reference is used
I think I've found a regression case. A user can keep the borrowed reference to a typeobject outside the capsule, which will be a dangling pointer after unloading the module or Py_Finalize().
@ericsnowcurrently
Regarding my previous post, I agree that _datetime module types also use the same solution as for the C-API types. I still think per-module heaptypes are enough for third-party modules, though.
I think
PyDate*_Check(op)API macros are allowed to successfully compare a C-API type with a normal_datetime(heap) type.
That shouldn't be a problem. The fact that the types in the C-API are static types doesn't matter, does it?
Perhaps I'm not clear what you mean by "normal _datetime (heap) type". The types from the _datetime module should be the same as the ones provided by the datetime C-API. That's why we cannot convert them to heap types if we want to support subinterpreters.
Regarding my previous post, I agree that _datetime module types also use the same solution as for the C-API types. I still think per-module heaptypes are enough for third-party modules, though.
I agree that we should generally use heap types in isolated extension modules. datetime is an exception because of its own C-API.
Here's the updated TODO for beta 2:
- basic multi-phase init (gh-119373)
- convert to heap types (except for 5 types exposed by datetime C-API)
- convert to module state (mostly/all)
- update 5 remaining static types to be shareable (a la the static builtin types)
- set the flag on the module def that says it supports per-interpreter GIL
@erlend-aasland, the backport of that multiphase PR broke a bunch of the 3.13 buildbots, so I'm reverting it. We can regroup and sort that out.
For example: https://buildbot.python.org/all/#builders/1508/builds/69
test_utc_alias (test.datetimetester.TestModule_Pure.test_utc_alias) ... ok
test_utc_alias (test.datetimetester.TestModule_Fast.test_utc_alias) ... FAIL
======================================================================
FAIL: test_utc_alias (test.datetimetester.TestModule_Fast.test_utc_alias)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/buildbot/buildarea/3.13.cstratak-RHEL7-x86_64/build/Lib/test/datetimetester.py", line 98, in test_utc_alias
self.assertIs(UTC, timezone.utc)
~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^
AssertionError: datetime.timezone(datetime.timedelta(0)) is not datetime.timezone.utc
----------------------------------------------------------------------
Ran 2 tests in 0.007s
FAILED (failures=1)
test test_datetime failed
Looks like I forgot to backport gh-119472. I'm pretty sure that's it.