Replace usage of old unicode API removed in Py3.12/PEP 623
The "legacy" Unicode object will be removed in Python 3.12 together with the deprecated APIs. From then on, all Unicode objects will be "canonical". See PEP 623 for more information.
Those old APIs were still used in pywin32:
- PyUnicode_AsUnicode
- PyUnicode_GetSize
- PyUnicode_AS_UNICODE
- PyUnicode_GET_SIZE, PyUnicode_GET_DATA_SIZE
- PyUnicode_FromUnicode
- PyUnicode_EncodeMBCS
- u u# Z Z# in PyArg_Parse... format strings
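As a rough illustration of the replacement pattern for the removed PyUnicode_AsUnicode / PyUnicode_GetSize pair: PyUnicode_AsWideCharString returns a newly allocated wchar_t copy that the caller must release with PyMem_Free. The sketch below exercises those two C-API functions from Python through ctypes.pythonapi, assuming CPython. The helper name as_wide_copy is made up for this example, and in pywin32 itself the calls would be made directly from C++.

```python
import ctypes

api = ctypes.pythonapi

# wchar_t *PyUnicode_AsWideCharString(PyObject *unicode, Py_ssize_t *size)
api.PyUnicode_AsWideCharString.argtypes = [
    ctypes.py_object,
    ctypes.POINTER(ctypes.c_ssize_t),
]
api.PyUnicode_AsWideCharString.restype = ctypes.c_void_p

# void PyMem_Free(void *p)
api.PyMem_Free.argtypes = [ctypes.c_void_p]
api.PyMem_Free.restype = None


def as_wide_copy(text):
    """Hypothetical helper: convert `text` to a wchar_t buffer and read it back.

    Returns (string_read_from_buffer, length_in_wchar_t_units).
    """
    size = ctypes.c_ssize_t(0)
    # The returned buffer is a fresh copy owned by the caller; it never
    # aliases the internal storage of the str object.
    buf = api.PyUnicode_AsWideCharString(text, ctypes.byref(size))
    if not buf:
        raise MemoryError("PyUnicode_AsWideCharString returned NULL")
    try:
        return ctypes.wstring_at(buf, size.value), size.value
    finally:
        # The copy must always be released with PyMem_Free.
        api.PyMem_Free(buf)


if __name__ == "__main__":
    print(as_wide_copy("pywin32 \u20ac"))
```

The try/finally here plays the role that a temporary holder such as TmpWCHAR plays in the C++ code: the buffer is freed on every exit path.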
The first Py3.12 alpha should appear in October on GitHub, which will allow the deprecation replacements to be tested fully: https://peps.python.org/pep-0693/
I still need to get my head around the new macros - it adds a lot of complexity I'm still untangling. I do like the ability it has to provide the PyObject ** for PyArg_ParseTuple. ... hope there's something that can be done to make that part of this easier to understand.
This strange macro mechanism (U2WREC, U2WCONV, u2w ..) and the auxiliary union attribute stuff (handling source/target addresses for conversion) should probably just be dropped:
It only came in with the last 2 commits (replace "u" and "Z" in PyArg_Parse...). Its only purpose was to avoid repeatedly writing extra PyObject* intermediate variables and extra statements calling PyWinObject_AsWCHAR in the PyArg_ParseTuple... use cases (like the ~100 cases where TmpWCHAR was used previously), by doing the conversion within TmpWCHAR itself and the PyArg_ParseTuple statement, and to automatically provide specific help text on error.
But there are only a few new use cases for that. So the mechanism does not pay off, and it's not worth compressing those 100 existing cases. It's probably best to just remove those 2 commits, keep the code easy to read, and write out the few new PyArg_ParseTuple use cases in the existing style. For the single big bulk use case in the last commit (bulk "Z"), a specific local macro and auxiliary array(s) can be used to do the conversions in a loop.
So I'm simply going to drop those 2 commits for now and handle the u and Z arg parse cases in a separate PR later ...
almost every existing use of TmpWCHAR should eventually move to this new mechanism
(well, it doesn't save typing and testing anymore, so it may not be worth touching and reorganizing all of this in a readable way... ?)
A new object, say, PyWin_WCHAR or similar, which looks like your changes here, but unlike TmpWCHAR, never takes ownership of memory allocated elsewhere. It only supports construction/initialization via a PyObject *.
Apart from the mechanism dropped above, TmpWCHAR would so far only gain the ability to call PyUnicode_AsWideCharString automatically at assignment / construction time (2nd commit). That could become a separate class / name / subclass as well. But so far it has no really separate purpose (the job is still freeing the held temp string).
if we can determine the object's PyUnicode_KIND is PyUnicode_4BYTE_KIND we could still borrow the buffer?
To potentially save a PyUnicode_AsWideCharString call in the (rare?) case of PyUnicode_2BYTE_KIND, it seems the canonical state must first be guaranteed (PyUnicode_READY(), extra cost? Otherwise the string representation could change suddenly), and then the kind checked (again). There is also a (non-canonical?) PyUnicode_WCHAR_KIND. Is Py_UCS2* / PyUnicode_2BYTE_KIND always a valid NUL-terminated Windows WCHAR string? If this works, is fast, and is worth it, the holder would carry an extra PyObject* (with inc/decref) to the Python string, which would serve as a flag at the same time. This could still happen in the same class.
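To get a feel for how often the 2-byte case would actually occur, the internal storage width can be observed indirectly from Python. The sketch below is a CPython-specific experiment, not part of the PR: it relies on sys.getsizeof reflecting the PEP 393 compact layout, and the helper name bytes_per_char is made up for this example. Under those assumptions, a string is stored with 1, 2 or 4 bytes per character depending on its largest code point, so plain ASCII or Latin-1 text would never be in the 2-byte form and could not be borrowed as a WCHAR buffer.

```python
import sys


def bytes_per_char(ch, n=64):
    """Hypothetical helper: storage bytes per character for strings made of `ch`.

    Measures how much a freshly built str grows when one more character is
    added. On CPython this mirrors the PEP 393 kind (1, 2 or 4 bytes).
    """
    return sys.getsizeof(ch * (n + 1)) - sys.getsizeof(ch * n)


if __name__ == "__main__":
    samples = [
        ("ASCII", "a"),
        ("Latin-1", "\u00e9"),
        ("BMP", "\u20ac"),
        ("non-BMP", "\U0001F600"),
    ]
    for label, ch in samples:
        print(label, bytes_per_char(ch))
```

Only the BMP sample lands in the 2-byte form that matches the width of a Windows WCHAR, which is consistent with treating the borrow path as a rare-case optimization.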
Sorry for the delay and thanks for persevering!