python-Levenshtein

Py_UNICODE is deprecated

Open · methane opened this issue 5 years ago · 5 comments

Py_UNICODE has been deprecated since Python 3.3, and we are planning to remove it in Python 3.11. Could you replace Py_UNICODE with wchar_t, and PyUnicode_FromUnicode with PyUnicode_FromWideChar?

./python-Levenshtein-0.12.0/Levenshtein/_levenshtein.c:1001:      result = PyUnicode_FromUnicode(medstr, len);
./python-Levenshtein-0.12.0/Levenshtein/_levenshtein.c:1088:      result = PyUnicode_FromUnicode(medstr, len);
./python-Levenshtein-0.12.0/Levenshtein/_levenshtein.c:1930:      result = PyUnicode_FromUnicode(s, len);
./python-Levenshtein-0.12.0/Levenshtein/_levenshtein.c:1946:      result = PyUnicode_FromUnicode(s, len);

methane · Jun 15 '20 00:06

Is PyUnicode_FromUnicode actually removed in Python 3.11? The Python docs list the removal for Python 3.12.

maxbachmann · Jan 17 '22 11:01

The removal has been postponed to Python 3.12, but PyUnicode_FromUnicode() already emits a runtime warning, so calling it is already very inefficient.

methane · Jan 17 '22 11:01

Good to know. I will replace it in my fork in the next release.

maxbachmann · Jan 17 '22 11:01

@methane, does PyUnicode_AS_UNICODE emit a warning as well? This API is significantly harder to replace, since there is no 1:1 replacement: the code either needs to handle the 1-, 2-, and 4-byte string kinds or allocate and free a copy.
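A minimal sketch of the first option (handling the 1/2/4-byte kinds directly, without copying), written in plain C so it compiles without Python.h. Since PEP 393, a str stores every character in 1, 2, or 4 bytes, and in an extension the kind, data pointer, and length would come from PyUnicode_KIND(), PyUnicode_DATA(), and PyUnicode_GET_LENGTH(). The helper read_cp() is a hypothetical stand-in that mimics PyUnicode_READ(kind, data, index), and lev_distance_kind() is an illustrative name, not a function in python-Levenshtein.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for PyUnicode_READ(kind, data, index):
   kind is the number of bytes per code point (1, 2 or 4). */
static uint32_t read_cp(int kind, const void *data, size_t i)
{
    assert(kind == 1 || kind == 2 || kind == 4);
    switch (kind) {
    case 1:  return ((const uint8_t *)data)[i];
    case 2:  return ((const uint16_t *)data)[i];
    default: return ((const uint32_t *)data)[i];
    }
}

/* Levenshtein distance between two strings that may use different kinds.
   Keeps two rows of (len2 + 1) entries; returns (size_t)-1 if malloc fails. */
size_t lev_distance_kind(int kind1, const void *data1, size_t len1,
                         int kind2, const void *data2, size_t len2)
{
    size_t *prev = malloc((len2 + 1) * sizeof *prev);
    size_t *curr = malloc((len2 + 1) * sizeof *curr);
    if (prev == NULL || curr == NULL) {
        free(prev);
        free(curr);
        return (size_t)-1;
    }
    for (size_t j = 0; j <= len2; j++)
        prev[j] = j;                       /* distance from the empty prefix */
    for (size_t i = 1; i <= len1; i++) {
        uint32_t c1 = read_cp(kind1, data1, i - 1);
        curr[0] = i;
        for (size_t j = 1; j <= len2; j++) {
            size_t cost = (c1 == read_cp(kind2, data2, j - 1)) ? 0 : 1;
            size_t del = prev[j] + 1;      /* delete c1 */
            size_t ins = curr[j - 1] + 1;  /* insert character of data2 */
            size_t sub = prev[j - 1] + cost;
            size_t best = del < ins ? del : ins;
            curr[j] = best < sub ? best : sub;
        }
        size_t *tmp = prev;                /* current row becomes previous */
        prev = curr;
        curr = tmp;
    }
    size_t result = prev[len2];
    free(prev);
    free(curr);
    return result;
}
```

Dispatching on the kind inside the inner loop keeps the sketch short; a real implementation would more likely instantiate one loop per (kind1, kind2) pair to avoid the per-character switch.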

maxbachmann · Jan 17 '22 12:01

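Use PyUnicode_AsUCS4Copy() and PyMem_Free(). PyUnicode_AS_UNICODE() uses UTF-16 on Windows, where wchar_t is 16 bits, so a single character outside the Basic Multilingual Plane is seen as two code units (a surrogate pair). I think that is bad for the Levenshtein library.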
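To illustrate why UTF-16 is a problem for an edit distance, this plain-C sketch (no Python.h; all function names are hypothetical) computes the distance once over UCS-4 code points, the representation PyUnicode_AsUCS4Copy() returns, and once over the UTF-16 code units a 16-bit wchar_t would hold on Windows. U+1F600 is one code point but the surrogate pair 0xD83D 0xDE00 in UTF-16, so the distance between "a" and U+1F600 is 1 over code points and 2 over code units.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Levenshtein distance over arrays of 32-bit units (code points or code units),
   using a single row. Returns (size_t)-1 if malloc fails. */
size_t lev_u32(const uint32_t *a, size_t la, const uint32_t *b, size_t lb)
{
    size_t *row = malloc((lb + 1) * sizeof *row);
    if (row == NULL)
        return (size_t)-1;
    for (size_t j = 0; j <= lb; j++)
        row[j] = j;
    for (size_t i = 1; i <= la; i++) {
        size_t diag = row[0];              /* D[i-1][j-1] */
        row[0] = i;
        for (size_t j = 1; j <= lb; j++) {
            size_t up = row[j];            /* D[i-1][j] */
            size_t cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
            size_t del = up + 1, ins = row[j - 1] + 1, sub = diag + cost;
            size_t best = del < ins ? del : ins;
            row[j] = best < sub ? best : sub;
            diag = up;
        }
    }
    size_t d = row[lb];
    free(row);
    return d;
}

/* Encode code points as UTF-16 code units, one unit per output slot.
   out must have room for 2 * n units; returns the number of units written. */
size_t to_utf16_units(const uint32_t *cps, size_t n, uint32_t *out)
{
    size_t k = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t c = cps[i];
        assert(c <= 0x10FFFF);
        if (c >= 0x10000) {
            c -= 0x10000;
            out[k++] = 0xD800 + (c >> 10);    /* high surrogate */
            out[k++] = 0xDC00 + (c & 0x3FF);  /* low surrogate */
        } else {
            out[k++] = c;
        }
    }
    return k;
}

/* Distance as seen through UTF-16 code units (what a 16-bit wchar_t holds). */
size_t lev_utf16(const uint32_t *a, size_t la, const uint32_t *b, size_t lb)
{
    uint32_t *ua = malloc((2 * la + 1) * sizeof *ua);
    uint32_t *ub = malloc((2 * lb + 1) * sizeof *ub);
    size_t d = (size_t)-1;
    if (ua != NULL && ub != NULL) {
        size_t na = to_utf16_units(a, la, ua);
        size_t nb = to_utf16_units(b, lb, ub);
        d = lev_u32(ua, na, ub, nb);
    }
    free(ua);
    free(ub);
    return d;
}
```

In the extension itself, the UCS-4 buffer returned by PyUnicode_AsUCS4Copy() has to be released with PyMem_Free(), as suggested above.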

methane · Jan 17 '22 14:01