normality
Test failures in 2.5.0 with Python 3.12
test_guess_encoding, test_petro_iso_encoded, test_predict_encoding are failing in 2.5.0 with Python 3.12:
============================= test session starts ==============================
platform linux -- Python 3.12.0, pytest-7.4.2, pluggy-1.3.0
rootdir: /builddir/build/BUILD/normality-2.5.0
collected 19 items
tests/test_normality.py .....F..F.F.... [ 78%]
tests/test_paths.py .. [ 89%]
tests/test_scripts.py .. [100%]
=================================== FAILURES ===================================
______________________ NormalityTest.test_guess_encoding _______________________
self = <tests.test_normality.NormalityTest testMethod=test_guess_encoding>
def test_guess_encoding(self):
text = u"Порошенко Петро Олексійович"
encoded = text.encode("iso-8859-5")
out = guess_encoding(encoded)
> self.assertEqual("iso8859-5", out)
E AssertionError: 'iso8859-5' != 'cp1006'
E - iso8859-5
E + cp1006
tests/test_normality.py:72: AssertionError
_____________________ NormalityTest.test_petro_iso_encoded _____________________
self = <tests.test_normality.NormalityTest testMethod=test_petro_iso_encoded>
def test_petro_iso_encoded(self):
text = u"Порошенко Петро Олексійович"
encoded = text.encode("iso8859-5")
out = stringify(encoded)
> self.assertEqual(text, out)
E AssertionError: 'Порошенко Петро Олексійович' != 'ﺟﻐﻓﻐﻟﻁﻏﻌﻐ ﺟﻁﻗﻓﻐ ﺝﻍﻁﻌﻕﺉﻋﻐﺻﻊﻝ'
E - Порошенко Петро Олексійович
E + ﺟﻐﻓﻐﻟﻁﻏﻌﻐ ﺟﻁﻗﻓﻐ ﺝﻍﻁﻌﻕﺉﻋﻐﺻﻊﻝ
tests/test_normality.py:94: AssertionError
_____________________ NormalityTest.test_predict_encoding ______________________
self = <tests.test_normality.NormalityTest testMethod=test_predict_encoding>
def test_predict_encoding(self):
text = u"Порошенко Петро Олексійович"
encoded = text.encode("iso-8859-5")
out = predict_encoding(encoded)
> self.assertEqual("iso8859-5", out)
E AssertionError: 'iso8859-5' != 'cp1006'
E - iso8859-5
E + cp1006
tests/test_normality.py:78: AssertionError
=============================== warnings summary ===============================
tests/test_normality.py::NormalityTest::test_guess_encoding
/builddir/build/BUILD/normality-2.5.0/normality/encoding.py:76: DeprecationWarning: guess_encoding is now deprecated. Use predict_encoding instead
warnings.warn(
tests/test_normality.py::NormalityTest::test_guess_file_encoding
/builddir/build/BUILD/normality-2.5.0/normality/encoding.py:95: DeprecationWarning: guess_encoding is now deprecated. Use predict_encoding instead
warnings.warn(
tests/test_normality.py::NormalityTest::test_guess_file_encoding
/builddir/build/BUILD/normality-2.5.0/normality/encoding.py:41: DeprecationWarning: normalize_result is now deprecated. Use tidy_result instead
warnings.warn(
tests/test_normality.py::NormalityTest::test_guess_file_encoding
/builddir/build/BUILD/normality-2.5.0/normality/encoding.py:16: DeprecationWarning: normalize_encoding is now deprecated. Use tidy_encoding instead
warnings.warn(
tests/test_normality.py::NormalityTest::test_stringify_datetime
/builddir/build/BUILD/normality-2.5.0/tests/test_normality.py:64: DeprecationWarning: datetime.datetime.utcnow() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.now(datetime.UTC).
dt = datetime.utcnow()
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED tests/test_normality.py::NormalityTest::test_guess_encoding - Assertio...
FAILED tests/test_normality.py::NormalityTest::test_petro_iso_encoded - Asser...
FAILED tests/test_normality.py::NormalityTest::test_predict_encoding - Assert...
=================== 3 failed, 16 passed, 5 warnings in 0.21s ===================
Thanks for reporting this - looks super funky (cp1006 is Urdu, as far as I can tell). Can you tell me what version of charset-normalizer you have installed?
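One way to check this from the same interpreter that runs the tests is sketched below. It uses only the standard library's `importlib.metadata`; the helper name `installed_version` is made up for illustration and is not part of normality or charset-normalizer.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent.

    Hypothetical helper for illustration only.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints the version string, or None if the package is not installed here.
print(installed_version("charset-normalizer"))
```

Running `pip show charset-normalizer` in the build environment should give the same information.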
We have been on 3.3.1 for 4 days.
https://src.fedoraproject.org/rpms/python-charset-normalizer/c/4439d3b82085a2175c6ab7e622f6243dcb809d74?branch=rawhide
In 3.3.0, they added support for cp1006: https://github.com/Ousret/charset_normalizer/pull/328
3.3.1 also has an incorrect-detection bug, though that one does not involve cp1006: https://github.com/Ousret/charset_normalizer/issues/371
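For anyone wanting to see why the wrong guess breaks `test_petro_iso_encoded`, here is a minimal sketch using only Python's built-in codecs. It does not call charset-normalizer or normality at all; it just decodes the test's bytes with the encoding the test expects and with the encoding that was reported. The variable names are illustrative, and `errors="replace"` is only there as a precaution in case some byte has no mapping.

```python
# Same sample string as in the failing tests.
text = "Порошенко Петро Олексійович"
encoded = text.encode("iso-8859-5")

# Decoding with the encoding the tests expect restores the original text.
round_trip = encoded.decode("iso-8859-5")

# Decoding with the encoding that was reported instead yields different
# characters, since the same high bytes map to other code points in cp1006.
misread = encoded.decode("cp1006", errors="replace")

print(round_trip == text)
print(misread == text)
```

So the bytes themselves are fine; only the detected encoding name differs, which matches the `'iso8859-5' != 'cp1006'` assertions in the log above.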
Is this fixed? I issued a release to address this.