
Test failures in 2.5.0 with Python 3.12

Open eclipseo opened this issue 1 year ago • 4 comments

test_guess_encoding, test_petro_iso_encoded, and test_predict_encoding fail in 2.5.0 with Python 3.12:

============================= test session starts ==============================
platform linux -- Python 3.12.0, pytest-7.4.2, pluggy-1.3.0
rootdir: /builddir/build/BUILD/normality-2.5.0
collected 19 items
tests/test_normality.py .....F..F.F....                                  [ 78%]
tests/test_paths.py ..                                                   [ 89%]
tests/test_scripts.py ..                                                 [100%]
=================================== FAILURES ===================================
______________________ NormalityTest.test_guess_encoding _______________________
self = <tests.test_normality.NormalityTest testMethod=test_guess_encoding>
    def test_guess_encoding(self):
        text = u"Порошенко Петро Олексійович"
        encoded = text.encode("iso-8859-5")
        out = guess_encoding(encoded)
>       self.assertEqual("iso8859-5", out)
E       AssertionError: 'iso8859-5' != 'cp1006'
E       - iso8859-5
E       + cp1006
tests/test_normality.py:72: AssertionError
_____________________ NormalityTest.test_petro_iso_encoded _____________________
self = <tests.test_normality.NormalityTest testMethod=test_petro_iso_encoded>
    def test_petro_iso_encoded(self):
        text = u"Порошенко Петро Олексійович"
        encoded = text.encode("iso8859-5")
        out = stringify(encoded)
>       self.assertEqual(text, out)
E       AssertionError: 'Порошенко Петро Олексійович' != 'ﺟﻐﻓﻐﻟﻁﻏﻌﻐ ﺟﻁﻗﻓﻐ ﺝﻍﻁﻌﻕﺉﻋﻐﺻﻊﻝ'
E       - Порошенко Петро Олексійович
E       + ﺟﻐﻓﻐﻟﻁﻏﻌﻐ ﺟﻁﻗﻓﻐ ﺝﻍﻁﻌﻕﺉﻋﻐﺻﻊﻝ
tests/test_normality.py:94: AssertionError
_____________________ NormalityTest.test_predict_encoding ______________________
self = <tests.test_normality.NormalityTest testMethod=test_predict_encoding>
    def test_predict_encoding(self):
        text = u"Порошенко Петро Олексійович"
        encoded = text.encode("iso-8859-5")
        out = predict_encoding(encoded)
>       self.assertEqual("iso8859-5", out)
E       AssertionError: 'iso8859-5' != 'cp1006'
E       - iso8859-5
E       + cp1006
tests/test_normality.py:78: AssertionError
=============================== warnings summary ===============================
tests/test_normality.py::NormalityTest::test_guess_encoding
  /builddir/build/BUILD/normality-2.5.0/normality/encoding.py:76: DeprecationWarning: guess_encoding is now deprecated. Use predict_encoding instead
    warnings.warn(
tests/test_normality.py::NormalityTest::test_guess_file_encoding
  /builddir/build/BUILD/normality-2.5.0/normality/encoding.py:95: DeprecationWarning: guess_encoding is now deprecated. Use predict_encoding instead
    warnings.warn(
tests/test_normality.py::NormalityTest::test_guess_file_encoding
  /builddir/build/BUILD/normality-2.5.0/normality/encoding.py:41: DeprecationWarning: normalize_result is now deprecated. Use tidy_result instead
    warnings.warn(
tests/test_normality.py::NormalityTest::test_guess_file_encoding
  /builddir/build/BUILD/normality-2.5.0/normality/encoding.py:16: DeprecationWarning: normalize_encoding is now deprecated. Use tidy_encoding instead
    warnings.warn(
tests/test_normality.py::NormalityTest::test_stringify_datetime
  /builddir/build/BUILD/normality-2.5.0/tests/test_normality.py:64: DeprecationWarning: datetime.datetime.utcnow() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.now(datetime.UTC).
    dt = datetime.utcnow()
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED tests/test_normality.py::NormalityTest::test_guess_encoding - Assertio...
FAILED tests/test_normality.py::NormalityTest::test_petro_iso_encoded - Asser...
FAILED tests/test_normality.py::NormalityTest::test_predict_encoding - Assert...
=================== 3 failed, 16 passed, 5 warnings in 0.21s ===================
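For anyone trying to see why the second failure prints Arabic-script text, here is a minimal sketch using only the standard library. It does not call normality or charset-normalizer; it just decodes the same ISO-8859-5 bytes with the codec the tests expect and with the cp1006 codec that was reported instead. The variable names are illustrative only.

```python
import codecs

text = "Порошенко Петро Олексійович"
encoded = text.encode("iso-8859-5")

# Python resolves codec aliases to a canonical name; the tests compare
# against this spelling rather than the "iso-8859-5" alias used to encode.
canonical = codecs.lookup("iso-8859-5").name

# Decoding with the codec that produced the bytes round-trips cleanly.
right = encoded.decode("iso8859-5")

# Decoding the same bytes as cp1006 maps them to different characters.
# errors="replace" is a precaution in case some byte is unmapped.
wrong = encoded.decode("cp1006", errors="replace")

print(canonical)
print(right == text)
print(wrong == text)
print(wrong)
```

Both codecs are single-byte, so each byte decodes to one character either way and only the character mapping differs. That matches the failing assertion in test_petro_iso_encoded.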

eclipseo, Oct 26 '23 19:10

Thanks for reporting this - looks super funky (cp1006 is Urdu, as far as I can tell). Can you tell me what version of charset-normalizer you have installed?

pudo, Oct 27 '23 07:10
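One way to check the installed version from Python, sketched with the standard library's importlib.metadata. The helper name installed_version is made up for this example, and it assumes the distribution is registered as charset-normalizer in the environment being inspected.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the version string, or None when the package is not installed.
print(installed_version("charset-normalizer"))
```

On a distro build such as the one in the report, the package manager's own query gives the same information.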

We've been on 3.3.1 for 4 days.

https://src.fedoraproject.org/rpms/python-charset-normalizer/c/4439d3b82085a2175c6ab7e622f6243dcb809d74?branch=rawhide

In 3.3.0, they added support for cp1006: https://github.com/Ousret/charset_normalizer/pull/328

eclipseo, Oct 27 '23 16:10

They have a bug with incorrect detection in 3.3.1: https://github.com/Ousret/charset_normalizer/issues/371, though it doesn't involve cp1006.

eclipseo, Oct 27 '23 16:10

Is this fixed? I issued a release to address this.

Ousret, Nov 01 '23 21:11