
Test failures with libxml2 2.14

Open mweinelt opened this issue 6 months ago • 5 comments

After upgrading libxml2 from 2.13.8 to 2.14.3 we are seeing the following test failures with pyquery 2.0.1.

____________________ TestManipulating.test_val_for_textarea ____________________

self = <tests.test_pyquery.TestManipulating testMethod=test_val_for_textarea>

    def test_val_for_textarea(self):
        d = pq(self.html3)
        self.assertEqual(d('#textarea-single').val(), 'Spam')
        self.assertEqual(d('#textarea-single').text(), 'Spam')
        d('#textarea-single').val('42')
        self.assertEqual(d('#textarea-single').val(), '42')
        # Note: jQuery still returns 'Spam' here.
        self.assertEqual(d('#textarea-single').text(), '42')
    
        multi_expected = '''Spam\n<b>Eggs</b>\nBacon'''
>       self.assertEqual(d('#textarea-multi').val(), multi_expected)
E       AssertionError: 'Spam\n&lt;b&gt;Eggs&lt;/b&gt;\nBacon' != 'Spam\n<b>Eggs</b>\nBacon'
E         Spam
E       - &lt;b&gt;Eggs&lt;/b&gt;
E       + <b>Eggs</b>
E         Bacon

tests/test_pyquery.py:534: AssertionError
_______________________ TestHTMLParser.test_replaceWith ________________________

self = <tests.test_pyquery.TestHTMLParser testMethod=test_replaceWith>

    def test_replaceWith(self):
        expected = '''<div class="portlet">
      <a href="/toto">TestimageMy link text</a>
      <a href="/toto2">imageMy link text 2</a>
      Behind you, a three-headed HTML&amp;dash;Entity!
    </div>'''
        d = pq(self.html)
        d('img').replace_with('image')
        val = d.__html__()
>       assert val == expected, (repr(val), repr(expected))
E       AssertionError: ('\'<div class="portlet">\
E               <a href="/toto">TestimageMy link text</a>\
E               <a href="/toto2">imageMy link text...     <a href="/toto2">imageMy link text 2</a>\
E               Behind you, a three-headed HTML&amp;dash;Entity!\
E             </div>\'')
E       assert '<div class="...!\n    </div>' == '<div class="...!\n    </div>'
E         
E         Skipping 144 identical leading characters in diff, use -v to show
E         - eaded HTML&amp;dash;Entity!
E         ?           ^^^^^^^^^^
E         + eaded HTML‐Entity!
E         ?           ^
E               </div>

tests/test_pyquery.py:810: AssertionError
________________ TestHTMLParser.test_replaceWith_with_function _________________

self = <tests.test_pyquery.TestHTMLParser testMethod=test_replaceWith_with_function>

    def test_replaceWith_with_function(self):
        expected = '''<div class="portlet">
      TestimageMy link text
      imageMy link text 2
      Behind you, a three-headed HTML&amp;dash;Entity!
    </div>'''
        d = pq(self.html)
        d('a').replace_with(lambda i, e: pq(e).html())
        val = d.__html__()
>       assert val == expected, (repr(val), repr(expected))
E       AssertionError: ('\'<div class="portlet">\
E               TestimageMy link text\
E               imageMy link text 2\
E               Behind you, a three-headed...imageMy link text\
E               imageMy link text 2\
E               Behind you, a three-headed HTML&amp;dash;Entity!\
E             </div>\'')
E       assert '<div class="...!\n    </div>' == '<div class="...!\n    </div>'
E         
E         Skipping 103 identical leading characters in diff, use -v to show
E         - eaded HTML&amp;dash;Entity!
E         ?           ^^^^^^^^^^
E         + eaded HTML‐Entity!
E         ?           ^
E               </div>

tests/test_pyquery.py:821: AssertionError
__________________________ TestWebScrapping.test_get ___________________________

self = <tests.test_pyquery.TestWebScrapping testMethod=test_get>

    def test_get(self):
        d = pq(url=self.application_url, data={'q': 'foo'},
               method='get')
        print(d)
>       self.assertIn('REQUEST_METHOD: GET', d('p').text())
E       AssertionError: 'REQUEST_METHOD: GET' not found in ''

tests/test_pyquery.py:902: AssertionError
----------------------------- Captured stdout call -----------------------------
<span>HTTP_ACCEPT: */*
HTTP_ACCEPT_ENCODING: gzip, deflate
HTTP_CONNECTION: keep-alive
HTTP_HOST: 127.0.0.1:52473
HTTP_USER_AGENT: python-requests/2.32.3
PATH_INFO: /
QUERY_STRING: q=foo
REMOTE_ADDR: 127.0.0.1
REMOTE_HOST: 127.0.0.1
REMOTE_PORT: 37262
REQUEST_METHOD: GET
REQUEST_URI: /?q=foo
SCRIPT_NAME: 
SERVER_NAME: waitress.invalid
SERVER_PORT: 52473
SERVER_PROTOCOL: HTTP/1.1
SERVER_SOFTWARE: waitress
waitress.client_disconnected: <bound method="" httpchannel.check_client_disconnected="" of="" <waitress.channel.httpchannel="" connected="" 127.0.0.1:37262="" at="" 0x7ffff52f3f00=

webob._parsed_query_vars: (GET([('q', 'foo')]), 'q=foo')
wsgi.errors: <encodedfile name="&lt;_io.FileIO name=8 mode='rb+' closefd=True&gt;" mode="r+" encoding="utf-8">
wsgi.file_wrapper: <class 'waitress.buffers.readonlyfilebasedbuffer'="">
wsgi.input: &lt;_io.BytesIO object at 0x7ffff51e04f0&gt;
wsgi.input_terminated: True
wsgi.multiprocess: False
wsgi.multithread: True
wsgi.run_once: False
wsgi.url_scheme: 'http'
wsgi.version: (1, 0)
</class></encodedfile></bound></span>
__________________________ TestWebScrapping.test_post __________________________

self = <tests.test_pyquery.TestWebScrapping testMethod=test_post>

    def test_post(self):
        d = pq(url=self.application_url, data={'q': 'foo'},
               method='post')
>       self.assertIn('REQUEST_METHOD: POST', d('p').text())
E       AssertionError: 'REQUEST_METHOD: POST' not found in ''

tests/test_pyquery.py:908: AssertionError
________________________ TestWebScrapping.test_session _________________________

self = <tests.test_pyquery.TestWebScrapping testMethod=test_session>

    def test_session(self):
        if HAS_REQUEST:
            import requests
            session = requests.Session()
            session.headers.update({'X-FOO': 'bar'})
            d = pq(url=self.application_url, data={'q': 'foo'},
                   method='get', session=session)
>           self.assertIn('HTTP_X_FOO: bar', d('p').text())
E           AssertionError: 'HTTP_X_FOO: bar' not found in ''

tests/test_pyquery.py:918: AssertionError

mweinelt avatar May 31 '25 14:05 mweinelt

The first link I found is https://dart.reviewpoint.org/blog/py3-lxml-fails-to-build

I guess the real problem is with lxml
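One way to check that guess is to take pyquery out of the picture and feed similar markup to lxml directly. The sketch below is only an illustration: the helper name `serialize_with_lxml` is hypothetical, and the two markup strings are assumptions modelled on the failing assertions above (an `&dash;` entity and a `<textarea>` containing markup), not the actual test fixtures.

```python
def serialize_with_lxml(markup):
    """Parse markup with lxml.html and return (libxml2 version, serialized HTML).

    Returns None when lxml is not installed, so the sketch stays importable.
    """
    try:
        from lxml import etree, html
    except ImportError:
        return None
    element = html.fromstring(markup)
    # LIBXML_VERSION is the libxml2 version loaded at runtime.
    return etree.LIBXML_VERSION, html.tostring(element, encoding="unicode")


if __name__ == "__main__":
    # Assumed inputs, modelled on the failing tests in this report.
    samples = [
        "<div>HTML&dash;Entity!</div>",
        "<div><textarea>Spam\n<b>Eggs</b>\nBacon</textarea></div>",
    ]
    for sample in samples:
        print(serialize_with_lxml(sample))
```

If the serialized output differs between an environment with libxml2 2.13 and one with 2.14, the behaviour change sits below pyquery.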

gawel avatar Jun 04 '25 15:06 gawel

Right, sorry for not mentioning that earlier. We also upgraded lxml from 5.3.1 to 5.4.0.

https://github.com/lxml/lxml/blob/lxml-5.4.0/CHANGES.txt

mweinelt avatar Jun 04 '25 15:06 mweinelt

And the changelog says "Binary wheels use libxml2 2.13.8", so I'm not sure lxml 5.4.0 is compatible with libxml2 2.14. That's the point.
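To see which libxml2 a given lxml build was compiled against and which one it actually loads at runtime, something like the following may help. It is a minimal sketch: the helper name `report_versions` is hypothetical, while the `LXML_VERSION`, `LIBXML_COMPILED_VERSION` and `LIBXML_VERSION` attributes come from `lxml.etree`.

```python
def report_versions():
    """Return the lxml and libxml2 versions visible to this interpreter.

    Every value is None when lxml cannot be imported.
    """
    try:
        from lxml import etree
    except ImportError:
        return {"lxml": None, "libxml2_compiled": None, "libxml2_runtime": None}
    return {
        "lxml": etree.LXML_VERSION,
        # libxml2 version lxml was built against
        "libxml2_compiled": etree.LIBXML_COMPILED_VERSION,
        # libxml2 version actually loaded by the running process
        "libxml2_runtime": etree.LIBXML_VERSION,
    }


if __name__ == "__main__":
    for name, version in report_versions().items():
        print(f"{name}: {version}")
```

A mismatch between the compiled and runtime values would be worth mentioning in a report like this one.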

gawel avatar Jun 04 '25 16:06 gawel

I see the same test failures with libxml2 2.14.5, python-lxml 6.0.2, and python-pyquery 2.0.1. Here are the build logs:

  • python-lxml: https://build.opensuse.org/package/show/home:pgajdos:libxml2/python-lxml and https://build.opensuse.org/package/live_build_log/home:pgajdos:libxml2/python-lxml/openSUSE_Tumbleweed/x86_64
  • python-pyquery: https://build.opensuse.org/package/show/home:pgajdos:libxml2/python-pyquery and https://build.opensuse.org/package/live_build_log/home:pgajdos:libxml2/python-pyquery:test/openSUSE_Tumbleweed/x86_64

python-lxml 6.0.2 changelog says:

  • LP#2125278: Compilation with libxml2 2.15.0 failed. Original patch by Xi Ruoyao.
  • Setting decompress=True in the parser had no effect in libxml2 2.15.
  • Binary wheels on Linux and macOS use the library version libxml2 2.14.6. See https://gitlab.gnome.org/GNOME/libxml2/-/releases/v2.14.6
  • Test failures in libxml2 2.15.0 were fixed.

So I guess it is supposed to work with libxml2 2.14. If there is more information I can provide, let me know. I do not have much background in this area, though.

pgajdos avatar Sep 24 '25 10:09 pgajdos

(I am skipping these tests for now; otherwise the build fails as the reporter describes. I can remove them from the skip list whenever you want.)
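For reference, a version-conditional skip could look roughly like the sketch below. It assumes the failures only appear with libxml2 2.14.0 and later, which matches the versions reported in this thread but has not been confirmed. The helper `affected_by_libxml2_214` and the `TestExample` class are hypothetical placeholders, not part of the pyquery test suite.

```python
import unittest

try:
    from lxml import etree
    LIBXML_VERSION = etree.LIBXML_VERSION
except ImportError:
    # Fallback so the sketch still imports without lxml installed.
    LIBXML_VERSION = (0, 0, 0)


def affected_by_libxml2_214(version=LIBXML_VERSION):
    """Assumption: the failures start with libxml2 2.14.0."""
    return tuple(version[:2]) >= (2, 14)


class TestExample(unittest.TestCase):
    @unittest.skipIf(affected_by_libxml2_214(),
                     "known failure with libxml2 >= 2.14")
    def test_placeholder(self):
        # Stand-in for one of the failing tests listed in this report.
        self.assertTrue(True)
```

This keeps the tests running on older libxml2 while the incompatibility is investigated.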

pgajdos avatar Sep 24 '25 10:09 pgajdos