Order of HTML attributes in feed entry content is unstable

AndreyMZ opened this issue

Problem description

feedparser neither preserves the original order of HTML attributes in a feed entry's content nor emits them in a deterministic order. As a result, the content value that feedparser returns for the same feed entry can differ from one run to the next.
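One plausible explanation, not verified against feedparser's source: if the attributes pass through a Python set (or a dict on CPython older than 3.6), their iteration order follows string hashes, and Python randomizes string hashing per process by default (since 3.3). That would also explain why the reproduction needs two separate interpreters. The sketch below demonstrates only this Python behaviour, not feedparser itself: it asks fresh interpreters with different `PYTHONHASHSEED` values to iterate over the same set of attribute names, taken from the actual result. `set_order` and `CHILD` are hypothetical names used for illustration.

```python
import os
import subprocess
import sys

# Attribute names seen on the <input> tag in the actual result.
ATTRS = ["value", "readonly", "class", "size"]

# Child script: put the attribute names into a set and print its iteration order.
# In "-c" mode, sys.argv[0] is "-c", so the names start at sys.argv[1].
CHILD = "import sys; print(' '.join({*sys.argv[1:]}))"


def set_order(seed: int) -> str:
    """Return the set iteration order seen by a fresh interpreter using this hash seed."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    out = subprocess.check_output(
        [sys.executable, "-c", CHILD, *ATTRS], env=env, text=True
    )
    return out.strip()


if __name__ == "__main__":
    orders = {seed: set_order(seed) for seed in range(1, 11)}
    for seed, order in orders.items():
        print(seed, order)
    print("distinct orders:", len(set(orders.values())))
```

The same seed always gives the same order, while different seeds usually do not. Running the original test with a fixed `PYTHONHASHSEED` in the environment would be one way to check whether this is the cause.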

Impact

It is not possible to use a hash of the content to check whether the content has changed over time.
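Until the order is fixed upstream, a client could hash a canonical token stream instead of the raw string. The sketch below is a workaround under stated assumptions, not part of feedparser: it uses the standard library's `html.parser.HTMLParser`, sorts each tag's attributes before hashing, and ignores comments. `content_fingerprint` is a hypothetical helper name, and its output is only meant for change detection, not for reconstructing the HTML.

```python
import hashlib
from html.parser import HTMLParser


class _CanonicalTokens(HTMLParser):
    """Collect tags, attributes and text as tokens, with attribute order normalized."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.tokens = []  # repr() strings of ("start" | "end" | "data", ...) tuples

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        norm = sorted((name, "" if value is None else value) for name, value in attrs)
        self.tokens.append(repr(("start", tag, norm)))

    def handle_endtag(self, tag):
        self.tokens.append(repr(("end", tag)))

    def handle_data(self, data):
        self.tokens.append(repr(("data", data)))


def content_fingerprint(html: str) -> str:
    """SHA-256 of the canonical token stream; independent of attribute order."""
    parser = _CanonicalTokens()
    parser.feed(html)
    parser.close()
    return hashlib.sha256("\n".join(parser.tokens).encode("utf-8")).hexdigest()
```

Two renderings that differ only in attribute order produce the same fingerprint, while a change to any attribute value or text still changes it.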

Steps to reproduce

Run the test:

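import subprocess
import sys
import textwrap
import unittest

# Script run in a fresh interpreter: fetch the GNU tar news feed and print
# the content of one specific entry.
code = textwrap.dedent('''\
    import feedparser
    parsed = feedparser.parse('https://savannah.gnu.org/news/atom.php?group=tar')
    entry = next(filter((lambda entry: entry.id == 'http://savannah.gnu.org/forum/forum.php?forum_id=8545'), parsed.entries))
    content_value = entry.content[0].value
    print(content_value)
''')

class MyTestCase(unittest.TestCase):
    def test(self):
        # Run the same script twice in separate interpreters (the one running
        # this test, via sys.executable), so per-process state is not shared.
        res1 = subprocess.check_output([sys.executable, '-c', code])
        res2 = subprocess.check_output([sys.executable, '-c', code])
        self.assertEqual(res1, res2)

if __name__ == '__main__':
    unittest.main()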

Actual result

AssertionError: b'<p>[2158 chars]nput readonly="readonly" class="verbatim" size[2149 chars]\r\n' != b'<p>[2158 chars]nput value="     OLDNAME NEWNAME[:NEWID] " cla[2149 chars]\r\n'

Expected result

The test passes.

Possible solution

Either preserve the original order of HTML attributes in feed entry content, or emit them in a deterministic order (for example, sorted by name).
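A minimal sketch of the "deterministic order" option, assuming the sanitizer re-serializes each start tag from a list of (name, value) pairs as `html.parser.HTMLParser` supplies them. This is not feedparser's actual code; `format_starttag` is a hypothetical helper. Sorting by name makes the output independent of how the attributes were stored internally, and because the sort is stable, repeated names keep their input order.

```python
from html import escape
from typing import List, Optional, Tuple

Attr = Tuple[str, Optional[str]]


def format_starttag(tag: str, attrs: List[Attr], self_closing: bool = False) -> str:
    """Serialize a start tag with its attributes sorted by name."""
    parts = [tag]
    for name, value in sorted(attrs, key=lambda item: item[0]):
        if value is None:
            parts.append(name)  # bare attribute, e.g. <input readonly>
        else:
            # Escape &, <, >, " and ' so the value stays inside its quotes.
            parts.append('%s="%s"' % (name, escape(value, quote=True)))
    return "<%s%s>" % (" ".join(parts), " /" if self_closing else "")
```

The "preserve" option amounts to skipping the sort: `HTMLParser` reports attributes in source order, so emitting them as received would reproduce the feed's own ordering.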

May 03 '17 07:05 AndreyMZ