feedparser
Order of HTML attributes in feed entry content is unstable
Problem description
feedparser neither preserves nor normalizes the order of HTML attributes in the content of a feed entry. As a result, the content value that feedparser returns for the same feed entry can differ from one run to the next.
Impact
A hash of the content cannot be used to detect whether the content has changed over time, because two runs over the same entry can produce different strings and therefore different hashes.
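To illustrate the impact, here is a minimal sketch of the kind of hash-based change detection that breaks. It assumes SHA-256 over the UTF-8 encoded content; the helper name `content_digest` is hypothetical, and the two strings are illustrative serializations of one `<input>` element, not copied verbatim from the feed.

```python
import hashlib


def content_digest(html: str) -> str:
    """Return a SHA-256 hex digest of an entry's content value (hypothetical helper)."""
    return hashlib.sha256(html.encode("utf-8")).hexdigest()


# Two illustrative serializations of the same element that differ only in
# attribute order, as can happen between two feedparser runs.
run1 = '<input readonly="readonly" class="verbatim" value="x">'
run2 = '<input value="x" class="verbatim" readonly="readonly">'

# The digests differ, so a naive "has the content changed?" check reports a
# change even though the markup is semantically identical.
print(content_digest(run1) == content_digest(run2))  # False
```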
Steps to reproduce (STR)
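Run the following test, which parses the same feed in two separate Python processes and compares the printed content: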
```python
import subprocess
import sys
import textwrap
import unittest

# Child script: parse the GNU tar news feed and print the content of one entry.
code = textwrap.dedent('''\
    import feedparser
    parsed = feedparser.parse('https://savannah.gnu.org/news/atom.php?group=tar')
    entry = next(filter((lambda entry: entry.id == 'http://savannah.gnu.org/forum/forum.php?forum_id=8545'), parsed.entries))
    content_value = entry.content[0].value
    print(content_value)
''')


class MyTestCase(unittest.TestCase):
    def test(self):
        # Run the same script in two separate interpreter processes and
        # compare their output byte for byte.
        res1 = subprocess.check_output([sys.executable, '-c', code])
        res2 = subprocess.check_output([sys.executable, '-c', code])
        self.assertEqual(res1, res2)


if __name__ == '__main__':
    unittest.main()
```
Actual result
AssertionError: b'<p>[2158 chars]nput readonly="readonly" class="verbatim" size[2149 chars]\r\n' != b'<p>[2158 chars]nput value=" OLDNAME NEWNAME[:NEWID] " cla[2149 chars]\r\n'
Expected result
The test passes.
Possible solution
Either preserve the original order of HTML attributes in feed entry content, or always emit them in a deterministic order (for example, sorted by attribute name).
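Until feedparser itself emits a stable order, a client could canonicalize the content before hashing it. The following is a minimal sketch of the "deterministic order" option using only the standard library's `html.parser`; it is not part of feedparser's API, and the names `_AttrSortingSerializer` and `canonicalize_attribute_order` are hypothetical. The output is a normalized form intended for comparison and hashing, not a faithful copy of the input (quoting and escaping are normalized, and markup without a handler below, such as CDATA sections, is dropped).

```python
from html import escape
from html.parser import HTMLParser


class _AttrSortingSerializer(HTMLParser):
    """Re-serialize HTML with each tag's attributes sorted by name (hypothetical helper)."""

    def __init__(self):
        # convert_charrefs=True delivers text with entities already decoded.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def _format_attrs(self, attrs):
        out = []
        # Sort by name, then value, so duplicate names also get a stable order.
        for name, value in sorted(attrs, key=lambda kv: (kv[0], kv[1] or "")):
            if value is None:  # bare boolean attribute, e.g. <input readonly>
                out.append(f" {name}")
            else:
                out.append(f' {name}="{escape(value, quote=True)}"')
        return "".join(out)

    def handle_starttag(self, tag, attrs):
        self.parts.append(f"<{tag}{self._format_attrs(attrs)}>")

    def handle_startendtag(self, tag, attrs):
        self.parts.append(f"<{tag}{self._format_attrs(attrs)} />")

    def handle_endtag(self, tag):
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        # Re-escape decoded text so the output is still valid HTML.
        self.parts.append(escape(data, quote=False))

    def handle_comment(self, data):
        self.parts.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.parts.append(f"<!{decl}>")

    def handle_pi(self, data):
        self.parts.append(f"<?{data}>")


def canonicalize_attribute_order(html_text):
    """Return html_text with every tag's attributes in sorted order (hypothetical helper)."""
    parser = _AttrSortingSerializer()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.parts)


# Illustrative inputs modeled on the element from the failing assertion.
a = '<input readonly="readonly" class="verbatim" size="30" value=" OLDNAME NEWNAME[:NEWID] ">'
b = '<input value=" OLDNAME NEWNAME[:NEWID] " class="verbatim" readonly="readonly" size="30">'
print(canonicalize_attribute_order(a) == canonicalize_attribute_order(b))  # True
```

Hashing the canonicalized string instead of the raw `entry.content[0].value` would then give the same digest for both runs. Sorting was chosen over "preserve the source order" here only because a client cannot recover the source order once feedparser has discarded it.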