ReadabiliPy
fix issue-109 (use unique temp files for input/output of ExtractArticle.js)
It appears that a fix for this issue was started earlier but left unfinished, with 'json_path' still undefined.
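For illustration only, the idea of giving each call its own input and output files can be sketched as follows. This is not the code in this PR: the function name run_with_unique_temp_files, the command_prefix argument and the -i/-o flags are assumptions made for the example.

```python
import json
import os
import subprocess
import tempfile


def run_with_unique_temp_files(html, command_prefix):
    """Run an external extractor using per-call input/output files.

    `command_prefix` is a hypothetical argument list (for example the
    interpreter and script to run); the -i/-o flags are assumed here.
    """
    # A fresh directory per call means concurrent calls never share paths.
    with tempfile.TemporaryDirectory() as tmp_dir:
        html_path = os.path.join(tmp_dir, "input.html")
        json_path = os.path.join(tmp_dir, "output.json")

        # Write the HTML that the external script will read.
        with open(html_path, "w", encoding="utf-8") as html_file:
            html_file.write(html)

        # Raises CalledProcessError if the external script fails.
        subprocess.check_call(command_prefix + ["-i", html_path, "-o", json_path])

        # Read the result before the directory is cleaned up.
        with open(json_path, encoding="utf-8") as json_file:
            return json.load(json_file)
```

Because both paths live in a directory created for that call, json_path is always defined before use and is removed again when the call returns.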
previously:
make test
FAILED test_javascript.py::test_extract_simple_article_with_readability_js - NameError: name 'json_path' is not defined
FAILED test_javascript.py::test_extract_article_from_page_with_readability_js - NameError: name 'json_path' is not defined
FAILED test_simple_json.py::test_plain_element_with_comments - AssertionError: assert ['<div></div>'] == ['<div><p>Tex...!----></div>']
FAILED test_simple_json.py::test_content_digest_on_filled_and_empty_elements - assert ['<div></div>'] == ['<div><p dat..."></p></div>']
Unfortunately, the tests still don't pass with the proposed changes:
FAILED test_javascript.py::test_extract_simple_article_with_readability_js - AssertionError: article_json={'title': None, 'byline': None, 'date': None, 'content': '<div id="readability-page-1" class="page"><div class="page">\n <article>\n <h2> Article title </h2>\n <p>\n Proin vulputate viverra dapibus...
FAILED test_javascript.py::test_extract_article_from_page_with_readability_js - AssertionError: article_json={'title': 'Trump Denies Charitable Donation He Promised If Elizabeth Warren Releases DNA Results And It’s On Video', 'byline': 'Conover Kennard', 'date': None, 'content': '<div id="readability-page-1" class="page"><div id="post-342398">\n\n\n\n<div>\n<p>Donald Trump has re...
FAILED test_simple_json.py::test_plain_element_with_comments - AssertionError: assert ['<div></div>'] == ['<div><p>Tex...!----></div>']
FAILED test_simple_json.py::test_content_digest_on_filled_and_empty_elements - assert ['<div></div>'] == ['<div><p dat..."></p></div>']
but the test_javascript.py failures appear to be due only to extra \n characters in the 'content' field; the other fields are correct.
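One way to check that claim, sketched here under the assumption that only whitespace between tags differs, is to normalise both strings before comparing. The helper names below are hypothetical and not part of the test suite, and the normalisation would be wrong for whitespace-sensitive markup such as pre elements.

```python
import re


def normalise_whitespace(html):
    """Drop whitespace between tags, then collapse remaining runs to one space."""
    without_gaps = re.sub(r">\s+<", "><", html)
    return re.sub(r"\s+", " ", without_gaps).strip()


def same_content_ignoring_whitespace(actual, expected):
    """Compare two HTML strings while ignoring whitespace-only differences."""
    return normalise_whitespace(actual) == normalise_whitespace(expected)
```

If the two 'content' values compare equal under this helper, the mismatch is whitespace only; if not, something else differs as well.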