ReadabiliPy

fix issue-109 (use unique temp files for input/output of ExtractArticle.js)

Open erpic opened this issue 5 months ago • 1 comments

It appears that a fix for this issue was started earlier but left unfinished, with 'json_path' still undefined.
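For context, here is a minimal sketch of the idea in the title, not the actual patch. Each call gets its own temporary directory, so concurrent runs cannot overwrite each other's input or output files. The names `extract_with_unique_temp_files` and `stand_in_command` are hypothetical, and the stand-in command only echoes the HTML back as JSON so the sketch runs without Node installed.

```python
import json
import os
import subprocess
import sys
import tempfile


def extract_with_unique_temp_files(html, build_command):
    """Run an external extractor using per-call temp files and return its JSON."""
    # TemporaryDirectory creates a uniquely named directory for every call
    # and removes it (with both files) when the block exits.
    with tempfile.TemporaryDirectory() as tmp_dir:
        html_path = os.path.join(tmp_dir, "input.html")
        json_path = os.path.join(tmp_dir, "output.json")
        with open(html_path, "w", encoding="utf-8") as f:
            f.write(html)
        # build_command returns the argv list for the external process.
        subprocess.run(build_command(html_path, json_path), check=True)
        with open(json_path, encoding="utf-8") as f:
            return json.load(f)


def stand_in_command(html_path, json_path):
    """Hypothetical stand-in for the Node call: copies the HTML into a JSON file."""
    script = (
        "import json, pathlib, sys; "
        "html = pathlib.Path(sys.argv[1]).read_text(encoding='utf-8'); "
        "pathlib.Path(sys.argv[2]).write_text("
        "json.dumps({'content': html}), encoding='utf-8')"
    )
    return [sys.executable, "-c", script, html_path, json_path]


if __name__ == "__main__":
    print(extract_with_unique_temp_files("<p>hi</p>", stand_in_command))
```

In the real code, the command builder would return whatever invocation of ExtractArticle.js the project uses. The exact arguments are not shown in this thread, so they are left out here.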

Before the proposed changes:

make test

FAILED test_javascript.py::test_extract_simple_article_with_readability_js - NameError: name 'json_path' is not defined
FAILED test_javascript.py::test_extract_article_from_page_with_readability_js - NameError: name 'json_path' is not defined
FAILED test_simple_json.py::test_plain_element_with_comments - AssertionError: assert ['<div></div>'] == ['<div><p>Tex...!----></div>']
FAILED test_simple_json.py::test_content_digest_on_filled_and_empty_elements - assert ['<div></div>'] == ['<div><p dat..."></p></div>']

Unfortunately, the tests still do not pass with the proposed changes:

FAILED test_javascript.py::test_extract_simple_article_with_readability_js - AssertionError: article_json={'title': None, 'byline': None, 'date': None, 'content': '<div id="readability-page-1" class="page"><div class="page">\n                <article>\n                    <h2> Article title </h2>\n                    <p>\n                        Proin vulputate viverra dapibus...
FAILED test_javascript.py::test_extract_article_from_page_with_readability_js - AssertionError: article_json={'title': 'Trump Denies Charitable Donation He Promised If Elizabeth Warren Releases DNA Results And It’s On Video', 'byline': 'Conover Kennard', 'date': None, 'content': '<div id="readability-page-1" class="page"><div id="post-342398">\n\n\n\n<div>\n<p>Donald Trump has re...
FAILED test_simple_json.py::test_plain_element_with_comments - AssertionError: assert ['<div></div>'] == ['<div><p>Tex...!----></div>']
FAILED test_simple_json.py::test_content_digest_on_filled_and_empty_elements - assert ['<div></div>'] == ['<div><p dat..."></p></div>']

However, this appears to be due only to extra \n characters in the 'content' field; the other fields are correct.

[Screenshots: diff_meta_fields_correct, diff_extra_linebreaks_present]
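One way to check that claim is to compare the two JSON results while ignoring whitespace-only differences in 'content'. The following is only a sketch under that assumption. The helpers `normalise_html_whitespace` and `same_article` are hypothetical and are not part of the test suite. Dropping whitespace between tags can change how inline elements render, so this suits a diagnostic comparison, not a general HTML equality test.

```python
import re


def normalise_html_whitespace(html):
    """Collapse whitespace runs and drop whitespace that sits between tags."""
    collapsed = re.sub(r"\s+", " ", html)
    return re.sub(r">\s+<", "><", collapsed).strip()


def same_article(actual, expected):
    """Compare two article dicts, ignoring whitespace-only changes in 'content'."""
    a, e = dict(actual), dict(expected)
    a["content"] = normalise_html_whitespace(a.get("content") or "")
    e["content"] = normalise_html_whitespace(e.get("content") or "")
    return a == e


if __name__ == "__main__":
    got = {"title": None, "content": "<div>\n\n<p>Text</p>\n</div>"}
    want = {"title": None, "content": "<div><p>Text</p></div>"}
    print(same_article(got, want))
```

If this comparison succeeds where the strict one fails, the remaining difference is limited to line breaks and spacing in 'content'.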

erpic, Sep 05 '24 07:09