ReadabiliPy

A simple HTML content extractor in Python. Can be run as a wrapper for Mozilla's Readability.js package or in pure-python mode.

Results 19 ReadabiliPy issues

The working directory for the node process is now passed as a parameter to `subprocess.check_call()`. This solves a bug where, if an exception (`JSONDecodeError`) occurred while loading the JSON (line 46 of `simple_json.py`), the working directory is...
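A minimal sketch of the idea described above, not the project's actual patch: passing `cwd=` to `subprocess.check_call()` runs the child process in another directory without ever calling `os.chdir()` in the parent, so a later exception cannot leave the parent in the wrong directory. The helper name `run_in_dir` and the stand-in child command (the Python interpreter instead of node) are assumptions made for illustration.

```python
import os
import subprocess
import sys
import tempfile


def run_in_dir(command, directory):
    """Run `command` with `directory` as the child's working directory.

    The parent's working directory is never changed, so there is nothing
    to restore if a later step (such as JSON parsing) raises.
    """
    subprocess.check_call(command, cwd=directory)


if __name__ == "__main__":
    before = os.getcwd()
    with tempfile.TemporaryDirectory() as tmp:
        # Stand-in for the node call: the child creates a file using a
        # relative path, which resolves against `tmp`, not the parent's cwd.
        run_in_dir([sys.executable, "-c", "open('marker.txt', 'w').close()"], tmp)
        print(os.path.exists(os.path.join(tmp, "marker.txt")))
    # The parent process is still where it started.
    print(os.getcwd() == before)
```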

I read you can use this package without Node.js. What will the difference in execution be with vs. without Node? I see downloads for the Node binary on their website....

Hello! I get this error when I run the default code example:
```
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import requests
from readabilipy import simple_json_from_html_string
req = requests.get('https://en.wikipedia.org/wiki/Readability')
article =...
```

I want to update to the newest version of Readability.js from https://github.com/mozilla/readability. Where in the code should I replace it to upgrade to the newest version of Readability?

Calling ExtractArticle.js on a "broken" site can cause node to write a lot of error and warning messages to stdout. Because the output gets ignored anyway it might make more...
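One way to keep a child process's messages off the console, assuming its output really is not needed, is to redirect both streams to `subprocess.DEVNULL`. This is only a sketch: the helper name `run_quietly` is hypothetical, and the Python interpreter stands in for the node call.

```python
import subprocess
import sys


def run_quietly(command):
    """Run `command`, discarding everything it writes to stdout and stderr.

    Raises subprocess.CalledProcessError if the command exits non-zero,
    so failures are still detected even though the messages are dropped.
    """
    subprocess.check_call(
        command,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )


# Stand-in for a noisy node process that writes to both streams.
noisy = [
    sys.executable,
    "-c",
    "import sys; print('warning'); print('error', file=sys.stderr)",
]
run_quietly(noisy)
```

Discarding stderr also hides genuine failure messages, so capturing it and logging it only when the exit code is non-zero may be a better trade-off.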

Inspired by #97. Unit tested using the available tests on Ubuntu 18.04.

This PR removes the need to hard-code Readability.js in the package and instead simply imports it from NPM. This will ensure that the user gets the most recent improvements in...

Running ReadabiliPy from multiple Python processes causes the /tmp/full.html files to be overwritten when using the Node readability code. Relevant code here: https://github.com/alan-turing-institute/ReadabiliPy/blob/554327240dd8d8178fb460cc24cb762e0a45e366/readabilipy/simple_json.py#L33-L44 Could we add just a random UUID...
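A minimal sketch of the UUID idea, assuming the goal is simply that concurrent processes never share a file name. The helper `write_unique_html` and the `full_<uuid>.html` naming scheme are hypothetical and are not taken from the ReadabiliPy code.

```python
import os
import tempfile
import uuid


def write_unique_html(html, directory=None):
    """Write `html` to a uniquely named file and return its path.

    A random UUID in the file name keeps concurrent processes from
    overwriting each other's input file.
    """
    directory = directory or tempfile.gettempdir()
    path = os.path.join(directory, f"full_{uuid.uuid4().hex}.html")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(html)
    return path


if __name__ == "__main__":
    first = write_unique_html("<p>one</p>")
    second = write_unique_html("<p>two</p>")
    try:
        print(first != second)
    finally:
        # The caller owns the files and removes them when done.
        os.remove(first)
        os.remove(second)
```

The standard library's `tempfile.mkstemp()` or `tempfile.NamedTemporaryFile()` would achieve the same uniqueness without building the name by hand.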

Some HTML files produce extra entries in the plain_text key of the JSON with the full key, in addition to the entries with the text of each paragraph, i.e., the...

## Solution
Added by @martintoreilly
1. (minimal fix): update our local copy of Readability.js to the latest file from the Mozilla package
2. (better): stop duplicating Readability.js as a local...
