parser
📜 Extract meaningful content from the chaos of a web page
- Platform: Linux server 4.15.0-47-generic #50-Ubuntu SMP Wed Mar 13 10:44:52 UTC 2019 x86_64 GNU/Linux - Node Version v10.9.0 - [email protected] `mercury-parser https://www.greaterwrong.com/posts/SqF8cHjJv43mvJJzx/feeling-rational` outputs: `{ "title": "Feeling Rational -`...
Hello guys! If you can, please advise: when I parse URLs, I get many errors like Error: ETIMEDOUT - http://www.slobodna-bosna.ba/vijest/130830/zovu_ga_kralj_sevdaha_bozo_vreco_odusevio_zagrepchane_svi_prichaju_o_haljini_s_dubokim_dekolteom_foto.html Error: ETIMEDOUT - http://www.azdarar.am/announcments/org/450/00574132 Error: ETIMEDOUT -...
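Timeouts usually come from slow, overloaded, or unreachable hosts, so no single fix applies. One common mitigation is to retry only transient network errors, with a growing delay between attempts. The sketch below assumes the failures surface as errors whose `code` is `ETIMEDOUT`. The `withRetry` helper and the set of retryable codes are illustrative and are not part of mercury-parser.

```javascript
// Error codes treated as transient. This list is an assumption; adjust as needed.
const RETRYABLE_CODES = new Set(['ETIMEDOUT', 'ECONNRESET']);

// Resolve after `ms` milliseconds.
function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Call `task` (a function returning a promise). If it fails with a retryable
// error code, wait and try again, doubling the delay after each failed attempt.
// Non-retryable errors, or the last failure, are rethrown to the caller.
async function withRetry(task, { retries = 3, delayMs = 1000 } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt += 1) {
    try {
      return await task();
    } catch (err) {
      lastError = err;
      const retryable = Boolean(err) && RETRYABLE_CODES.has(err.code);
      if (!retryable || attempt === retries) break;
      await sleep(delayMs * 2 ** attempt);
    }
  }
  throw lastError;
}
```

Usage might look like `withRetry(() => Parser.parse(url), { retries: 2 })`, assuming the library is imported as `Parser`. If a host times out on every attempt, it may be down or blocking the requesting server, and retrying will not help.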
This is a custom extractor for www.gruene.de
- **Platform**: macOS - **Mercury Parser Version**: `master` - **Node Version (if a Node bug)**: v11.1.0 - **Browser Version (if a browser bug)**: n/a - **Description**: I have created a...
- **Platform**: Visual Studio Code - **Mercury Parser Version**: `master` - **Node Version (if a Node bug)**: v11.1.0 - **Browser Version (if a browser bug)**: n/a - **Expected Behavior**: When...
Improvement Rationale: I am using `mercury-parser` to extract the content of RSS feed articles, but my scraping script failed to extract some articles because the SSL certificate of the article's website was...
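To fetch such pages anyway, plain Node offers an `https.Agent` created with `rejectUnauthorized: false`. The sketch below does not rely on any Mercury option, since the issue does not say whether the parser exposes such a setting. The helper name `fetchHtmlInsecure` is hypothetical. Setting the environment variable `NODE_TLS_REJECT_UNAUTHORIZED=0` has a similar effect but disables verification for the whole process.

```javascript
const https = require('https');

// WARNING: this agent skips certificate verification, which exposes requests
// to man-in-the-middle attacks. Use it only for hosts you have chosen to trust.
const insecureAgent = new https.Agent({ rejectUnauthorized: false });

// Download a page over HTTPS without verifying its certificate and resolve
// with the body as a string. Assumptions: the body is UTF-8 and the URL is
// already final (redirects are not followed).
function fetchHtmlInsecure(url) {
  return new Promise((resolve, reject) => {
    const target = new URL(url);
    const options = {
      hostname: target.hostname,
      port: target.port || 443,
      path: target.pathname + target.search,
      agent: insecureAgent,
    };
    const req = https.get(options, (res) => {
      if (res.statusCode < 200 || res.statusCode >= 300) {
        res.resume(); // drain the body so the socket is released
        reject(new Error(`HTTP ${res.statusCode} for ${url}`));
        return;
      }
      res.setEncoding('utf8');
      let body = '';
      res.on('data', (chunk) => { body += chunk; });
      res.on('end', () => resolve(body));
    });
    req.on('error', reject);
  });
}
```

The downloaded HTML could then be passed to the parser, for example `fetchHtmlInsecure(url).then((html) => Parser.parse(url, { html }))`. This assumes your version accepts pre-fetched HTML through an `html` option, so check its README.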
This change will make Mercury ignore HTML entities and special characters. Here's an [example of the output change](https://gist.github.com/benubois/09d4b4387b90627b5fdd1f832f89d790/revisions#diff-70e920c759c725db24fc1bbd255fd573). This is a big change in behavior and breaks about 80 tests....
If the array includes a callback function as its third element, every result passes through that transformer, and the values it returns become the result of the selector. For example: `date_published: { selectors:`...
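The rule described here, where an array entry is read as `[selector, attribute, transform]` and every match passes through the callback when one is present, can be sketched independently of Mercury's internals. `query(selector, attr)` is a hypothetical stand-in for the real DOM lookup. Trying the selectors in order and keeping the first non-empty result is also an assumption made for this illustration.

```javascript
// Resolve one selector spec. `query(selector, attr)` is a hypothetical lookup
// that returns matched values as strings: the attribute value when `attr` is
// given, otherwise the element text.
function resolveSelector(spec, query) {
  // Plain strings select text; arrays carry [selector, attribute, transform].
  const [selector, attr, transform] = Array.isArray(spec) ? spec : [spec];
  const values = query(selector, attr);
  // When a callback is the third element, every match passes through it.
  return typeof transform === 'function' ? values.map((v) => transform(v)) : values;
}

// Try each spec in order and return the first non-empty result.
function resolveFirst(selectors, query) {
  for (const spec of selectors) {
    const values = resolveSelector(spec, query);
    if (values.length > 0) return values;
  }
  return [];
}
```

With a spec such as `['meta[name="date"]', 'content', (v) => v.slice(0, 10)]`, a matched value of `2019-03-13T10:44:52Z` would be returned as `2019-03-13`.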
Hi there! Is there any way to use a proxy when downloading pages from websites and parsing them? I can't find a related issue or feature. By the way,...
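The question does not say which kind of proxy is meant. For a plain HTTP forward proxy, Node's built-in `http` module is enough: the request goes to the proxy host, and the full target URL is used as the request path. The sketch below uses hypothetical helper names and handles only `http:` targets. HTTPS targets need a CONNECT tunnel, which it does not cover.

```javascript
const http = require('http');

// Build request options for sending a plain-HTTP GET through a forward proxy.
// The connection goes to the proxy, and the absolute target URL is the path.
function proxyRequestOptions(targetUrl, proxyUrl) {
  const target = new URL(targetUrl);
  const proxy = new URL(proxyUrl);
  if (target.protocol !== 'http:') {
    throw new Error('only http: targets are handled; HTTPS needs a CONNECT tunnel');
  }
  const headers = { Host: target.host };
  if (proxy.username) {
    // Basic proxy authentication from credentials embedded in the proxy URL.
    const credentials = `${decodeURIComponent(proxy.username)}:${decodeURIComponent(proxy.password)}`;
    headers['Proxy-Authorization'] = `Basic ${Buffer.from(credentials).toString('base64')}`;
  }
  return {
    hostname: proxy.hostname,
    port: Number(proxy.port) || 80,
    path: target.href,
    method: 'GET',
    headers,
  };
}

// Fetch a page through the proxy and resolve with its body as a UTF-8 string.
function fetchViaProxy(targetUrl, proxyUrl) {
  return new Promise((resolve, reject) => {
    const req = http.request(proxyRequestOptions(targetUrl, proxyUrl), (res) => {
      res.setEncoding('utf8');
      let body = '';
      res.on('data', (chunk) => { body += chunk; });
      res.on('end', () => resolve(body));
    });
    req.on('error', reject);
    req.end();
  });
}
```

The resulting HTML could then be handed to the parser, assuming your version accepts pre-fetched HTML.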
And perhaps other common document types. I believe that Readability supported these with a hard-coded exception based on file extension or MIME type. ``` curl -H "x-api-key: API_KEY_HERE" "https://mercury.postlight.com/parser?url=https://wordpress.org/plugins/about/readme.txt" ```...
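Following the Readability-style exception mentioned here, one approach is to detect plain text by MIME type, falling back to the file extension, and wrap the text in a minimal HTML document before extraction. The helper names are hypothetical, and the accepted types and extensions are assumptions.

```javascript
// Escape the characters that are significant in HTML.
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Decide whether a document should be treated as plain text. A declared MIME
// type wins; without one, fall back to the file extension in the URL path.
function isPlainText(url, contentType = '') {
  const mime = contentType.split(';')[0].trim().toLowerCase();
  if (mime) return mime === 'text/plain' || mime === 'text/markdown';
  const { pathname } = new URL(url);
  return /\.(txt|md|markdown)$/i.test(pathname);
}

// Wrap plain text in a minimal HTML document so an HTML-based extractor has
// something to work with. The text is kept verbatim inside <pre>.
function plainTextToHtml(text, title = '') {
  return `<html><head><title>${escapeHtml(title)}</title></head>` +
    `<body><article><pre>${escapeHtml(text)}</pre></article></body></html>`;
}
```

For the `readme.txt` URL above, `isPlainText` returns `true` even without a `Content-Type` header, because the path ends in `.txt`.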