parser

📜 Extract meaningful content from the chaos of a web page

Results 148 parser issues

- Platform: Linux server 4.15.0-47-generic #50-Ubuntu SMP Wed Mar 13 10:44:52 UTC 2019 x86_64 GNU/Linux - Node Version v10.9.0 - [email protected] `mercury-parser https://www.greaterwrong.com/posts/SqF8cHjJv43mvJJzx/feeling-rational` outputs: ``` { "title": "Feeling Rational -...

Hello guys! If you can, please advise: when I parse URLs I get a lot of cases like Error: ETIMEDOUT - http://www.slobodna-bosna.ba/vijest/130830/zovu_ga_kralj_sevdaha_bozo_vreco_odusevio_zagrepchane_svi_prichaju_o_haljini_s_dubokim_dekolteom_foto.html Error: ETIMEDOUT - http://www.azdarar.am/announcments/org/450/00574132 Error: ETIMEDOUT -...

This is a custom extractor for www.gruene.de

- **Platform**: macOS
- **Mercury Parser Version**: `master`
- **Node Version (if a Node bug)**: v11.1.0
- **Browser Version (if a browser bug)**: n/a

## Description

I have created a...

custom parser

- **Platform**: Visual Studio Code
- **Mercury Parser Version**: `master`
- **Node Version (if a Node bug)**: v11.1.0
- **Browser Version (if a browser bug)**: n/a

## Expected Behavior

If...

# Improvement Rationale I am using `mercury-parser` to extract the content of RSS feed articles, but my scraping script failed to extract some articles because the SSL certificate of the article's website was...

feature

This change will make Mercury ignore HTML entities and special characters. Here's an [example of the output change](https://gist.github.com/benubois/09d4b4387b90627b5fdd1f832f89d790/revisions#diff-70e920c759c725db24fc1bbd255fd573). This is a big change in behavior and breaks about 80 tests....

If the array includes a callback function as its third element, all the results pass through that transformer. The value it returns becomes the result of the selector. For example: ```javascript date_published: { selectors:...

feature
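
The third-element callback described above can be illustrated with a small standalone sketch. This is not the mercury-parser implementation: `resolveSelector`, `fakePage`, and `lookup` are hypothetical names, and the DOM query is replaced by a plain object lookup so the example runs without any dependencies. It only assumes that a selector entry may be an array of the form `[selector, attribute, transform]`.

```javascript
// Hypothetical sketch, not mercury-parser's own code.
// A selector entry is either a string or an array:
//   [selector, attribute, transform]
// When the third element is a function, the raw value is passed through it.
function resolveSelector(entry, lookup) {
  if (typeof entry === 'string') {
    return lookup(entry);
  }
  const [selector, attr, transform] = entry;
  const raw = lookup(selector, attr);
  return typeof transform === 'function' ? transform(raw) : raw;
}

// Stand-in for a real DOM query (assumption for illustration only):
// values are keyed by "selector@attribute", or by selector alone.
const fakePage = {
  'meta[name="date"]@content': '2019-04-01 10:30',
  'h1': 'Example title',
};
const lookup = (selector, attr) =>
  fakePage[attr ? `${selector}@${attr}` : selector];

// Extractor field in the shape the issue describes; the transform
// normalizes the raw attribute value before it is returned.
const datePublished = {
  selectors: [
    ['meta[name="date"]', 'content', (value) => value.replace(' ', 'T')],
  ],
};

const transformed = resolveSelector(datePublished.selectors[0], lookup);
const untransformed = resolveSelector(['meta[name="date"]', 'content'], lookup);
const plain = resolveSelector('h1', lookup);

console.log(transformed, untransformed, plain);
```

Keeping the transform optional means existing two-element and plain-string selectors behave exactly as before in this sketch.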

Hi there! Is it possible to use a proxy when downloading data from websites and parsing it? I can't find a related issue or feature. By the way,...

feature

And perhaps other common document types. I believe that Readability supported these with a hard-coded exception based on file extension or mime type. ``` curl -H "x-api-key: API_KEY_HERE" "https://mercury.postlight.com/parser?url=https://wordpress.org/plugins/about/readme.txt" ```...

feature