php-goose
Readability / HTML Content / Article Extractor & Web Scraping library written in PHP
I was testing this library and a lot of websites return NULL. Here is an example: https://www.aljazeera.com/news/2019/09/military-base-italian-military-convoy-attacked-somalia-190930102422698.html
Amazing project!! Wondering, though, if it would be possible to keep images and embedded social media / YouTube in the final output. I often find articles with a structure like...
My disk is full due to the creation of this temporary file while using the goose client continuously. https://prnt.sc/1r6etxe
Hi, can I pass HTML data directly instead of providing a URL? Thank you.
Uncaught exception: cURL error 28: Operation timed out after 60001 milliseconds with 0 bytes received (see https://curl.haxx.se/libcurl/c/libcurl-errors.html) for https://www.thenewstribune.com/news/business/article253829213.html
How to deal with stuff like this? Fatal error: Uncaught GuzzleHttp\Exception\ServerException: Server error: `GET https://genius.com/a/jamaican-music-legend-lee-scratch-perry-dead-at-85` resulted in a `503 Service Temporarily Unavailable` response:
Hi! Lovely package, but I don't understand the reasoning behind stripping the page of its headings (and thus the outline structure): https://github.com/scotteh/php-goose/blob/78599a1bac6af271ce8bc4fab6cdce76c1d3feca/src/Modules/Formatters/OutputFormatter.php#L174 Would you consider a PR that makes...
_Dependabot Preview will be shut down on August 3rd, 2021. In order to keep getting Dependabot updates, please merge this PR and migrate to GitHub-native Dependabot before then._ Dependabot has...
Hi, there is an encoding problem with the pretty quotes. Is it possible to fix this somehow with config? Here is an example page: https://www.gq.com/story/nike-tanjun-most-popular-shoe Result: 1. “The Ten”...
Can you add more languages? Like "sk" and "bg".