php-boiler-pipe
Docs and boilerpipe modes
Thanks for porting boilerpipe to PHP. Can you provide some documentation on how to use the different boilerpipe modes? I'm looking specifically for the HTML fragment output option. Some additional info in the README would be helpful. Thanks!
I'm also interested in how to keep the HTML (i.e. to extract the main content but keep the HTML tags, rather than stripping everything down to plain text).
@melat0nin @JordanMagnuson have you found a way to do this?
After further investigation I ended up going with the Python port. This PHP port appears to be incomplete; see https://github.com/dotpack/php-boiler-pipe/blob/0acfea6f643bc970731e4327f037b197caa0e71f/src/HtmlContent.php#L27
It does work for extracting text, but I don't see a way to include HTML tags.
Later edit: For anyone landing on this repo, also check out newspaper (https://github.com/codelucas/newspaper), which does the same thing and is actively maintained.
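To make "keep the HTML" concrete, here is a minimal Python sketch of what the thread is asking for: pick the block that looks like the main content and return it with its tags intact. This is not boilerpipe's algorithm, and it is not the API of php-boiler-pipe, the Python port, or newspaper. It is a naive text-density heuristic that assumes well-formed (XHTML-like) markup, and the names `main_fragment`, `CANDIDATES`, and `SAMPLE` are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical set of container tags considered as "main content" candidates.
CANDIDATES = {"article", "main", "section", "div"}


def text_len(elem):
    """Length of all text inside elem, tags stripped."""
    return len("".join(elem.itertext()).strip())


def link_text_len(elem):
    """Length of the text that sits inside <a> elements under elem."""
    return sum(text_len(a) for a in elem.iter("a"))


def main_fragment(markup):
    """Return the highest-scoring candidate block as an HTML string.

    Assumption: markup is well-formed enough for an XML parser.
    Real-world pages usually need a forgiving HTML parser instead.
    """
    root = ET.fromstring(markup)
    best, best_score = None, -1
    for elem in root.iter():
        if elem.tag not in CANDIDATES:
            continue
        # Score = non-link text in the block's direct <p> children,
        # so navigation bars made mostly of links score low.
        score = sum(text_len(p) - link_text_len(p) for p in elem.findall("p"))
        if score > best_score:
            best, best_score = elem, score
    if best is None:
        return ""
    best.tail = None  # drop whitespace that follows the element
    # Serialize with tags kept, instead of flattening to plain text.
    return ET.tostring(best, encoding="unicode", method="html")


SAMPLE = """<html><body>
<div id="nav"><p><a href="/">Home</a> <a href="/about">About</a></p></div>
<div id="post"><p>Boilerpipe removes <b>boilerplate</b> around the main text.</p><p>The fragment keeps its tags.</p></div>
<div id="footer"><p>Copyright</p></div>
</body></html>"""

fragment = main_fragment(SAMPLE)
print(fragment)
```

On the sample page the navigation and footer blocks are dropped, while the post is returned with its `<p>` and `<b>` tags preserved. A plain-text extractor would return only the sentences.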