php-boiler-pipe

Docs and boilerpipe modes

Open melat0nin opened this issue 8 years ago • 3 comments

Thanks for porting boilerpipe to PHP. Can you provide some documentation on how to use the different boilerpipe modes? I'm looking specifically for the HTML fragment output option. Some additional info in the README would be helpful. Thanks!

melat0nin, Mar 15 '16 12:03

I'm also interested in how to keep the HTML (i.e. to extract the main content but keep the HTML tags, rather than stripping everything down to plain text).

JordanMagnuson, May 04 '17 03:05

@melat0nin @JordanMagnuson have you found a way to do this?

phpfile, Apr 19 '20 10:04

After further investigation I ended up going with the Python port. It seems this PHP port is incomplete; see here: https://github.com/dotpack/php-boiler-pipe/blob/0acfea6f643bc970731e4327f037b197caa0e71f/src/HtmlContent.php#L27

It does work for extracting text, but I don't see a way to include HTML tags.

Later edit: for anyone landing on this repo, also check out https://github.com/codelucas/newspaper. It does the same thing and is actively maintained.

phpfile, Apr 19 '20 11:04
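
For readers unsure what "HTML fragment output" means compared with plain-text output, here is a rough sketch of the difference. It is not boilerpipe's algorithm and not the API of php-boiler-pipe or of any of its ports. It is a toy heuristic (pick the element whose direct `<p>` children hold the most text) written with the Python standard library only, and every name in it (`Node`, `TreeBuilder`, `main_node`, `extract`, `keep_html`) is made up for illustration.

```python
from html import escape
from html.parser import HTMLParser

# Elements that never have a closing tag (partial list, enough for a sketch).
VOID = {"br", "img", "hr", "meta", "link", "input"}


class Node:
    """Minimal DOM node: children are either Node objects or plain strings."""

    def __init__(self, tag, attrs=(), parent=None):
        self.tag, self.attrs, self.parent = tag, list(attrs), parent
        self.children = []

    def text(self):
        # Plain-text view: concatenate all text, dropping the tags.
        return "".join(c if isinstance(c, str) else c.text() for c in self.children)

    def html(self):
        # HTML view: re-serialize the element with its tags and attributes.
        attrs = "".join(f' {k}="{escape(v or "")}"' for k, v in self.attrs)
        if self.tag in VOID:
            return f"<{self.tag}{attrs}>"
        inner = "".join(
            escape(c, quote=False) if isinstance(c, str) else c.html()
            for c in self.children
        )
        return f"<{self.tag}{attrs}>{inner}</{self.tag}>"


class TreeBuilder(HTMLParser):
    """Builds a Node tree from the parser's start/end/data events."""

    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.cur = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.cur)
        self.cur.children.append(node)
        if tag not in VOID:
            self.cur = node

    def handle_endtag(self, tag):
        # Walk up to the matching open element; ignore stray end tags.
        n = self.cur
        while n is not self.root and n.tag != tag:
            n = n.parent
        if n is not self.root:
            self.cur = n.parent

    def handle_data(self, data):
        self.cur.children.append(data)


def main_node(root):
    """Toy heuristic: the element whose direct <p> children hold the most text."""
    best, best_score = None, 0
    stack = [root]
    while stack:
        n = stack.pop()
        kids = [c for c in n.children if isinstance(c, Node)]
        score = sum(len(c.text().strip()) for c in kids if c.tag == "p")
        if score > best_score:
            best, best_score = n, score
        stack.extend(kids)
    return best


def extract(html, keep_html=False):
    """Return the main paragraphs as plain text, or as an HTML fragment."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    node = main_node(builder.root)
    if node is None:
        return ""
    paras = [c for c in node.children if isinstance(c, Node) and c.tag == "p"]
    if keep_html:
        return "".join(p.html() for p in paras)
    return "\n".join(p.text().strip() for p in paras)
```

Calling `extract(page)` gives the stripped text, which is the kind of output the thread reports getting from the PHP port, while `extract(page, keep_html=True)` keeps inline tags such as `<b>` and `<a>` inside the selected paragraphs. A real extractor uses far more robust block classification than this heuristic.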