boilerpipe
Difference WebApi - Api
I am using Boilerpipe through both the web API and the API. For example, for the page
http://www.davidicke.com/forum/showthread.php?page=2&t=72909, the Boilerpipe
web API works properly, while the Boilerpipe API returns the error
"java.io.IOException: Server returned HTTP response code: 403 for URL:
http://boilerpipe-web.appspot.com/extract?url=http://www.davidicke.com/forum/showthread.php?page%3D2%26t%3D72909&extractor=KeepEverythingExtractor&output=htmlFragment"
Help me! I am not using any proxy.
Original issue reported on code.google.com by [email protected]
on 28 Mar 2013 at 4:37
I think the problem is that they do not send a User-Agent when requesting
the HTML, which causes a 403 error on some websites. You can try to
download the HTML manually and then pass it to
ArticleExtractor.INSTANCE.getText(String text), but I am not sure.
Original comment by [email protected]
on 17 Aug 2013 at 12:35
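
The workaround suggested in the comment above could be sketched roughly as follows. This is only a minimal sketch using the standard HttpURLConnection class. The class name HtmlFetcher, the method fetchHtml, the User-Agent string, the timeouts, and the UTF-8 decoding are all assumptions made for illustration, not part of boilerpipe, and it is not confirmed that a missing User-Agent is the real cause of the 403.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of boilerpipe.
public class HtmlFetcher {

    // Assumed value; any browser-like User-Agent string may work.
    static final String DEFAULT_USER_AGENT = "Mozilla/5.0 (compatible; HtmlFetcher/1.0)";

    /** Downloads a page as text, sending an explicit User-Agent header. */
    static String fetchHtml(String url, String userAgent) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestProperty("User-Agent", userAgent);
        conn.setConnectTimeout(10_000);
        conn.setReadTimeout(10_000);
        try {
            int status = conn.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                // Surfaces errors such as 403 with the status code in the message.
                throw new IOException("HTTP " + status + " for " + url);
            }
            try (InputStream in = conn.getInputStream()) {
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                byte[] chunk = new byte[8192];
                int n;
                while ((n = in.read(chunk)) != -1) {
                    buf.write(chunk, 0, n);
                }
                // Assumes UTF-8; the page may actually declare another charset.
                return new String(buf.toByteArray(), StandardCharsets.UTF_8);
            }
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: HtmlFetcher <url>");
            return;
        }
        String html = fetchHtml(args[0], DEFAULT_USER_AGENT);
        System.out.println(html.length() + " characters downloaded");
    }
}
```

With boilerpipe on the classpath, the downloaded string could then be handed to the extractor as the comment proposes, for example ArticleExtractor.INSTANCE.getText(html). Whether the site accepts the request may still depend on other headers or on the server's own rules.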