
Difference between the Web API and the API

Open GoogleCodeExporter opened this issue 9 years ago • 1 comments

I am using Boilerpipe through both the web API and the API. For example, for the page
http://www.davidicke.com/forum/showthread.php?page=2&t=72909 the Boilerpipe
web API works properly, while the Boilerpipe API returns this error:

"java.io.IOException: Server returned HTTP response code: 403 for URL:
http://boilerpipe-web.appspot.com/extract?url=http://www.davidicke.com/forum/showthread.php?page%3D2%26t%3D72909&extractor=KeepEverythingExtractor&output=htmlFragment"

Please help. I am not using a proxy.

Original issue reported on code.google.com by [email protected] on 28 Mar 2013 at 4:37

GoogleCodeExporter, Mar 24 '15 10:03

I think the problem is that the request for the HTML is sent without a
User-Agent header, which makes some websites respond with a 403 error. You
could try downloading the HTML yourself and then passing it to
ArticleExtractor.INSTANCE.getText(String text), but I am not sure this will work.
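
A rough sketch of that workaround, in case it helps: download the page with an explicit User-Agent header using only the JDK, then hand the resulting string to boilerpipe. The class name `HtmlFetcher`, the helper names, the User-Agent value, and the UTF-8 decoding are all assumptions made for illustration; none of them come from boilerpipe. Whether this actually avoids the 403 depends on why the server rejects the request.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class HtmlFetcher {
    // Hypothetical User-Agent value; some sites may require a different one.
    static final String USER_AGENT = "Mozilla/5.0 (compatible; ExampleFetcher/1.0)";

    // Prepares (but does not yet open) a connection that will send the given User-Agent.
    static HttpURLConnection open(String address, String userAgent) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(address).openConnection();
        conn.setRequestProperty("User-Agent", userAgent);
        conn.setConnectTimeout(10_000);
        conn.setReadTimeout(10_000);
        return conn;
    }

    // Downloads the response body as a string.
    // Assumption: the page is UTF-8; adjust the charset if the site uses another encoding.
    static String fetchHtml(String address) throws IOException {
        HttpURLConnection conn = open(address, USER_AGENT);
        try (InputStream in = conn.getInputStream()) {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            return buf.toString(StandardCharsets.UTF_8.name());
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        String html = fetchHtml("http://www.davidicke.com/forum/showthread.php?page=2&t=72909");
        System.out.println("Downloaded " + html.length() + " characters");
        // With boilerpipe on the classpath, the string can then be passed on:
        // String text = ArticleExtractor.INSTANCE.getText(html);
    }
}
```

The boilerpipe call is left as a comment so the sketch compiles without the library; with the jar available, uncomment it and import `de.l3s.boilerpipe.extractors.ArticleExtractor`.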

Original comment by [email protected] on 17 Aug 2013 at 12:35

GoogleCodeExporter, Mar 24 '15 10:03