Google Code Exporter comments

Can not parse NYtimes pages

``` Any change on this issue? I am seeing the same thing with parsing NYT pages for my application. I think this might be related to the fact that NYT...

Incorrect characters in Extractor output

``` ..and I'm using boilerpipe 1.2.0 ``` Original comment by `[email protected]` on 31 Jul 2012 at 3:39

Incorrect characters in Extractor output

``` Hello, did you manage to solve it on your own? ``` Original comment by `[email protected]` on 10 Sep 2012 at 4:08

Incorrect characters in Extractor output

``` Hello, not really. I use PHP to analyze the output of boilerpipe and estimate the charset, but ideally I wouldn't have to do that....

Incorrect characters in Extractor output

``` Found the solution: Here is the Java code needed to fix the special characters issue: public class ExtractMe { public static void main(final String[] args) throws Exception { BufferedReader...

StackOverflowError when page includes another <body> part in <noframes>

``` Thanks for reporting. This seems to be caused by a bug in NekoHTML 1.9.13. The corresponding stack trace points at "org.cyberneko.html.HTMLTagBalancer.endElement(HTMLTagBalancer.java:1003)". The problem seems to go away after an update...

StackOverflowError when page includes another <body> part in <noframes>

``` Thanks for the quick response. As you stated, the problem has gone away with NekoHTML 1.9.15. Below is the list of changes in NekoHTML since version 1.9.13 (which was released on...

Library does not produce same results as http://boilerpipe-web.appspot.com/

``` It looks like the issue is the KeepLargestBlockFilter which rejects every block except the largest. While taking out this filter in the library should return results closer to http://boilerpipe-web.appspot.com/,...

Library does not produce same results as http://boilerpipe-web.appspot.com/

``` I'm also unable to get the same results using the HTMLHighlighter in extraction mode. The web API (http://boilerpipe-web.appspot.com) clearly states that: "This Web Application probably uses a more recent...

Library does not produce same results as http://boilerpipe-web.appspot.com/

``` I have a similar issue. When trying the URL "http://www.hokkaido-np.co.jp/news/donai/424760.html" with ArticleExtractor and "Plain Text" output, the library code did not produce the same results as http://boilerpipe-web.appspot.com/ ``` Original comment by...
