iPhone-Readability-Parsing

cannot parse pages

Open narup opened this issue 13 years ago • 3 comments

Hi,

A few of the pages I am trying to parse don't work:

http://www.nepalnews.com/archive/2012/jan/jan05/news17.php
http://nagariknews.com/society/crime/35123-2012-01-05-16-50-33.html

Thanks

narup avatar Jan 05 '12 20:01 narup

Hi, I will try to follow up as soon as possible.

Thanks,

vodkhang avatar Jan 27 '12 10:01 vodkhang

Hi,

I ran the code on the iPhone simulator. Nothing appears in the UI, and the log is below:

GNU gdb 6.3.50-20050815 (Apple version gdb-1708) (Mon Aug 15 16:03:10 UTC 2011)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "x86_64-apple-darwin".
Attaching to process 771.
Entity: line 99: parser error : StartTag: invalid element name
</script>
^
Entity: line 100: parser error : StartTag: invalid element name
" src="http://graphics8.nytimes.com/js/blogs_v3/nyt_universal/js/blogShare.js"><
^
Entity: line 106: parser error : StartTag: invalid element name
</script>
^
Entity: line 107: parser error : StartTag: invalid element name
" src="http://graphics8.nytimes.com/js/blogs_v3/nyt_universal/js/blogscrnr.js"><
^
Entity: line 107: parser error : StartTag: invalid element name
ics8.nytimes.com/js/app/lib/jquery/jquery-1.6.2.min.js" type="text/javascript"><
^
Entity: line 107: parser error : StartTag: invalid element name
aphics8.nytimes.com/js/EmbeddedComments/jquery.tmpl.js" type="text/javascript"><
^
Entity: line 108: parser error : StartTag: invalid element name
ics8.nytimes.com/js/EmbeddedComments/commentsConfig.js" type="text/javascript"><
^
Entity: line 109: parser error : StartTag: invalid element name
s8.nytimes.com/js/EmbeddedComments/embeddedComments.js" type="text/javascript"><
^
Entity: line 124: parser error : Entity 'raquo' not defined

Fredlee001 avatar May 08 '12 05:05 Fredlee001

I think this is because it's trying to parse HTML as XML. I'm hitting a lot of "Entity: nbsp not defined" errors, and &nbsp; is an HTML named entity that XML does not predefine (XML only predefines &amp;, &lt;, &gt;, &apos; and &quot;).

Quite how to fix it is another matter.
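
To illustrate the suspected cause, here is a minimal sketch in Python (standard library only, not this project's Objective-C code). The SAMPLE markup and the helper names parse_as_xml, parse_as_html and TextCollector are made up for the illustration. It shows a strict XML parser rejecting an HTML named entity while a lenient HTML parser accepts the same input. If the project is built on libxml2 (the log format suggests so, but that is an assumption), switching to that library's HTML parser module would presumably be the analogous change.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical sample: ordinary HTML with named entities and a script tag.
SAMPLE = (
    '<html><body><p>Next&nbsp;page &raquo;</p>'
    '<script src="a.js"></script></body></html>'
)


def parse_as_xml(markup):
    """Return None if strict XML parsing succeeds, else the error text."""
    try:
        ET.fromstring(markup)
        return None
    except ET.ParseError as exc:
        return str(exc)


class TextCollector(HTMLParser):
    """Lenient HTML parser that gathers visible text, skipping scripts/styles."""

    def __init__(self):
        # convert_charrefs=True turns &nbsp;, &raquo; etc. into characters.
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def parse_as_html(markup):
    """Return the visible text extracted by the lenient HTML parser."""
    collector = TextCollector()
    collector.feed(markup)
    collector.close()
    return "".join(collector.chunks)


if __name__ == "__main__":
    print("XML parser error:", parse_as_xml(SAMPLE))
    print("HTML parser text:", repr(parse_as_html(SAMPLE)))
```

The XML path fails on the first entity it does not know, which matches the kind of errors in the log above. The HTML path decodes the entities and keeps going.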

evamedia avatar Sep 12 '12 13:09 evamedia