Cannot parse pages
Hi,
A few of the pages I am trying to parse don't work:
http://www.nepalnews.com/archive/2012/jan/jan05/news17.php
http://nagariknews.com/society/crime/35123-2012-01-05-16-50-33.html
Thanks
Hi, I will try to follow up as soon as possible.
Thanks,
Hi,
I ran the code on the iPhone Simulator. Nothing appears in the UI, and the log is below:
Entity: line 99: parser error : StartTag: invalid element name
  </script>
Entity: line 100: parser error : StartTag: invalid element name
  " src="http://graphics8.nytimes.com/js/blogs_v3/nyt_universal/js/blogShare.js"><
Entity: line 106: parser error : StartTag: invalid element name
  </script>
Entity: line 107: parser error : StartTag: invalid element name
  " src="http://graphics8.nytimes.com/js/blogs_v3/nyt_universal/js/blogscrnr.js"><
Entity: line 107: parser error : StartTag: invalid element name
  ics8.nytimes.com/js/app/lib/jquery/jquery-1.6.2.min.js" type="text/javascript"><
Entity: line 107: parser error : StartTag: invalid element name
  aphics8.nytimes.com/js/EmbeddedComments/jquery.tmpl.js" type="text/javascript"><
Entity: line 108: parser error : StartTag: invalid element name
  ics8.nytimes.com/js/EmbeddedComments/commentsConfig.js" type="text/javascript"><
Entity: line 109: parser error : StartTag: invalid element name
  s8.nytimes.com/js/EmbeddedComments/embeddedComments.js" type="text/javascript"><
Entity: line 124: parser error : Entity 'raquo' not defined
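I think this happens because it's parsing HTML as XML. I'm hitting a lot of "Entity: nbsp not defined" errors, and &nbsp; is not valid in XML. XML predefines only &amp;, &lt;, &gt;, &quot; and &apos;, while &nbsp; (and &raquo; in the log) are HTML entities.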
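The same class of failure can be reproduced outside the app with any strict XML parser. The sketch below uses Python's standard xml.etree.ElementTree (backed by expat) purely for illustration. The sample fragments are hypothetical and only shaped like the errors in the log. They are not copied from the reported pages.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragments shaped like the failures in the log;
# they are not taken from the reported pages.
SAMPLES = {
    "html_entity": "<p>Home&nbsp;&raquo;&nbsp;News</p>",       # &nbsp;/&raquo; exist in HTML, not XML
    "inline_script": "<script>if (a < b) { go(); }</script>",  # a bare '<' is illegal in XML text
    "xml_entity": "<p>a &amp; b</p>",                          # &amp; is one of XML's predefined entities
}


def parses_as_xml(markup: str) -> bool:
    """Return True if Python's strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError as err:
        # Report why the strict parser rejected the fragment.
        print(f"rejected: {err}")
        return False


if __name__ == "__main__":
    for name, markup in SAMPLES.items():
        print(name, parses_as_xml(markup))
```

Only the fragment that sticks to XML's predefined entities and escapes its markup gets through. Ordinary HTML pages with named entities or inline scripts do not.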
Quite how to fix it is another matter.
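One possible direction is to feed pages to an HTML-aware parser instead of an XML one, though the project is not confirmed to support this. The sketch below uses Python's standard html.parser for illustration only. VisibleTextParser and visible_text are hypothetical names. The error format in the log suggests the iOS code may be built on libxml2. If so, the comparable switch would be libxml2's HTML parser (htmlReadMemory in HTMLparser.h) rather than its XML reader, but that is an assumption about the project's internals.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Hypothetical helper: collects text outside <script>/<style>, tolerating HTML-only markup."""

    def __init__(self):
        # convert_charrefs=True decodes named entities such as &nbsp; and &raquo; to Unicode
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0  # >0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only text that is not script/style source.
        if not self._skip_depth:
            self.parts.append(data)


def visible_text(markup: str) -> str:
    """Hypothetical helper: parse markup leniently and return its visible text."""
    parser = VisibleTextParser()
    parser.feed(markup)
    parser.close()
    return "".join(parser.parts).strip()


if __name__ == "__main__":
    page = "<p>Home&nbsp;&raquo;&nbsp;News</p><script>if (a < b) { go(); }</script>"
    print(repr(visible_text(page)))  # 'Home\xa0»\xa0News'
```

The HTML parser knows the HTML entity table and treats script contents as raw text. The same input that the strict XML parser rejects therefore goes through without errors.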