Bad xml format in html output from Web API

Open: GoogleCodeExporter opened this issue 9 years ago • 1 comment

• What steps will reproduce the problem?
Request html or htmlFragment output for any page

• What is the expected output? What do you see instead?
The output has an XML declaration, but instead of a valid HTML/XML structure 
there are extra tags before <html> that break the XML:

<?xml version="1.0" encoding="utf-8" ?>
<meta …/>
<base … />
<html>
  <body>
    ...
  </body>
</html>

Also, inside <html>, the <style> element comes directly after the opening <html> tag instead of inside a <head>.

The correct output would be:

<?xml version="1.0" encoding="utf-8" ?>
<html>
  <head>
    <meta …/>
    <base … />
    <style>...</style>
  </head>
  <body>
    ...
  </body>
</html>
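
Until this is fixed on the server, a client could repair the output itself. The following is only a minimal sketch of such a workaround, not how boilerpipe builds its output: the function name `move_into_head` is hypothetical, and the regular expressions assume the simple layout shown above (self-closing <meta>/<base> tags before <html>, an optional <style> right after it, no ">" inside attribute values).

```python
import re

# Patterns for the layout described in this report (an assumption, not a spec).
XML_DECL = re.compile(r"\s*(<\?xml[^>]*\?>)\s*")
STRAY_TAG = re.compile(r"\s*(<(?:meta|base)\b[^>]*/>)", re.I)
HTML_OPEN = re.compile(r"\s*(<html\b[^>]*>)", re.I)
STYLE = re.compile(r"\s*(<style\b[^>]*>.*?</style>)", re.I | re.S)


def move_into_head(output: str) -> str:
    """Wrap stray <meta>/<base>/<style> elements in a <head> inside <html>."""
    pos = 0
    decl = ""
    m = XML_DECL.match(output)
    if m:
        decl = m.group(1)
        pos = m.end()

    head_parts = []
    # Collect the <meta>/<base> tags that precede <html>.
    while (m := STRAY_TAG.match(output, pos)):
        head_parts.append(m.group(1))
        pos = m.end()

    m = HTML_OPEN.match(output, pos)
    if not m:
        return output  # unexpected layout: leave the input untouched
    html_open = m.group(1)
    pos = m.end()

    # Collect <style> elements that directly follow the opening <html> tag.
    while (m := STYLE.match(output, pos)):
        head_parts.append(m.group(1))
        pos = m.end()

    if not head_parts:
        return output  # nothing to move

    head = "  <head>\n" + "".join(f"    {p}\n" for p in head_parts) + "  </head>"
    prefix = decl + "\n" if decl else ""
    return prefix + html_open + "\n" + head + output[pos:]
```

Regular expressions are used here because the broken output cannot be loaded by an XML parser in the first place; once repaired, the result can be handed to a regular parser.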

• What version of the product are you using? On what operating system?

The Web API http://boilerpipe-web.appspot.com/extract

And thanks for this great *GREAT* tool!!!

--
François

Original issue reported on code.google.com by [email protected] on 3 Dec 2011 at 4:13

GoogleCodeExporter, Mar 24 '15 10:03

Hi François,

thanks for pointing this out.

The addition of meta and base was a deliberate decision (it was just easier to 
prepend them to the highlighted HTML). Nevertheless, it is worth fixing.

Cheers,
Christian

Original comment by ckkohl79 on 22 Jan 2012 at 10:57

  • Changed state: Accepted
  • Added labels: Type-Enhancement, Priority-Low
  • Removed labels: Type-Defect, Priority-Medium

GoogleCodeExporter, Mar 24 '15 10:03