
Incorrect HTML magic identification when preceded by a comment

Open markedmondson opened this issue 1 year ago • 0 comments

If an HTML document has a comment before the opening tag, Marcel incorrectly identifies it as XML (application/xml).

Steps to reproduce

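require "stringio"
require "marcel"

# Uppercase tag: fails once the leading comment is over 64 characters.
io = StringIO.new(<<~HTML)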
  <!--/* Throwaway comment but it has to be over 64 characters to fail AND have a uppercase HTML tag */-->
  <HTML>
    <head>
    </head>
    <body>
      <h1>Magic!</h1>
    </body>
  </HTML>
HTML
Marcel::MimeType.for(io)
# => "application/xml"

# Lowercase tag: fails once the leading comment is over 128 characters.
io = StringIO.new(<<~HTML)
  <!--/* Throwaway comment but it has to be over 128 characters to fail AND have a lowercase HTML tag, we can pad this one out a bit to get it longer */-->
  <html>
    <head>
    </head>
    <body>
      <h1>Magic!</h1>
    </body>
  </html>
HTML
Marcel::MimeType.for(io)
# => "application/xml"

Updating the magic definitions is only a temporary workaround, because the comment could be any length. The broader HTML lookup at https://github.com/rails/marcel/blob/main/lib/marcel/tables.rb#L2761 falls below the XML comment-matching magic at https://github.com/rails/marcel/blob/main/lib/marcel/tables.rb#L2747, so the comment rule appears to match first.
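
To illustrate why widening the offset window only postpones the problem, here is a minimal sketch in plain Ruby. It does not use Marcel: `window_match?` is a simplified, assumed model of a fixed-window rule such as `[0..256, "<HTML"]`, and `html_after_comments?` is a hypothetical helper that skips leading comments of any length before looking for the opening tag. Neither name exists in the library.

```ruby
# Hypothetical helpers for illustration only; not part of Marcel's API.

# Matches any run of leading whitespace and HTML comments at the start of input.
LEADING_COMMENTS = /\A(?:\s*<!--.*?-->)*\s*/m

# Simplified model of a fixed-window magic rule: the pattern must start
# at an offset that lies inside the given window.
def window_match?(source, window, pattern)
  index = source.index(pattern)
  !index.nil? && window.cover?(index)
end

# Strips leading comments, then checks for an <html tag in any letter case.
def html_after_comments?(source)
  rest = source.sub(LEADING_COMMENTS, "")
  rest.match?(/\A<html[\s>]/i)
end

# A comment longer than the widened 0..256 window used in the workaround.
long_comment_doc = "<!--#{'x' * 300}-->\n<HTML>\n</HTML>\n"

puts window_match?(long_comment_doc, 0..256, "<HTML") # fixed window misses the tag
puts html_after_comments?(long_comment_doc)           # comment-skipping check finds it
```

Under these assumptions, any fixed window can be defeated by a long enough comment, whereas a check that skips comments first does not depend on their length.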

Temporary workaround

Marcel::MimeType.extend "text/html", magic: [[0..256, "<HTML"]]
Marcel::MimeType.extend "text/html", magic: [[0..256, "<html"]]

markedmondson • Apr 04 '24 20:04