Stream fails to parse if XML file contains a leading BOM character

Open: edds opened this issue 1 year ago • 1 comment

If an XML file contains a leading BOM, Saxy fails to parse the file.

iex(1)> xml = "\uFEFF<?xml version=\"1.0\" encoding=\"utf-8\"><foo bar='value'></foo>"
iex(2)> Saxy.parse_string(xml, MyEventHandler, [])
"Start parsing document"
{:error,
%Saxy.ParseError{
  reason: {:token, :lt},
  binary: "\uFEFF<?xml version=\"1.0\" encoding=\"utf-8\"><foo bar='value'></foo>",
  position: 0
}}

I'm seeing this when using ExCmd to stream a gzipped file into Saxy, and I can't see any obvious way of stripping the BOM out before passing the stream to Saxy.

edds, Mar 01 '23 11:03

Yes, this happens. You can use String.replace_leading(xml, "\uFEFF", "") to strip the BOM.

lucacorti, Feb 28 '24 19:02
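
The String.replace_leading/3 suggestion works when the whole document is in memory as a binary. For the streaming case described in the issue, one possible approach is to wrap the stream and drop the BOM from its first bytes before handing it to Saxy. The following is a minimal sketch, not part of Saxy: the module name BomStripper and the function strip_stream/1 are hypothetical, and it assumes the stream yields UTF-8 binary chunks. It buffers until at least three bytes are available, since the BOM could in principle be split across chunks.

```elixir
defmodule BomStripper do
  # Hypothetical helper, not part of Saxy.
  # Removes a single leading UTF-8 BOM (bytes EF BB BF) from a stream
  # of binary chunks, even if the BOM is split across chunks.
  def strip_stream(chunks) do
    chunks
    # Append a sentinel so buffered bytes are flushed when the stream ends.
    |> Stream.concat([:eof])
    |> Stream.transform("", fn
      # The start of the stream was already handled: pass chunks through.
      :eof, :done -> {[], :done}
      chunk, :done -> {[chunk], :done}
      # The stream ended with fewer than 3 bytes buffered: no full BOM possible.
      :eof, buffer -> {non_empty(buffer), :done}
      # Still collecting the first 3 bytes.
      chunk, buffer ->
        buffer = buffer <> chunk

        if byte_size(buffer) < 3 do
          {[], buffer}
        else
          {non_empty(strip(buffer)), :done}
        end
    end)
  end

  # Drop the BOM if the binary starts with it, otherwise return it unchanged.
  defp strip(<<0xEF, 0xBB, 0xBF, rest::binary>>), do: rest
  defp strip(other), do: other

  # Avoid emitting empty chunks downstream.
  defp non_empty(""), do: []
  defp non_empty(binary), do: [binary]
end
```

The resulting stream could then be passed on, for example as `stream |> BomStripper.strip_stream() |> Saxy.parse_stream(MyEventHandler, [])`. Unlike String.replace_leading/3, which removes every repeated leading match, this sketch removes only one BOM.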