Encoding Error ISO-8859-1 SimpleForm
XML String
data = "<?xml version='1.0' encoding='ISO-8859-1'?><OTA_HotelAvailRS xmlns=\"http://parsec.es/hotelapi/OTA2014Compact\" TimeStamp=\"2021-03-17T08:56:29Z\" PrimaryLangID=\"en-GB\" Id=\"11,33667649,72545\"><Hotels HotelCount=\"0\"><DateRange Start=\"2021-04-20\" End=\"2021-04-21\" /><RoomCandidates><RoomCandidate RPH=\"0\"><Guests><Guest AgeCode=\"A\" Count=\"2\" /></Guests></RoomCandidate></RoomCandidates></Hotels></OTA_HotelAvailRS>"
Here, the encoding value is ISO-8859-1.
Now, if I try to run SimpleForm on it, I get the following error:
Error
iex(12)> Saxy.SimpleForm.parse_string(data)
{:error,
%Saxy.ParseError{
binary: "<?xml version='1.0' encoding='ISO-8859-1'?><OTA_HotelAvailRS xmlns=\"http://parsec.es/hotelapi/OTA2014Compact\" TimeStamp=\"2021-03-17T08:56:29Z\" PrimaryLangID=\"en-GB\" Id=\"11,33667649,72545\"><Hotels HotelCount=\"0\"><DateRange Start=\"2021-04-20\" End=\"2021-04-21\" /><RoomCandidates><RoomCandidate RPH=\"0\"><Guests><Guest AgeCode=\"A\" Count=\"2\" /></Guests></RoomCandidate></RoomCandidates></Hotels></OTA_HotelAvailRS>",
position: 30,
reason: {:invalid_encoding, "ISO-8859-1"}
}}
When I replace ISO-8859-1 with UTF-8 in the declaration, it works fine.
Is there a way to parse ISO-8859-1 encoded XML?
Currently, Saxy only supports UTF-8 encoding. The parser stops when the XML document explicitly declares that it is not UTF-8 encoded. Saxy will very likely not support other encodings.
If you are sure that the input string is actually UTF-8, a short-term solution is to remove the encoding information before calling Saxy. In the long term, we could perhaps provide an option to ignore the declared encoding or override it to UTF-8, so that the parser continues even when the declared encoding is unsupported.
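For anyone landing here, a minimal sketch of that short-term workaround might look like the following. The module `XmlEncodingWorkaround` and its function names are hypothetical and not part of Saxy. `strip_encoding/1` only drops the `encoding` attribute from the XML declaration. `latin1_to_utf8/1` is an extra step, beyond what is suggested above, that assumes the bytes really are ISO-8859-1. It uses Erlang's `:unicode.characters_to_binary/3`.

```elixir
defmodule XmlEncodingWorkaround do
  # Hypothetical helper module for illustration; not part of Saxy.

  # Removes the encoding attribute from a leading XML declaration, if present.
  # The rest of the document is left untouched.
  def strip_encoding(xml) when is_binary(xml) do
    Regex.replace(
      ~r/\A(<\?xml[^>]*?)\s+encoding=(?:"[^"]*"|'[^']*')/,
      xml,
      "\\1"
    )
  end

  # Re-encodes the bytes from ISO-8859-1 (Latin-1) to UTF-8.
  # Only call this when the input is known to be Latin-1 encoded;
  # applying it to data that is already UTF-8 would corrupt non-ASCII characters.
  def latin1_to_utf8(xml) when is_binary(xml) do
    :unicode.characters_to_binary(xml, :latin1, :utf8)
  end
end

# Possible usage with the string from the question (requires Saxy):
#
#   data
#   |> XmlEncodingWorkaround.latin1_to_utf8()
#   |> XmlEncodingWorkaround.strip_encoding()
#   |> Saxy.SimpleForm.parse_string()
```

If the document contains only ASCII characters, as the sample above appears to, the Latin-1 and UTF-8 bytes are identical, so stripping the declaration alone should be enough.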
@qcam Thanks for the update and quick solution
Is it currently supported or not yet?