Is there a bug in version 0.13.0?

Open solaim opened this issue 2 years ago • 1 comments

The XML file is about 20 MB or larger.

With version 0.12.0, parsing succeeds and the data is complete. With version 0.13.0, parsing succeeds but some data is dropped.

I know something is wrong, but I do not know what changed.

So I downgraded to version 0.12.0, and everything is OK.

solaim, May 27 '22 10:05

It’s impossible to reliably reproduce an issue if you don’t provide a minimal example. Have you tried reducing the XML to see if it’s correctly parsed?
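One way to narrow this down is to dump the parsed result under each version (for example with `json.dump`) and compare the two structures to see which paths went missing. Below is a minimal sketch of such a comparison; `missing_paths` is a hypothetical helper, not part of xmltodict, and the sample dicts are made up for illustration.

```python
def missing_paths(old, new, path=""):
    """Yield paths that are present in `old` but absent or different in `new`."""
    if isinstance(old, dict) and isinstance(new, dict):
        for key, value in old.items():
            child = f"{path}/{key}"
            if key not in new:
                yield child
            else:
                yield from missing_paths(value, new[key], child)
    elif isinstance(old, list) and isinstance(new, list):
        for i, value in enumerate(old):
            child = f"{path}[{i}]"
            if i >= len(new):
                yield child
            else:
                yield from missing_paths(value, new[i], child)
    elif old != new:
        # Leaf values differ, or the two sides have different types.
        yield path or "/"


# Made-up stand-ins for the results parsed under 0.12.0 and 0.13.0.
old = {"a": {"item": ["1", "2", "3"], "name": "x"}}
new = {"a": {"item": ["1", "2"], "name": "x"}}

dropped = list(missing_paths(old, new))
print(dropped)
```

The reported paths point at the elements worth keeping when reducing the XML to a minimal example.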

Edit: I just created a 121 MB XML file and had no issue parsing and unparsing it:

t=aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
echo '<?xml version="1.0" encoding="utf-8"?>' > a.xml
echo '<a>' >> a.xml
for _ in {1..1000000}; do
  echo "<$t>$t</$t>" >> a.xml
done
echo -n '</a>' >> a.xml

import xmltodict

with open("a.xml", "rb") as f:
    x = xmltodict.parse(f)

print(len(x["a"]["aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"])) # 1000000

with open("b.xml", "wb") as f:
    xmltodict.unparse(x, output=f, pretty=True, indent="")

$ shasum -a 256 a.xml b.xml
cb7028e5d0bbb62b296e8b53d543eb53248208365c0e7de41090d6911e0aa9dd  a.xml
cb7028e5d0bbb62b296e8b53d543eb53248208365c0e7de41090d6911e0aa9dd  b.xml

Edit 2: no issue either with an 18 GB file containing 120,000,000 elements (in streaming mode).
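To check whether elements go missing without relying on xmltodict at all, the element counts can be cross-checked with the standard library. The sketch below uses `xml.etree.ElementTree.iterparse`, which is not xmltodict's streaming mode but an independent counter; `count_children` is a hypothetical helper and the sample document is made up.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter


def count_children(source):
    """Count the direct children of the root element by tag, incrementally."""
    counts = Counter()
    depth = 0
    root = None
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            if root is None:
                root = elem  # the first element seen is the document root
            depth += 1
        else:
            depth -= 1
            if depth == 1:
                # elem is a direct child of the root and is now complete.
                counts[elem.tag] += 1
                # Drop finished children so memory use stays bounded.
                root.clear()
    return counts


sample = io.BytesIO(b"<a><x>1</x><x>2</x><y><x>nested</x></y></a>")
counts = count_children(sample)
print(counts["x"], counts["y"])
```

The counts can then be compared with the lengths of the corresponding lists in the parsed result, such as `len(x["a"][tag])`, under each xmltodict version.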

bfontaine, Jul 07 '23 16:07