xmltodict

Namespace attribute @xmlns with process_namespaces=True

Status: Open · remia opened this issue 7 years ago · 3 comments

Hello,

I noticed some odd behavior in the latest release, 0.11.0, that is not present in 0.10.2. It shows up when parsing a simple XML file with the process_namespaces option.

XML File

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<MyXML xmlns="http://www.xml.org/schemas/Test">
    <Tag1>Text1</Tag1>
    <Tag2 attr2="en">Text2</Tag2>
    <Tag3>Text3</Tag3>
    <Tag4 attr4="en">Text4</Tag4>
</MyXML>

Parser

import xmltodict
import json

def parse_xml(filename, force_list=None):
    with open(filename) as file:
        # Collapse this namespace: mapping its URI to None drops the prefix from tag names
        namespaces = {
            "http://www.xml.org/schemas/Test": None,
        }
        res_dict = xmltodict.parse(
            file.read(),
            process_namespaces=True,
            namespaces=namespaces,
            force_list=force_list
        )
        # Round-trip through JSON to turn nested OrderedDicts into plain dicts
        res_dict = json.loads(json.dumps(res_dict))

    return res_dict

res_dict = parse_xml("xml_path")
print(res_dict)
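A note on the json.dumps/json.loads round trip in parse_xml: in these releases xmltodict.parse returns nested OrderedDicts by default, and serialising to JSON and back is a simple way to get plain dicts. A stdlib-only sketch, where the hand-built OrderedDict only stands in for real parser output and to_plain is a hypothetical helper name:

```python
import json
from collections import OrderedDict


def to_plain(obj):
    """Convert nested OrderedDicts (any JSON-serialisable structure) to plain dicts."""
    return json.loads(json.dumps(obj))


# Hand-built stand-in for a parse result (not actual xmltodict output).
parsed = OrderedDict([
    ("MyXML", OrderedDict([
        ("Tag1", "Text1"),
        ("Tag2", OrderedDict([("@attr2", "en"), ("#text", "Text2")])),
    ])),
])

plain = to_plain(parsed)
print(type(plain["MyXML"]["Tag2"]).__name__)  # dict
```

This only works because the parsed values are strings, dicts and lists; anything JSON cannot encode would raise a TypeError.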

With the 0.10.2 release, you get the following result, as expected:

{u'MyXML': {u'Tag4': {u'@attr4': u'en', u'#text': u'Text4'}, u'Tag1': u'Text1', u'Tag2': {u'#text': u'Text2', u'@attr2': u'en'}, u'Tag3': u'Text3'}}

With 0.11.0, you get this instead:

{u'MyXML': {u'Tag4': {u'@attr4': u'en', u'#text': u'Text4'}, u'Tag1': u'Text1', u'Tag2': {u'@xmlns': {u'': u'http://www.xml.org/schemas/Test'}, u'#text': u'Text2', u'@attr2': u'en'}, u'Tag3': u'Text3'}}
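To check which element actually carries the namespace declaration, here is a stdlib-only sketch using xml.parsers.expat, the parser xmltodict is built on, with namespace processing enabled (separator ":" here). The trace_namespace_events name is hypothetical. Expat should report the default-namespace declaration once, just before the MyXML start tag, and Tag2's attributes should contain only attr2, which suggests the @xmlns entry on Tag2 above does not come from the document itself.

```python
import xml.parsers.expat

# The XML file from above, embedded so the sketch is self-contained.
XML = b"""<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<MyXML xmlns="http://www.xml.org/schemas/Test">
    <Tag1>Text1</Tag1>
    <Tag2 attr2="en">Text2</Tag2>
    <Tag3>Text3</Tag3>
    <Tag4 attr4="en">Text4</Tag4>
</MyXML>"""


def trace_namespace_events(xml_bytes):
    """Record namespace-declaration and start-element events in document order."""
    events = []
    # Passing namespace_separator turns on expat's namespace processing:
    # element names arrive as "URI:localname" and xmlns attributes are
    # reported through StartNamespaceDeclHandler instead of as attributes.
    parser = xml.parsers.expat.ParserCreate(namespace_separator=":")
    parser.StartNamespaceDeclHandler = (
        lambda prefix, uri: events.append(("ns-decl", prefix, uri))
    )
    parser.StartElementHandler = (
        lambda name, attrs: events.append(("start", name, attrs))
    )
    parser.Parse(xml_bytes, True)
    return events


for event in trace_namespace_events(XML):
    print(event)
```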

An @xmlns attribute is added to the first element in the file that has an attribute, in this case Tag2. If we remove the attribute from Tag2, the @xmlns appears on Tag4 instead.
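Until this is fixed, one possible stopgap, assuming you never need namespace declarations in the output, is to strip the @xmlns keys after parsing. strip_xmlns is a hypothetical helper, not part of xmltodict; the two dicts are the outputs quoted above with the Python 2 u'' prefixes dropped.

```python
def strip_xmlns(node):
    """Recursively drop '@xmlns' keys from an xmltodict-style structure.

    Hypothetical post-processing helper, not an xmltodict API.
    """
    if isinstance(node, dict):  # also matches OrderedDict
        return {k: strip_xmlns(v) for k, v in node.items() if k != "@xmlns"}
    if isinstance(node, list):  # e.g. repeated elements or force_list
        return [strip_xmlns(item) for item in node]
    return node


# Output reported for 0.11.0.
result_0_11 = {
    "MyXML": {
        "Tag4": {"@attr4": "en", "#text": "Text4"},
        "Tag1": "Text1",
        "Tag2": {
            "@xmlns": {"": "http://www.xml.org/schemas/Test"},
            "#text": "Text2",
            "@attr2": "en",
        },
        "Tag3": "Text3",
    }
}

# Output reported for 0.10.2.
result_0_10 = {
    "MyXML": {
        "Tag4": {"@attr4": "en", "#text": "Text4"},
        "Tag1": "Text1",
        "Tag2": {"#text": "Text2", "@attr2": "en"},
        "Tag3": "Text3",
    }
}

print(strip_xmlns(result_0_11) == result_0_10)  # True
```

This also discards any @xmlns entry you might actually want, so it only papers over the behavior rather than fixing it.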

Does that make sense?

Posted by remia on Jul 25 '17 at 06:07