xmltodict
Duplicate keys parsing
When I try to parse an XML tree in which several nodes share the same key, for example:
<Job> <x>job number 1</x> <y>job detils</y> </Job> <Job> <x>job number 2</x> <y>job detils number 2</y> </Job>
it combines them into a single dict (or OrderedDict) under the Job key.
Is there any way to avoid this and access each item on its own?
The Perl version of this type of module will give you an array of these objects. The python version should return a list in this case.
I find this behaviour awkward in some cases because the object ends up with a dynamic type: it can be an OrderedDict or a list depending on the context.
import xmltodict
x = '''
<Body>
<Job>
<x>job number 1</x>
<y>job detils</y>
</Job>
<Job>
<x>job number 2</x>
<y>job detils number 2</y>
</Job>
</Body>
'''.strip()
print(type(xmltodict.parse(x)['Body']['Job']))
# <class 'list'>
x = '''
<Body>
<Job>
<x>job number 1</x>
<y>job detils</y>
</Job>
</Body>
'''.strip()
print(type(xmltodict.parse(x)['Body']['Job']))
# <class 'collections.OrderedDict'> (newer xmltodict releases on Python 3.7+ return a plain dict here)
I would suggest controlling this behaviour with a new keyword argument, for example "merge_duplication".
The simplest solution would be to always force the type to be a list, but then we would have to write x['Body'][0]['Job'][0]['Name'][0], which is quite ugly (but useful and stable).
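Another way to live with the dynamic type, without touching the parser, is to normalise the value at the call site. This is only a minimal sketch: as_list is a hypothetical helper (not part of xmltodict), and the two dicts below are hand-written stand-ins for the shapes the parser returns with one Job and with two.

```python
def as_list(value):
    """Wrap a single item in a list; return lists unchanged.

    as_list is a hypothetical helper for illustration, not an xmltodict API.
    None is treated as "no items" and becomes an empty list.
    """
    if value is None:
        return []
    if isinstance(value, list):
        return value
    return [value]


# Hand-written stand-ins for the parsed output with one Job and with two Jobs.
one_job = {'Body': {'Job': {'x': 'job number 1', 'y': 'job detils'}}}
two_jobs = {'Body': {'Job': [{'x': 'job number 1'}, {'x': 'job number 2'}]}}

# The same loop now works for both shapes.
for doc in (one_job, two_jobs):
    for job in as_list(doc['Body'].get('Job')):
        print(job['x'])
```

The downside is that every access to a possibly repeated element has to go through the helper, so it is easy to forget one.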
Ah, I found an option called force_list:
If called with force_list=('interface',), it will produce this dictionary:
{'servers':
  {'server':
    {'name': 'host1',
     'os': 'Linux'},
     'interfaces':
      {'interface':
        [ {'name': 'em0', 'ip_address': '10.0.0.1' } ] } } }
`force_list` can also be a callable that receives `path`, `key` and
`value`. This is helpful in cases where the logic that decides whether
a list should be forced is more complex.
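For the Job example in this thread, the callable form might look like the sketch below. The function name force_job_list is made up for illustration, and the code assumes that `path` is a list of (name, attrs) tuples for the ancestor elements, which is how I understand it gets passed.

```python
import xmltodict


def force_job_list(path, key, value):
    """Force a list only for <Job> elements that sit directly under <Body>.

    Assumption: path holds (name, attrs) tuples for the ancestors,
    so path[-1][0] is the name of the parent element.
    """
    return key == 'Job' and bool(path) and path[-1][0] == 'Body'


xml = '<Body><Job><x>job number 1</x><y>job detils</y></Job></Body>'
doc = xmltodict.parse(xml, force_list=force_job_list)
jobs = doc['Body']['Job']

# A single <Job> still comes back wrapped in a list.
print(type(jobs).__name__, len(jobs))
```

Compared with a plain tuple of names, the callable lets you leave a Job element that appears somewhere else in the tree untouched.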
Hey guys, do you have any suggestions for solving this issue? With a dictionary it's not possible, because it behaves like a hash table.
This has been discussed before; #14 has a lengthy discussion.
If an element (in this example, Job) appears only once under its parent, xmltodict returns a dict for it.
If it appears more than once, xmltodict returns a list.
The best way to normalise this behaviour is to use the force_list argument so that a list is always returned.
x = '''
<Body>
<Job>
<x>job number 1</x>
<y>job detils</y>
</Job>
</Body>
'''.strip()
print(type(xmltodict.parse(x, force_list=('Job',))['Body']['Job']))
# <class 'list'>
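With the list forced, the same loop handles one Job or many. A small sketch (job_names is just an illustrative helper name, not something from xmltodict):

```python
import xmltodict

one = '<Body><Job><x>job number 1</x></Job></Body>'
two = ('<Body><Job><x>job number 1</x></Job>'
       '<Job><x>job number 2</x></Job></Body>')


def job_names(xml):
    """Return the <x> text of every <Job>, whether there is one or several."""
    jobs = xmltodict.parse(xml, force_list=('Job',))['Body']['Job']
    return [job['x'] for job in jobs]


print(job_names(one))
print(job_names(two))
```

Note that force_list is given as a tuple of element names here, as in the docstring example.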