xmltodict

Duplicate keys parsing

Open ashrafemad opened this issue 6 years ago • 5 comments

When I try to parse an XML tree that contains several nodes with the same key, for example: <Job> <x>job number 1</x> <y>job detils</y> </Job> <Job> <x>job number 2</x> <y>job detils number 2</y> </Job>, xmltodict combines them into a single dict or OrderedDict entry under the Job key. Is there any way to avoid this and access each item separately?

ashrafemad avatar Jan 22 '19 10:01 ashrafemad

The Perl version of this type of module gives you an array of these objects. The Python version should return a list in this case.

snoopyjc avatar Jun 10 '19 03:06 snoopyjc

I find this combining behaviour undesirable in some cases, because the resulting object has a dynamic type: it can be an OrderedDict or a list depending on how many times the element appears.

import xmltodict
x = '''
<Body>
<Job> 
  <x>job number 1</x> 
  <y>job detils</y>
</Job> 
<Job> <x>job number 2</x> <y>job detils number 2</y> 
</Job>
</Body>
'''.strip()
print(type(xmltodict.parse(x)['Body']['Job']))
# <class 'list'>
x = '''
<Body>
<Job> 
  <x>job number 1</x> 
  <y>job detils</y>
</Job> 
</Body>
'''.strip()
print(type(xmltodict.parse(x)['Body']['Job']))
# <class 'collections.OrderedDict'>

I would suggest controlling this behaviour with a keyword argument, for example "merge_duplication".

The simplest solution would be to force the type to always be a list, but then we would have to write x['Body'][0]['Job'][0]['Name'][0], which is quite ugly (but useful and stable).
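A different way to get a stable type, without changing how the XML is parsed, is to normalise the value at the point of use. The sketch below is only an illustration: `as_list` is a hypothetical helper, not part of xmltodict, and the two dictionaries are hand-written stand-ins that are assumed to mirror what the parser returns for one and for two <Job> elements.

```python
def as_list(value):
    """Return value unchanged if it is already a list, otherwise wrap it.

    None (a missing element) becomes an empty list.
    """
    if value is None:
        return []
    if isinstance(value, list):
        return value
    return [value]


# Hand-written stand-ins (assumption: same shape as the parsed output
# for a <Body> with one <Job> and with two <Job> elements).
one_job = {"Body": {"Job": {"x": "job number 1", "y": "job detils"}}}
two_jobs = {"Body": {"Job": [
    {"x": "job number 1", "y": "job detils"},
    {"x": "job number 2", "y": "job detils number 2"},
]}}

for parsed in (one_job, two_jobs):
    # The same loop works for both shapes once the value is normalised.
    jobs = as_list(parsed["Body"].get("Job"))
    print([job["x"] for job in jobs])
```

This keeps the nested lookups short, since only the elements that can actually repeat need to be wrapped.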

shouldsee avatar Apr 03 '20 04:04 shouldsee

Ahh, found an option called force_list. From the docstring:

        If called with force_list=('interface',), it will produce
        this dictionary:
        {'servers':
          {'server':
            {'name': 'host1',
             'os': 'Linux'},
             'interfaces':
              {'interface':
                [ {'name': 'em0', 'ip_address': '10.0.0.1' } ] } } }
        `force_list` can also be a callable that receives `path`, `key` and
        `value`. This is helpful in cases where the logic that decides whether
        a list should be forced is more complex.
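
As a rough illustration of the callable form, here is a minimal sketch of a predicate that forces a list only for Job elements directly under Body. The function name is made up for this example, and it assumes that `path` arrives as a list of (element name, attributes) tuples for the ancestors of the current element; check that against your installed version before relying on it. The calls at the bottom use simulated arguments rather than a real parse.

```python
def force_job_under_body(path, key, value):
    """Return True only for <Job> elements whose direct parent is <Body>."""
    # Assumption: path is a list of (element_name, attributes) tuples
    # describing the ancestors of the element being added.
    parent = path[-1][0] if path else None
    return key == "Job" and parent == "Body"


# Simulated calls, standing in for what the parser would pass.
print(force_job_under_body([("Body", None)], "Job", {"x": "job number 1"}))
print(force_job_under_body([("Body", None), ("Job", None)], "x", "job number 1"))
```

It would then be passed to the parser as xmltodict.parse(xml, force_list=force_job_under_body).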

shouldsee avatar Apr 03 '20 04:04 shouldsee

Hey guys, do you have any suggestions for solving this issue? It doesn't seem possible with a dictionary, because it behaves like a hash table.

edenilson-carvalho avatar May 17 '21 20:05 edenilson-carvalho

> Hey guys, do you have any suggestions to solve this issue? using a dictionary, it's not possible because the behavior it's like a hash_table.

This has been discussed before; #14 has a lengthy discussion.

If an element (in this example, Job) appears only once under its parent, xmltodict returns a dict for it. If it appears more than once, xmltodict returns a list.

The best way to normalise this behaviour is to use the force_list argument, so that a list is always returned.

import xmltodict

x = '''
<Body>
<Job> 
  <x>job number 1</x> 
  <y>job detils</y>
</Job> 
</Body>
'''.strip()
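# A tuple of tag names is the documented form; a bare string may also
# match tags whose names are substrings of it.
print(type(xmltodict.parse(x, force_list=('Job',))['Body']['Job']))
# <class 'list'>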

SplinterHead avatar Feb 06 '23 16:02 SplinterHead