django-rest-framework-xml icon indicating copy to clipboard operation
django-rest-framework-xml copied to clipboard

added functionality to correctly parse xml lists

Open Charlemagne3 opened this issue 8 years ago • 4 comments

The problem

Hello, I am using this library to parse XML in Django REST, and unfortunately, the XML I am parsing is third party, so I cannot use the "list-item" designator.

The solution

I added a function _check_xml_list to check if an XML node represents a list of items. This replaces the check for children[0].tag == "list-item", allowing any list of child nodes that have identical names to be parsed as a list.

The way the function works is that it checks that

  1. the element has more than one child node
  2. the child nodes all have the same name

If they do not have the same name, then it is treated as a normal dictionary element, and duplicate nodes will be overwritten as before.

Testing

I added tests to check both valid and invalid lists nodes, which check the behavior described above.

The unit tests would not run with pytest-django 2.6; the error I received was:

Failed: Database access not allowed, use the "django_db" mark to enable

Upgrading to pytest-django to the latest, 2.9.1, solved this problem.

I am also using Django 1.9, and unittest is no longer a package in django.utils, so I removed the references to it. (This made the linter mad, so I reverted this in https://github.com/jpadilla/django-rest-framework-xml/pull/15/commits/9a24ffd0d7455337268fe7ed83c71044663dd883 and added the skipUnless checks to my two test methods)

I can remove those changes if desired, my main interest is getting this feature pulled in, and I needed to be able to run the tests to make sure that they passed.

Related issues

Here are related issues and a closed PR:

https://github.com/jpadilla/django-rest-framework-xml/issues/5 https://github.com/jpadilla/django-rest-framework-xml/issues/11 https://github.com/jpadilla/django-rest-framework-xml/pull/12

Charlemagne3 avatar Apr 18 '16 22:04 Charlemagne3

For anyone waiting this is a good workaround to use this patch right now.

from rest_framework_xml.parsers import XMLParser


class ListXMLParser(XMLParser):
    def _check_xml_list(self, element):
        return len(element) > 1 and len(set([child.tag for child in element])) <= 1


    def _xml_convert(self, element):
        children = list(element)

        if len(children) == 0:
            return self._type_convert(element.text)
        else:
            # if the fist child tag is list-item means all children are list-item
            if self._check_xml_list(element):
                data = []
                for child in children:
                    data.append(self._xml_convert(child))
            else:
                data = {}
                for child in children:
                    data[child.tag] = self._xml_convert(child)

        return data

bufke avatar Jul 25 '16 18:07 bufke

@Charlemagne3 - sorry for the delay on this. I think this is a great idea.

Still looking at the PR but I noticed there is no documentation update. I think this should get some documentation change and then we can look at merging this.

Perhaps consider adding some info on this into the parsers page of the documentation.

kyleobrien91 avatar Mar 10 '17 08:03 kyleobrien91

I'm not sure this is the way to go. I'm also facing this issue, and I tested this solution. This solution has no way to recognize a list with just one element, which the original implementation with the hard-coded item name (list-item) does get correctly.

Check this example that clarifies my point:

<?xml version="1.0" encoding="utf-8"?>
<root>
  <field_a>
    <list-item>foo</list-item>
  </field_a>
  <field_b>
    <list-item>foo</list-item>
    <list-item>bar</list-item>
  </field_b>  
</root>
{
    "field_a": {
        "list-item": "foo"
    },
    "field_b": ["foo", "bar"]
}

Even though the XML is formatted in a structured way, the resulting dict is not easy to predict. And you can't demand of your users that lists should always contain two or more items. At least, I can't :).

gertjanol avatar Dec 11 '17 15:12 gertjanol

@gertjanol

This does the job for me

class ListXMLParser(XMLParser):
    list_tags = [
        'Record',
        'BulletPoint',
        'Job'
    ]

    def _check_xml_list(self, element):
        return len(element) > 1 >= len(set([child.tag for child in element]))

    def _xml_convert(self, element):
        children = list(element)

        if len(children) == 0:
            return self._type_convert(element.text)
        else:
            if self._check_xml_list(element) or children[0].tag in self.list_tags:
                data = []
                for child in children:
                    data.append(self._xml_convert(child))
            else:
                data = {}
                for child in children:
                    data[child.tag] = self._xml_convert(child)
        return data
<ns0:my_xml_service>
     <Records>
	   <Record>
	      <InterfaceType>Test</InterfaceType>
	      <TypeID>testid</TypeID>
	      <Identifier>testidentifier</Identifier>
	      <Status>teststatus</Status>
	      <ResendType>testtype</ResendType>
	      <ResendTypeID>testtypeid</ResendTypeID>
	   </Record>
	   <Record>
	      <InterfaceType>TestB</InterfaceType>
	      <TypeID>testidB</TypeID>
	      <Identifier>testidentifierB</Identifier>
	      <Status>teststatusB</Status>
	      <ResendType>testtypeB</ResendType>
	      <ResendTypeID>testtypeidB</ResendTypeID>
	   </Record>
     </Records>
</ns0:my_xml_service>

Becomes

{
	'Records': [{
		'InterfaceType': 'Test',
		'TypeID': 'testid',
		'Identifier': 'testidentifier',
		'Status': 'teststatus',
		'ResendType': 'testtype',
		'ResendTypeID': 'testtypeid'
	}, {
		'InterfaceType': 'TestB',
		'TypeID': 'testidB',
		'Identifier': 'testidentifierB',
		'Status': 'teststatusB',
		'ResendType': 'testtypeB',
		'ResendTypeID': 'testtypeidB'
	}]
}

Because my_xml_service contains two items

And this:

<ns0:my_xml_service>
	<Records>
	   <Record>
	      <InterfaceType>Test</InterfaceType>
	      <TypeID>testid</TypeID>
	      <Identifier>testidentifier</Identifier>
	      <Status>teststatus</Status>
	      <ResendType>testtype</ResendType>
	      <ResendTypeID>testtypeid</ResendTypeID>
	   </Record>
   </Records>
</ns0:my_xml_service>

Becomes

{
	'Records': [{
		'InterfaceType': 'Test',
		'TypeID': 'testid',
		'Identifier': 'testidentifier',
		'Status': 'teststatus',
		'ResendType': 'testtype',
		'ResendTypeID': 'testtypeid'
	}]
}

Because Record is in the node name list

miniyuuzhan avatar Dec 14 '17 02:12 miniyuuzhan