junitparser

pythonic way to create dict

Open blackliner opened this issue 2 years ago • 8 comments

Is there a simple way to get a Python dict with all the information from the parsed XML? Eventually I want to dump it as JSON so it can be processed further.

Thanks for your amazing library!

blackliner, Mar 02 '22 21:03

You could, for example, convert a TestSuite into a list of its TestCases to access them in a pythonic way:

```python
# Get the last TestCase's name from the TestSuite
# (results.suite is a junitparser TestSuite in the commenter's own code)
self.test_name = list(results.suite)[-1].name
```

ni-dschiller, Mar 03 '22 12:03

Why bother parsing the XML with junitparser? Why not convert the XML directly to JSON?
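A minimal sketch of that direct route, using only the standard library's `xml.etree.ElementTree` and `json` modules. The function name `junit_xml_to_records` and the record layout (a `suite` key plus a `result` of `passed`, `failure`, `error` or `skipped`) are illustrative choices made here, not part of junitparser or any JUnit standard.

```python
import json
import xml.etree.ElementTree as ET

RESULT_TAGS = ("failure", "error", "skipped")


def junit_xml_to_records(xml_text: str) -> list:
    """Flatten every <testcase> in a JUnit XML string into a plain dict.

    Works whether the root element is <testsuites> or a single <testsuite>.
    """
    root = ET.fromstring(xml_text)
    records = []
    # iter() also yields the root itself when the root is a <testsuite>
    for suite in root.iter("testsuite"):
        # findall() only looks at direct children, so no case is counted twice
        for case in suite.findall("testcase"):
            record = {"suite": suite.get("name"), **case.attrib}
            # The first <failure>/<error>/<skipped> child decides the result
            record["result"] = next((c.tag for c in case if c.tag in RESULT_TAGS), "passed")
            records.append(record)
    return records


sample_report = """<testsuites>
  <testsuite name="math" tests="2">
    <testcase name="test_add" classname="math.ops" time="0.01"/>
    <testcase name="test_div" classname="math.ops" time="0.02">
      <failure message="ZeroDivisionError"/>
    </testcase>
  </testsuite>
</testsuites>"""

print(json.dumps(junit_xml_to_records(sample_report), indent=2))
```

The records are deliberately flat, which keeps them easy to hand to other tools; children such as `<system-out>` or `<properties>` are simply dropped in this sketch.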

EnricoMi, Mar 03 '22 12:03

Here is a method that converts the parsed XML into a dict, inspired by https://stackoverflow.com/a/7684581/13070239.

Note that the dict does not preserve order:

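```python
from junitparser import JUnitXml


def etree_to_dict(t):
    # Children become a list under the element's tag
    d = {t.tag: [etree_to_dict(c) for c in t]}
    # Attributes are added with an '@' prefix
    d.update(('@' + k, v) for k, v in t.attrib.items())
    # Element text (possibly just whitespace) is stored under 'text'
    if t.text is not None:
        d['text'] = t.text
    return d


junit = JUnitXml.fromfile('junit.xml')
# _elem is junitparser's private handle on the underlying XML element
d = etree_to_dict(junit._elem)
```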
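The function only relies on the generic element interface (`tag`, `attrib`, `text` and iteration over children), so it can also be tried on a plain `xml.etree.ElementTree` element and serialized with `json.dumps`. The self-contained copy below uses a small, made-up report; note that with pretty-printed XML, the indentation whitespace ends up under `text`.

```python
import json
import xml.etree.ElementTree as ET


def etree_to_dict(t):
    # Same logic as above: children under the tag, '@'-prefixed attributes, raw text
    d = {t.tag: [etree_to_dict(c) for c in t]}
    d.update(('@' + k, v) for k, v in t.attrib.items())
    if t.text is not None:
        d['text'] = t.text
    return d


pretty = """<testsuite name="demo" tests="1">
  <testcase name="test_ok" time="0.01"/>
</testsuite>"""

result = etree_to_dict(ET.fromstring(pretty))
# The root's 'text' is only the indentation before <testcase>
print(json.dumps(result, indent=2))
```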

EnricoMi, Mar 03 '22 12:03

Thanks for the suggestions! Will try it out and give feedback 👍

blackliner, Mar 03 '22 16:03

That works perfectly fine, thank you!

However, it's not very elegant to pass a private member into this free function. Would it make sense for JUnitXml/TestSuite to provide a to_dict() member function that implements etree_to_dict(t)?
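As a sketch only of what such a method could look like: `ToDictMixin`, `to_dict`, `to_json` and the stand-in class `FakeSuite` are hypothetical names (junitparser's real classes are not touched here), and the only assumption about the host class is that it keeps its XML element in `self._elem`.

```python
import json
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    # Children under the tag, '@'-prefixed attributes, non-blank text only
    d = {elem.tag: [element_to_dict(child) for child in elem]}
    d.update(("@" + k, v) for k, v in elem.attrib.items())
    if elem.text is not None and elem.text.strip():
        d["text"] = elem.text
    return d


class ToDictMixin:
    """Hypothetical mixin; assumes the host class stores its element in self._elem."""

    def to_dict(self):
        return element_to_dict(self._elem)

    def to_json(self, **kwargs):
        # kwargs are passed straight to json.dumps (e.g. indent=2)
        return json.dumps(self.to_dict(), **kwargs)


class FakeSuite(ToDictMixin):
    """Hypothetical stand-in for a junitparser class wrapping an element."""

    def __init__(self, elem):
        self._elem = elem


suite = FakeSuite(ET.fromstring('<testsuite name="s"><testcase name="a"/></testsuite>'))
print(suite.to_json(indent=2))
```

Keeping the conversion in one free function and exposing it through a thin method means the access to the private element lives in a single place.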

cbachhuber, Mar 03 '22 20:03

Sounds reasonable, @weiwei, what do you think?

EnricoMi, Mar 03 '22 20:03

I'm not in favor of the idea because:

  1. The resulting dict or JSON doesn't seem to follow any standard format (none that I know of, but point me to one if I'm wrong).
  2. It seems to be a marginal use case, and the workaround is pretty straightforward.

weiwei, Mar 14 '22 09:03

I see what you mean, but a dict is a well-established data structure in Python, so it is much more accessible and interoperable with other Python tools. In my specific use case, I found it hard to inspect a TestSuite's contents in PyCharm's debugger; with a dict, that's much simpler. Additionally, I want to ship the data with Filebeat, which works well with JSON input. I expect there are many more use cases we haven't recognized yet.
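For the Filebeat case, Filebeat decodes JSON one object per line, so a newline-delimited JSON (NDJSON) file with one event per test case is a natural fit. Below is a standard-library sketch; the function name `write_ndjson` and the event layout are illustrative assumptions, and the Filebeat input configuration itself is not shown.

```python
import json
import os
import tempfile
import xml.etree.ElementTree as ET


def write_ndjson(root: ET.Element, path: str) -> int:
    """Write one JSON object per <testcase>, one object per line.

    Returns the number of lines written.
    """
    count = 0
    with open(path, "w", encoding="utf-8") as out:
        # iter() also matches the root itself when it is a <testsuite>
        for suite in root.iter("testsuite"):
            for case in suite.findall("testcase"):
                # Flat event: suite name plus the testcase's attributes
                event = {"testsuite": suite.get("name"), **case.attrib}
                # json.dumps without indent escapes newlines, so each event stays on one line
                out.write(json.dumps(event) + "\n")
                count += 1
    return count


# Usage with a small, made-up report written to a temporary directory
report = ET.fromstring(
    '<testsuites><testsuite name="Linux-g++-7">'
    '<testcase name="quaternion_test" time="0.0050072"/>'
    '<testcase name="subdivision_test" time="0.0098322"/>'
    "</testsuite></testsuites>"
)
path = os.path.join(tempfile.mkdtemp(), "junit.ndjson")
print(write_ndjson(report, path), "lines written to", path)
```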

Regarding the standard format: I found the output of the function above a bit odd as well, so I adapted it to:

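```python
from typing import Dict


def element_tree_to_dictionary(element_tree) -> Dict:
    # Children become a list under the element's tag
    dictionary = {element_tree.tag: [element_tree_to_dictionary(c) for c in element_tree]}
    # Attributes are merged in without a prefix
    dictionary.update((key, value) for key, value in element_tree.attrib.items())
    # Skip whitespace-only text and strip leading '=' characters
    if element_tree.text is not None and element_tree.text.strip():
        dictionary["text"] = element_tree.text.lstrip("=")
    return dictionary
```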

which gives, for example:

```json
{
  "testsuite": [
    {
      "testcase": [{"system-out": [], "text": "\nAll tests passed (4 assertions in 1 test case)"}],
      "name": "chamfer_distance_test",
      "classname": "chamfer_distance_test",
      "time": "0.0052075",
      "status": "run"
    },
    {
      "testcase": [{"system-out": [], "text": "\nAll tests passed (39 assertions in 2 test cases)"}],
      "name": "subdivision_test",
      "classname": "subdivision_test",
      "time": "0.0098322",
      "status": "run"
    },
    {
      "testcase": [{"system-out": [], "text": "\nAll tests passed (155 assertions in 2 test cases)"}],
      "name": "quaternion_test",
      "classname": "quaternion_test",
      "time": "0.0050072",
      "status": "run"
    },
    {
      "testcase": [{"system-out": [], "text": "\nAll tests passed (3 assertions in 3 test cases)"}],
      "name": "oriented_bounding_box_test",
      "classname": "oriented_bounding_box_test",
      "time": "0.0111813",
      "status": "run"
    }
  ],
  "name": "Linux-g++-7",
  "tests": "4",
  "failures": "0",
  "disabled": "0",
  "skipped": "0",
  "hostname": "cbachhuber-lt",
  "time": "0.031",
  "timestamp": "2022-02-11T14:00:06"
}
```

I find this output very useful! I'm not sure how standard it is, though.
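A self-contained sketch that applies the adapted function to a small, made-up report and serializes it with `json.dumps(..., indent=2)`, which always emits valid JSON. One caveat follows directly from the code: because attributes are merged without a prefix, an attribute literally named `text` is overwritten by the element's text, and an attribute with the same name as the element's tag would replace its list of children.

```python
import json
import xml.etree.ElementTree as ET


def element_tree_to_dictionary(element_tree) -> dict:
    # Children go in a list under the element's tag, in document order
    dictionary = {element_tree.tag: [element_tree_to_dictionary(c) for c in element_tree]}
    # Attributes are merged in without a prefix
    dictionary.update((key, value) for key, value in element_tree.attrib.items())
    # Keep only non-blank text, dropping leading '=' characters
    if element_tree.text is not None and element_tree.text.strip():
        dictionary["text"] = element_tree.text.lstrip("=")
    return dictionary


# Made-up report with the same shape as the output above
sample = """<testsuite name="Linux-g++-7" tests="1" failures="0">
  <testcase name="quaternion_test" classname="quaternion_test" time="0.005" status="run">
    <system-out>====
All tests passed</system-out>
  </testcase>
</testsuite>"""

as_dict = element_tree_to_dictionary(ET.fromstring(sample))
print(json.dumps(as_dict, indent=2))
```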

cbachhuber, Mar 15 '22 10:03