Introduce xml_safe as a module to control xml opearations
This is related to CVE-2017-11427 and VU#475445 Related issues: #496, #497 Reported by duo through this blog post
pysaml2 is not affected, as, by default, the xml.etree.ElementTree and xml.etree.cElementTree parsers ignore comment nodes. However, this commit makes sure that the ElementTree being used is set correctly through defusexml lib and centralizes the control of which functions are exposed and available for usage in the code. Any code that needs a function to parse, modify or serialize XML should be obtained through the xml_safe module. The new module asks defusexml to provide the function and if it is not available it will fallback to the one provided by xml.etree.cElementTree. This is a guarantee that functions like parse, fromstring et al are provided by defusexml lib.
All Submissions:
- [x] Have you checked to ensure there aren't other open Pull Requests for the same update/change?
- [ ] Have you added an explanation of what problem you are trying to solve with this PR?
- [ ] Have you added information on what your changes do and why you chose this as your solution?
- [ ] Have you written new tests for your changes?
- [x] Does your submission pass tests?
- [x] This project follows PEP8 style guide. Have you run your code against the 'flake8' linter?
I am putting this here mostly to get feedback and I will soon add tests and reformat this PR to match the problem/solution format that the template suggests.
Would it be worth printing a warning if defusedxml isn't used? EDIT: Whoops, never mind
I wonder, why this PR still not merged?
Caught up into other aspects. This is not ready, it's just a start. Ideally we should not need to rely on defusedxml.
And what alternative can you propose?
Why is defusedxml not an option? It addresses several security concerns present in python's standard parser. I think falling back onto standard xml or another library is a good option, but I still think using defusedxml by default is preferable.
defusedxml is the current choice and works with the builtin xml parser. Going forward, we will most probably switch to lxml and want to have control over the XML parser behaviour. Needed security options from defusedxml will be incorporated. However, defusedxml doesn't seem to be very active and the lxml wrapper has known issues - also see tiran/defusedxml#15