gh-135640: Adds type checking to ElementTree.ElementTree constructor
Addresses issue #135640 by adding type checking to the constructor and to ElementTree._setroot, since both have the opportunity to assign _root. In addition to addressing the primary issue (possible data loss), this makes relevant tracebacks more descriptive.
It looks like there had once been asserts that did these checks. I couldn't find rationale for why they were commented out in blame (it looks like this was done on svn, and I'm not sure where to look to find those commit messages). I opted to use raise instead as I felt it would be more descriptive.
At present, the way iselement is written still allows an object of the wrong type to pass through if it has a tag attribute. Commit a72a98f shows there were once comments indicating that the current implementation was done for a reason, but it isn't clear to me why that is. I left it alone for now, but changing it to this:
def iselement(element):
return isinstance(element, Element)
...also passed tests. If there's no disagreement, I think it would be good to change that too.
- Issue: gh-135640
Code changes and test look good to me.
But IMO, it's better to provide some news entry with blurb tool, since it slightly changes code behavior.
And I'd suggest to add test on _setroot function too.
@abstractedfox: Can you add a NEWS entry? You can also try https://blurb-it.herokuapp.com/
iselement() was initially only used in assertions, and they were commented out in 3e8c189faae661d44c61839986614fce595fc404 (bpo-6472/gh-50721). I guess the reason of both was performance. Many of assertions were replaced with _assert_is_element() in 396e8fcf36480dacaeeeb785a3912a565294d3b7 (bpo-13782/gh-57991).
It is unlikely the the ElementTree constructor is used in tight loops, so restoring check for it should not hurm performance. Although, this is not the only way to set the root element -- in parse() it is set to what the parser returned.
If you are going to make the check more strict, note that other places require more Element methods: find(), findtext(), findall(), iterfind(). Which is write() more important than findtext()? But adding more attribute checks will make iselement() more slow and can break code which only uses the part of the Element interface. Even the current version of the PR can break a working code that does not need to call write().
Actually, the reason I encountered the original issue was because find() worked with a root object that was the wrong type; I'd passed the constructor an ElementTree.ElementTree instead of a root element, and was using .find() to interface with it (was expecting Element.find(), but they behaved the same way in my code), so it worked as normal until .write() was called. In my particular case, this pattern only happened when opening an existing file, so at that point the file would be destroyed.
I suppose the question now is, is this type of check the correct way to handle this? My opinion at the beginning was that the ElementTree class shouldn't demonstrate behavior that works sometimes and might break in an undocumented way when used with objects of a correct-ish type, but if it needs to support this type of usage (ie, to avoid breaking code that already depends on this) perhaps .write() itself should check for its minimum required attributes, and everything else should be left as it was? What do you think?
Note that the ElementTree constructor accepts also element=None (and it is default), but most of its methods don't work in this case. There are comment-out # assert self._root is not None in these methods, but they are not particularly useful -- they would simply change an AttributeError to AssertionError.
I suggest to leave iselement() as is for now. Testing only the "tag" attribute will catch most errors and reduce the risk of breaking existing code.
If something does not work -- user will get error in any case. It is a programming error, not an input data error, so its type is less important -- it is not supposed to be caught.
Could we at least put these checks in write itself so it will fail a little more gracefully in that condition? It's going to error in both cases, but it still feels less than ideal for it to damage a file opened for writing, especially if it would take a relatively small change to prevent that
If the PR changes iselement(), I don't think that it's safe to backport it to stable branches (3.13 and 3.14).
At this point I suppose I'd like some guidance on which path is most acceptable; put the PR back in its original state (iselement unaltered, keep the type check I added; allows element-like objects but could still enable edge cases with .write()), leave constructor entirely untouched and add the checks to .write() itself (would not break any code that wasn't broken already, but prevents it from damaging an existing file), or change iselement to be more strict (seems like we don't want to do this for aforementioned reasons, which I agree with)
I would prefer to leave iselement() as it is:
def iselement(element):
"""Return True if *element* appears to be an Element."""
return hasattr(element, 'tag')
And add iselement() checks in write() and _setroot().
@serhiy-storchaka hmm I see, do you suggest doing something differently overall? It does narrow the opportunities for the initial problem in a fairly low-overhead way in its present state, which I feel is an improvement
LGTM. But it's better to add test for _setroot with ElementLike, IMO.
Most likely reason of raising this exception is if the user forget to call parse(). Error message is wrong in that case.
Other possible cases -- when the parser returned wrong object or user set the _root attribute (omitting _setroot()) are much less common, and there is no reason to check this in write().
So the code raises wrong exception in most common case and raises unnecessary exception in less common cases. I suggest to remove check in write(). Or replace it with a check for self._root is None with relevant error message (uninitialized ElementTree or missed parse() call).
Thanks @abstractedfox for the PR, and @serhiy-storchaka for merging it 🌮🎉.. I'm working now to backport this PR to: 3.13, 3.14. 🐍🍒⛏🤖
GH-136225 is a backport of this pull request to the 3.14 branch.
GH-136226 is a backport of this pull request to the 3.13 branch.
Thanks for the PR, @abstractedfox !