cpython icon indicating copy to clipboard operation
cpython copied to clipboard

gh-135640: Adds type checking to ElementTree.ElementTree constructor

Open abstractedfox opened this issue 6 months ago • 11 comments

Addresses issue #135640 by adding type checking to the constructor and to ElementTree._setroot, since both have the opportunity to assign _root. In addition to addressing the primary issue (possible data loss), this makes relevant tracebacks more descriptive.

It looks like there had once been asserts that did these checks. I couldn't find rationale for why they were commented out in blame (it looks like this was done on svn, and I'm not sure where to look to find those commit messages). I opted to use raise instead as I felt it would be more descriptive.

At present, the way iselement is written still allows an object of the wrong type to pass through if it has a tag attribute. Commit a72a98f shows there were once comments indicating that the current implementation was done for a reason, but it isn't clear to me why that is. I left it alone for now, but changing it to this:

def iselement(element):
    return isinstance(element, Element)

...also passed tests. If there's no disagreement, I think it would be good to change that too.

  • Issue: gh-135640

abstractedfox avatar Jun 17 '25 20:06 abstractedfox

All commit authors signed the Contributor License Agreement.

CLA signed

python-cla-bot[bot] avatar Jun 17 '25 20:06 python-cla-bot[bot]

Code changes and test look good to me. But IMO, it's better to provide some news entry with blurb tool, since it slightly changes code behavior.

And I'd suggest to add test on _setroot function too.

efimov-mikhail avatar Jun 18 '25 20:06 efimov-mikhail

@abstractedfox: Can you add a NEWS entry? You can also try https://blurb-it.herokuapp.com/

vstinner avatar Jun 19 '25 09:06 vstinner

iselement() was initially only used in assertions, and they were commented out in 3e8c189faae661d44c61839986614fce595fc404 (bpo-6472/gh-50721). I guess the reason of both was performance. Many of assertions were replaced with _assert_is_element() in 396e8fcf36480dacaeeeb785a3912a565294d3b7 (bpo-13782/gh-57991).

It is unlikely the the ElementTree constructor is used in tight loops, so restoring check for it should not hurm performance. Although, this is not the only way to set the root element -- in parse() it is set to what the parser returned.

If you are going to make the check more strict, note that other places require more Element methods: find(), findtext(), findall(), iterfind(). Which is write() more important than findtext()? But adding more attribute checks will make iselement() more slow and can break code which only uses the part of the Element interface. Even the current version of the PR can break a working code that does not need to call write().

serhiy-storchaka avatar Jun 22 '25 09:06 serhiy-storchaka

Actually, the reason I encountered the original issue was because find() worked with a root object that was the wrong type; I'd passed the constructor an ElementTree.ElementTree instead of a root element, and was using .find() to interface with it (was expecting Element.find(), but they behaved the same way in my code), so it worked as normal until .write() was called. In my particular case, this pattern only happened when opening an existing file, so at that point the file would be destroyed.

I suppose the question now is, is this type of check the correct way to handle this? My opinion at the beginning was that the ElementTree class shouldn't demonstrate behavior that works sometimes and might break in an undocumented way when used with objects of a correct-ish type, but if it needs to support this type of usage (ie, to avoid breaking code that already depends on this) perhaps .write() itself should check for its minimum required attributes, and everything else should be left as it was? What do you think?

abstractedfox avatar Jun 22 '25 17:06 abstractedfox

Note that the ElementTree constructor accepts also element=None (and it is default), but most of its methods don't work in this case. There are comment-out # assert self._root is not None in these methods, but they are not particularly useful -- they would simply change an AttributeError to AssertionError.

I suggest to leave iselement() as is for now. Testing only the "tag" attribute will catch most errors and reduce the risk of breaking existing code.

If something does not work -- user will get error in any case. It is a programming error, not an input data error, so its type is less important -- it is not supposed to be caught.

serhiy-storchaka avatar Jun 23 '25 15:06 serhiy-storchaka

Could we at least put these checks in write itself so it will fail a little more gracefully in that condition? It's going to error in both cases, but it still feels less than ideal for it to damage a file opened for writing, especially if it would take a relatively small change to prevent that

abstractedfox avatar Jun 23 '25 16:06 abstractedfox

If the PR changes iselement(), I don't think that it's safe to backport it to stable branches (3.13 and 3.14).

vstinner avatar Jun 24 '25 16:06 vstinner

At this point I suppose I'd like some guidance on which path is most acceptable; put the PR back in its original state (iselement unaltered, keep the type check I added; allows element-like objects but could still enable edge cases with .write()), leave constructor entirely untouched and add the checks to .write() itself (would not break any code that wasn't broken already, but prevents it from damaging an existing file), or change iselement to be more strict (seems like we don't want to do this for aforementioned reasons, which I agree with)

abstractedfox avatar Jun 24 '25 18:06 abstractedfox

I would prefer to leave iselement() as it is:

def iselement(element):
    """Return True if *element* appears to be an Element."""
    return hasattr(element, 'tag')

And add iselement() checks in write() and _setroot().

vstinner avatar Jun 26 '25 13:06 vstinner

@serhiy-storchaka hmm I see, do you suggest doing something differently overall? It does narrow the opportunities for the initial problem in a fairly low-overhead way in its present state, which I feel is an improvement

abstractedfox avatar Jun 27 '25 17:06 abstractedfox

LGTM. But it's better to add test for _setroot with ElementLike, IMO.

efimov-mikhail avatar Jul 01 '25 16:07 efimov-mikhail

Most likely reason of raising this exception is if the user forget to call parse(). Error message is wrong in that case.

Other possible cases -- when the parser returned wrong object or user set the _root attribute (omitting _setroot()) are much less common, and there is no reason to check this in write().

So the code raises wrong exception in most common case and raises unnecessary exception in less common cases. I suggest to remove check in write(). Or replace it with a check for self._root is None with relevant error message (uninitialized ElementTree or missed parse() call).

serhiy-storchaka avatar Jul 01 '25 16:07 serhiy-storchaka

Thanks @abstractedfox for the PR, and @serhiy-storchaka for merging it 🌮🎉.. I'm working now to backport this PR to: 3.13, 3.14. 🐍🍒⛏🤖

miss-islington-app[bot] avatar Jul 03 '25 07:07 miss-islington-app[bot]

GH-136225 is a backport of this pull request to the 3.14 branch.

bedevere-app[bot] avatar Jul 03 '25 07:07 bedevere-app[bot]

GH-136226 is a backport of this pull request to the 3.13 branch.

bedevere-app[bot] avatar Jul 03 '25 07:07 bedevere-app[bot]

Thanks for the PR, @abstractedfox !

efimov-mikhail avatar Jul 03 '25 08:07 efimov-mikhail