astropy
astropy copied to clipboard
Avoid using `classproperty` in unit formatters
Description
Trying to check our unit formatter code with Mypy results in a large number of errors that look like
astropy/units/format/ogip.py:69: error: Missing positional argument "fget" in call to "classproperty" [call-arg]
astropy/units/format/ogip.py:69: error: "classproperty" not callable [operator]
This because our @classproperty is too dynamic for type checkers. Luckily the unit formatters don't really need @classproperty because very similar functionality can be achieved with standard library @classmethod and @functools.cache, which don't cause problems for type checkers. Replacing our custom utilities with standard library ones is a good thing to do independently of type checking.
I'm opening this as a draft where I have replaced just one class property. If the maintainers think this is a good idea then I can replace all of them.
- [x] By checking this box, the PR author has requested that maintainers do NOT use the "Squash and Merge" button. Maintainers should respect this when possible; however, the final decision is at the discretion of the maintainer that merges the PR.
Thank you for your contribution to Astropy! 🌌 This checklist is meant to remind the package maintainers who will review this pull request of some common things to look for.
- [ ] Do the proposed changes actually accomplish desired goals?
- [ ] Do the proposed changes follow the Astropy coding guidelines?
- [ ] Are tests added/updated as required? If so, do they follow the Astropy testing guidelines?
- [ ] Are docs added/updated as required? If so, do they follow the Astropy documentation guidelines?
- [ ] Is rebase and/or squash necessary? If so, please provide the author with appropriate instructions. Also see instructions for rebase and squash.
- [ ] Did the CI pass? If no, are the failures related? If you need to run daily and weekly cron jobs as part of the PR, please apply the "Extra CI" label. Codestyle issues can be fixed by the bot.
- [ ] Is a change log needed? If yes, did the change log check pass? If no, add the "no-changelog-entry-needed" label. If this is a manual backport, use the "skip-changelog-checks" label unless special changelog handling is necessary.
- [ ] Is this a big PR that makes a "What's new?" entry worthwhile and if so, is (1) a "what's new" entry included in this PR and (2) the "whatsnew-needed" label applied?
- [ ] At the time of adding the milestone, if the milestone set requires a backport to release branch(es), apply the appropriate "backport-X.Y.x" label(s) before merge.
👋 Thank you for your draft pull request! Do you know that you can use [ci skip] or [skip ci] in your commit messages to skip running continuous integration tests until you are ready?
Here we do want caching.
Yes. I was posting for consideration of static typing.
The changes I am proposing are indeed clumsy, but the formatters are fundamentally badly designed and changing that would be too disruptive for downstream developers. If we want to compile the code with Mypyc then Mypy has to approve it first and such awkward hacks will be needed for that.
It would be very nice if we could implement classproperty in a type-checker friendly way, but as far as I know descriptors are too dynamic for static type checkers and the common standard library descriptors are supported only because of hardcoded exceptions in type checking code, which don't apply to user-defined descriptors. The class @nstarman mentioned contains more than one # type: ignore[type-arg] instruction, so that doesn't give me any hope that we could compile anything based on that with Mypyc.
I feel this is just working around limitations of something (the typing system) that is still in flux, and I don't really see the point of doing that
I feel the opposite way. Typed-Python is intentionally restrictive and creates pressure on design patterns (which is a good thing). As such, it may never support everything that Yolo-Python (Python without type checkers) allows. I believe that constraining our designs such that they are type-checker friendly makes for more robust code. Of course refactoring on the scale of a large library like astropy is extremely challenging, and I don't know that we can realistically hope for it to happen, but I do think we should try to leverage existing static-analysis tools (mypy) now rather than waiting for them to support extremely dynamic patterns, which may never happen, realistically.
I'm generally +1 on this initiative, especially when the diff count is so low.
My other worry beyond disliking changing-for-the-sake-of-mypy is that _units is one thing that someone making their own subclass (if it were at all a serious one that did something interesting) would have defined. So, I do not see how this change cannot break their work. Of course, it may well be that the previous changes would have done that too, but I think those were much less likely to have done so. And of course it is also eminently possible that no-one ever created a new format class.
More prosaically, is it even a problem if mypy cannot compile this? It is not as if this piece is at all important for performance: as long as mypy knows it gives a dict, I don't really see the problem.
I do not see how this change cannot break their work
That is worrying indeed. Perhaps should ask on astropy communication channels, just in case. After all, we just got bitten by LSST using Lupton RGB feature that we thought was okay to remove...
@eerovaher do you know what mypy thinks about a subclass of property? As in can we sidestep the issue, though incurring the drawback of having extraneous API methods like .setter that might need to raise errors.
@mhvk wrote
... It would be much better ... to improve CPython itself (by introducing a
classpropertythere ...
Python 3.9 made it possible to stack @classmethod and @property, but that was deprecated in Python 3.11 and apparently it's an error in Python 3.13. So I don't think its reasonable to expect classproperty being implemented in the standard library any time soon.
I've updated this pull request with a different proof of concept. In Python classes are instances of metaclasses, so if we define for every formatter class (that uses our classproperty) a corresponding metaclass then we can replace what are currently class properties on the formatter classes with normal properties on the metaclasses. This allows us to avoid using our classproperty in the unit formatters, makes the formatters understandable by type checkers (I have verified it with Mypy) and avoids breaking private API (even though we should be allowed to break private API whenever we want or else our private API is actually public API).
The class @nstarman https://github.com/astropy/astropy/pull/17652#pullrequestreview-2565818664 contains more than one # type: ignore[type-arg] instruction, so that doesn't give me any hope that we could compile anything based on that with Mypyc.
I think those type ignores are actually because of beartype<v20, which didn't like parametrized class method. This is solved in beartype>v20. I do think we can make a myyc-compatible classproperty descriptor.
No custom metaclasses and associated inheritance issues required.
The fundamental problem with class properties (both our classproperty and the classmethod and property combination that standard library briefly recommended) is that descriptors and classes do not interact like descriptors and instances do. There are two problems that I learned about when I investigated this.
- Calling
help()on a class with class properties executes the properties. To see that you can addprint()calls toFITS._units()and see that callinghelp(FITS)does indeed print your message. Alternatively, you can add atime.sleep()call toFITS._units()and see how long it takes for the help text to appear after callinghelp(FITS). Do keep in mind thatFITS._units()is cached, therefore it will only be executed the first timehelp(FITS)is called. My proof of concept with metaclasses does not have that problem. - Our
classpropertyhassetteranddeletermethods that are supposed to raise errors: https://github.com/astropy/astropy/blob/ab1ff0cac6ed522cd9f4493a57224494df42753e/astropy/utils/decorators.py#L773-L783 In reality thesetteranddeleterare simply ignored:
Python 3.12.3 (main, Feb 4 2025, 14:48:35) [GCC 13.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from astropy.units.format import FITS
>>> FITS._units = 0
>>> FITS._units
0
and then in a new session:
Python 3.12.3 (main, Feb 4 2025, 14:48:35) [GCC 13.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from astropy.units.format import FITS
>>> del FITS._units
>>> FITS._units
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: type object 'FITS' has no attribute '_units'
With my proof of concept these two (closely related) bugs do not occur:
Python 3.12.3 (main, Feb 4 2025, 14:48:35) [GCC 13.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from astropy.units.format import FITS
>>> FITS._units = 0
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: property '_units' of '_FITSMeta' object has no setter
>>> del FITS._units
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: property '_units' of '_FITSMeta' object has no deleter
>>> len(FITS._units)
765
If you manage to come up with an implementation of class properties that solves these problems then we should indeed consider adopting it in astropy. It might even be the case that the standard library would be interested in having such a classproperty. But to me the aforementioned problems look serious enough and the workaround with metaclasses simple enough that I don't intend to spend any effort on implementing a classproperty decorator myself.
I appreciate the thought put in here to make this work, but to be honest I'm still confused why we are trying to improve this piece of code, given that, as I noted above,
More prosaically, is it even a problem if mypy cannot compile this? It is not as if this piece is at all important for performance: as long as mypy knows it gives a dict, I don't really see the problem.
Specifically, isn't it a much larger problem that, e.g., Generic._parser is a classproperty too?
Anyway, if this is really important somehow, I much prefer your original idea of just moving to a classmethod, though given that that means any user format classes no longer work, I think that it might then be better to just do a greater refactor...
I appreciate the thought put in here to make this work, but to be honest I'm still confused why we are trying to improve this piece of code, given that, as I noted above,
More prosaically, is it even a problem if mypy cannot compile this? It is not as if this piece is at all important for performance...
Mypyc works best if the entire code is compiled as a single compilation unit, i.e. everything is compiled in one go. Admittedly, we are not compiling anything yet, but making astropy compilable is nonetheless a good goal to have because it will mean that Mypy can verify our code to be type safe and that in turn will ensure that our annotations match actual code behavior, which good both for documentation and IDE autocompletion.
Independently from the above, I don't think astropy should be implementing any general utilities that could easily be replaced with standard library utilities. Our classproperty can be replaced with standard library classmethod (if all cases of accessing the property are replaced with calling the replacement method) or with a normal property on a metaclass (which is a drop-in replacement).
as long as mypy knows it gives a dict, I don't really see the problem.
I already mentioned in the opening post that Mypy reports two errors for each use of the @classproperty decorator. This is a problem because Mypy is finding real bugs, but those bugs drown in the noise classproperty causes. Furthermore, classproperty being completely opaque to Mypy means that Mypy infers the type of every class property to be Any effectively turning off type checking on large parts of annotated code.
Specifically, isn't it a much larger problem that, e.g.,
Generic._parseris aclasspropertytoo?
My goal is to remove all uses of classproperty from the unit formatters, but I am only submitting small proofs of concept until we agree how exactly that should be done.
Anyway, if this is really important somehow, I much prefer your original idea of just moving to a
classmethod, though given that that means any user format classes no longer work, I think that it might then be better to just do a greater refactor...
_units is private, so even if changes in this pull request break downstream code then that will be downstream's problem. But I think you are also overestimating how widely used _units is. Our own Generic does not define it.
I'm all for replacing class properties with class methods! I must admit to being a bit sour on metaclasses; they end up causing such headaches re inheritance. Mypyc even has special speedups for property and classmethod
My worry was mostly about people subclassing one of our format classes and thus implicitly using private methods, but you're right, our Base class does not use it, and that's the one people are most likely to subclass.
Anyway, my main gripe remains that we are changing code because of some other package, for benefits that are not all that obvious. Surely, it is possible to tell mypy to just take this annotation at face value and ignore classproperty in its error reporting? [1] That way we can simply wait until descriptors are more properly dealt with... It is not like we are going to do away with them altogether -- our coordinate classes in particular really rely on them (and have workarounds for the docstring issues mentioned above; #15259). It also does not help knowing that numpy has given up on supporting mypy until the mypy code is improved...
[1] If it is not, then why not fix mypy -- let those maintainers worry about it.
I don't think Mypy is to blame here at all. The fundamental problem with our unit formatters is that formatting is done by formatter classes, not by formatter instances. This is not just a question of style preference. If the code used formatter instances then a multithreaded program could create a separate instance for each thread and each instance could have it's own _parser, which would be threadsafe by default (admittedly, I have not checked if ply is threadsafe itself, here I am assuming it is). But because the code uses formatter classes then there is one global formatter (per format) with a single global _parser, which has forced us to implement ThreadSafeParser or else we get data races. So Mypy is fully justified in not liking this code because the code truly is badly designed. Unfortunately we cannot address the fundamental problem because that would break the formatters downstream developers have implemented. Then the question becomes do we give up entirely or do we try to salvage what we can.
@eerovaher - yes, I agree that the formatter classes was a bad design. And I guess one good point that has come out of this that it has shown that clearly.
I'm actually OK with changing it so we use instances (presumably with a __new__ that ensures those are singletons for a given python instance/thread). It should be possible to deprecate the classes with a grace period, maybe starting 8.0.
There is a conflict and some unresolved conversation. Should we close without merge?
It looks like @mhvk is not interested in small improvements if the API is fundamentally bad, but I am unwilling to break the API, so we default to doing nothing.
Hi humans :wave: - this pull request hasn't had any new commits for approximately 4 months. I plan to close this in 30 days if the pull request doesn't have any new commits by then.
In lieu of a stalled pull request, please consider closing this and open an issue instead if a reminder is needed to revisit in the future. Maintainers may also choose to add keep-open label to keep this PR open but it is discouraged unless absolutely necessary.
If this PR still needs to be reviewed, as an author, you can rebase it to reset the clock.
If you believe I commented on this pull request incorrectly, please report this here.
Ping @mhvk
I'm going to close this pull request as per my previous message. If you think what is being added/fixed here is still important, please remember to open an issue to keep track of it. Thanks!
If this is the first time I am commenting on this issue, or if you believe I closed this issue incorrectly, please report this here.