astropy icon indicating copy to clipboard operation
astropy copied to clipboard

Avoid using `classproperty` in unit formatters

Open eerovaher opened this issue 10 months ago • 20 comments

Description

Trying to check our unit formatter code with Mypy results in a large number of errors that look like

astropy/units/format/ogip.py:69: error: Missing positional argument "fget" in call to "classproperty"  [call-arg]
astropy/units/format/ogip.py:69: error: "classproperty" not callable  [operator]

This because our @classproperty is too dynamic for type checkers. Luckily the unit formatters don't really need @classproperty because very similar functionality can be achieved with standard library @classmethod and @functools.cache, which don't cause problems for type checkers. Replacing our custom utilities with standard library ones is a good thing to do independently of type checking.

I'm opening this as a draft where I have replaced just one class property. If the maintainers think this is a good idea then I can replace all of them.

  • [x] By checking this box, the PR author has requested that maintainers do NOT use the "Squash and Merge" button. Maintainers should respect this when possible; however, the final decision is at the discretion of the maintainer that merges the PR.

eerovaher avatar Jan 21 '25 21:01 eerovaher

Thank you for your contribution to Astropy! 🌌 This checklist is meant to remind the package maintainers who will review this pull request of some common things to look for.

  • [ ] Do the proposed changes actually accomplish desired goals?
  • [ ] Do the proposed changes follow the Astropy coding guidelines?
  • [ ] Are tests added/updated as required? If so, do they follow the Astropy testing guidelines?
  • [ ] Are docs added/updated as required? If so, do they follow the Astropy documentation guidelines?
  • [ ] Is rebase and/or squash necessary? If so, please provide the author with appropriate instructions. Also see instructions for rebase and squash.
  • [ ] Did the CI pass? If no, are the failures related? If you need to run daily and weekly cron jobs as part of the PR, please apply the "Extra CI" label. Codestyle issues can be fixed by the bot.
  • [ ] Is a change log needed? If yes, did the change log check pass? If no, add the "no-changelog-entry-needed" label. If this is a manual backport, use the "skip-changelog-checks" label unless special changelog handling is necessary.
  • [ ] Is this a big PR that makes a "What's new?" entry worthwhile and if so, is (1) a "what's new" entry included in this PR and (2) the "whatsnew-needed" label applied?
  • [ ] At the time of adding the milestone, if the milestone set requires a backport to release branch(es), apply the appropriate "backport-X.Y.x" label(s) before merge.

github-actions[bot] avatar Jan 21 '25 21:01 github-actions[bot]

👋 Thank you for your draft pull request! Do you know that you can use [ci skip] or [skip ci] in your commit messages to skip running continuous integration tests until you are ready?

github-actions[bot] avatar Jan 21 '25 21:01 github-actions[bot]

Here we do want caching.

eerovaher avatar Jan 21 '25 22:01 eerovaher

Yes. I was posting for consideration of static typing.

nstarman avatar Jan 21 '25 23:01 nstarman

The changes I am proposing are indeed clumsy, but the formatters are fundamentally badly designed and changing that would be too disruptive for downstream developers. If we want to compile the code with Mypyc then Mypy has to approve it first and such awkward hacks will be needed for that.

It would be very nice if we could implement classproperty in a type-checker friendly way, but as far as I know descriptors are too dynamic for static type checkers and the common standard library descriptors are supported only because of hardcoded exceptions in type checking code, which don't apply to user-defined descriptors. The class @nstarman mentioned contains more than one # type: ignore[type-arg] instruction, so that doesn't give me any hope that we could compile anything based on that with Mypyc.

eerovaher avatar Jan 22 '25 22:01 eerovaher

I feel this is just working around limitations of something (the typing system) that is still in flux, and I don't really see the point of doing that

I feel the opposite way. Typed-Python is intentionally restrictive and creates pressure on design patterns (which is a good thing). As such, it may never support everything that Yolo-Python (Python without type checkers) allows. I believe that constraining our designs such that they are type-checker friendly makes for more robust code. Of course refactoring on the scale of a large library like astropy is extremely challenging, and I don't know that we can realistically hope for it to happen, but I do think we should try to leverage existing static-analysis tools (mypy) now rather than waiting for them to support extremely dynamic patterns, which may never happen, realistically.

I'm generally +1 on this initiative, especially when the diff count is so low.

neutrinoceros avatar Jan 23 '25 09:01 neutrinoceros

My other worry beyond disliking changing-for-the-sake-of-mypy is that _units is one thing that someone making their own subclass (if it were at all a serious one that did something interesting) would have defined. So, I do not see how this change cannot break their work. Of course, it may well be that the previous changes would have done that too, but I think those were much less likely to have done so. And of course it is also eminently possible that no-one ever created a new format class.

More prosaically, is it even a problem if mypy cannot compile this? It is not as if this piece is at all important for performance: as long as mypy knows it gives a dict, I don't really see the problem.

mhvk avatar Jan 23 '25 14:01 mhvk

I do not see how this change cannot break their work

That is worrying indeed. Perhaps should ask on astropy communication channels, just in case. After all, we just got bitten by LSST using Lupton RGB feature that we thought was okay to remove...

pllim avatar Jan 23 '25 15:01 pllim

@eerovaher do you know what mypy thinks about a subclass of property? As in can we sidestep the issue, though incurring the drawback of having extraneous API methods like .setter that might need to raise errors.

nstarman avatar Jan 23 '25 15:01 nstarman

@mhvk wrote

... It would be much better ... to improve CPython itself (by introducing a classproperty there ...

Python 3.9 made it possible to stack @classmethod and @property, but that was deprecated in Python 3.11 and apparently it's an error in Python 3.13. So I don't think its reasonable to expect classproperty being implemented in the standard library any time soon.

I've updated this pull request with a different proof of concept. In Python classes are instances of metaclasses, so if we define for every formatter class (that uses our classproperty) a corresponding metaclass then we can replace what are currently class properties on the formatter classes with normal properties on the metaclasses. This allows us to avoid using our classproperty in the unit formatters, makes the formatters understandable by type checkers (I have verified it with Mypy) and avoids breaking private API (even though we should be allowed to break private API whenever we want or else our private API is actually public API).

eerovaher avatar Mar 03 '25 15:03 eerovaher

The class @nstarman https://github.com/astropy/astropy/pull/17652#pullrequestreview-2565818664 contains more than one # type: ignore[type-arg] instruction, so that doesn't give me any hope that we could compile anything based on that with Mypyc.

I think those type ignores are actually because of beartype<v20, which didn't like parametrized class method. This is solved in beartype>v20. I do think we can make a myyc-compatible classproperty descriptor. No custom metaclasses and associated inheritance issues required.

nstarman avatar Mar 03 '25 17:03 nstarman

The fundamental problem with class properties (both our classproperty and the classmethod and property combination that standard library briefly recommended) is that descriptors and classes do not interact like descriptors and instances do. There are two problems that I learned about when I investigated this.

  1. Calling help() on a class with class properties executes the properties. To see that you can add print() calls to FITS._units() and see that calling help(FITS) does indeed print your message. Alternatively, you can add a time.sleep() call to FITS._units() and see how long it takes for the help text to appear after calling help(FITS). Do keep in mind that FITS._units() is cached, therefore it will only be executed the first time help(FITS) is called. My proof of concept with metaclasses does not have that problem.
  2. Our classproperty has setter and deleter methods that are supposed to raise errors: https://github.com/astropy/astropy/blob/ab1ff0cac6ed522cd9f4493a57224494df42753e/astropy/utils/decorators.py#L773-L783 In reality the setter and deleter are simply ignored:
Python 3.12.3 (main, Feb  4 2025, 14:48:35) [GCC 13.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from astropy.units.format import FITS
>>> FITS._units = 0
>>> FITS._units
0

and then in a new session:

Python 3.12.3 (main, Feb  4 2025, 14:48:35) [GCC 13.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from astropy.units.format import FITS
>>> del FITS._units
>>> FITS._units
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: type object 'FITS' has no attribute '_units'

With my proof of concept these two (closely related) bugs do not occur:

Python 3.12.3 (main, Feb  4 2025, 14:48:35) [GCC 13.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from astropy.units.format import FITS
>>> FITS._units = 0
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: property '_units' of '_FITSMeta' object has no setter
>>> del FITS._units
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: property '_units' of '_FITSMeta' object has no deleter
>>> len(FITS._units)
765

If you manage to come up with an implementation of class properties that solves these problems then we should indeed consider adopting it in astropy. It might even be the case that the standard library would be interested in having such a classproperty. But to me the aforementioned problems look serious enough and the workaround with metaclasses simple enough that I don't intend to spend any effort on implementing a classproperty decorator myself.

eerovaher avatar Mar 03 '25 19:03 eerovaher

I appreciate the thought put in here to make this work, but to be honest I'm still confused why we are trying to improve this piece of code, given that, as I noted above,

More prosaically, is it even a problem if mypy cannot compile this? It is not as if this piece is at all important for performance: as long as mypy knows it gives a dict, I don't really see the problem.

Specifically, isn't it a much larger problem that, e.g., Generic._parser is a classproperty too?

Anyway, if this is really important somehow, I much prefer your original idea of just moving to a classmethod, though given that that means any user format classes no longer work, I think that it might then be better to just do a greater refactor...

mhvk avatar Mar 03 '25 19:03 mhvk

I appreciate the thought put in here to make this work, but to be honest I'm still confused why we are trying to improve this piece of code, given that, as I noted above,

More prosaically, is it even a problem if mypy cannot compile this? It is not as if this piece is at all important for performance...

Mypyc works best if the entire code is compiled as a single compilation unit, i.e. everything is compiled in one go. Admittedly, we are not compiling anything yet, but making astropy compilable is nonetheless a good goal to have because it will mean that Mypy can verify our code to be type safe and that in turn will ensure that our annotations match actual code behavior, which good both for documentation and IDE autocompletion.

Independently from the above, I don't think astropy should be implementing any general utilities that could easily be replaced with standard library utilities. Our classproperty can be replaced with standard library classmethod (if all cases of accessing the property are replaced with calling the replacement method) or with a normal property on a metaclass (which is a drop-in replacement).

as long as mypy knows it gives a dict, I don't really see the problem.

I already mentioned in the opening post that Mypy reports two errors for each use of the @classproperty decorator. This is a problem because Mypy is finding real bugs, but those bugs drown in the noise classproperty causes. Furthermore, classproperty being completely opaque to Mypy means that Mypy infers the type of every class property to be Any effectively turning off type checking on large parts of annotated code.

Specifically, isn't it a much larger problem that, e.g., Generic._parser is a classproperty too?

My goal is to remove all uses of classproperty from the unit formatters, but I am only submitting small proofs of concept until we agree how exactly that should be done.

Anyway, if this is really important somehow, I much prefer your original idea of just moving to a classmethod, though given that that means any user format classes no longer work, I think that it might then be better to just do a greater refactor...

_units is private, so even if changes in this pull request break downstream code then that will be downstream's problem. But I think you are also overestimating how widely used _units is. Our own Generic does not define it.

eerovaher avatar Mar 04 '25 14:03 eerovaher

I'm all for replacing class properties with class methods! I must admit to being a bit sour on metaclasses; they end up causing such headaches re inheritance. Mypyc even has special speedups for property and classmethod

nstarman avatar Mar 04 '25 19:03 nstarman

My worry was mostly about people subclassing one of our format classes and thus implicitly using private methods, but you're right, our Base class does not use it, and that's the one people are most likely to subclass.

Anyway, my main gripe remains that we are changing code because of some other package, for benefits that are not all that obvious. Surely, it is possible to tell mypy to just take this annotation at face value and ignore classproperty in its error reporting? [1] That way we can simply wait until descriptors are more properly dealt with... It is not like we are going to do away with them altogether -- our coordinate classes in particular really rely on them (and have workarounds for the docstring issues mentioned above; #15259). It also does not help knowing that numpy has given up on supporting mypy until the mypy code is improved...

[1] If it is not, then why not fix mypy -- let those maintainers worry about it.

mhvk avatar Mar 05 '25 17:03 mhvk

I don't think Mypy is to blame here at all. The fundamental problem with our unit formatters is that formatting is done by formatter classes, not by formatter instances. This is not just a question of style preference. If the code used formatter instances then a multithreaded program could create a separate instance for each thread and each instance could have it's own _parser, which would be threadsafe by default (admittedly, I have not checked if ply is threadsafe itself, here I am assuming it is). But because the code uses formatter classes then there is one global formatter (per format) with a single global _parser, which has forced us to implement ThreadSafeParser or else we get data races. So Mypy is fully justified in not liking this code because the code truly is badly designed. Unfortunately we cannot address the fundamental problem because that would break the formatters downstream developers have implemented. Then the question becomes do we give up entirely or do we try to salvage what we can.

eerovaher avatar Mar 05 '25 19:03 eerovaher

@eerovaher - yes, I agree that the formatter classes was a bad design. And I guess one good point that has come out of this that it has shown that clearly.

I'm actually OK with changing it so we use instances (presumably with a __new__ that ensures those are singletons for a given python instance/thread). It should be possible to deprecate the classes with a grace period, maybe starting 8.0.

mhvk avatar Mar 05 '25 20:03 mhvk

There is a conflict and some unresolved conversation. Should we close without merge?

pllim avatar Jun 18 '25 14:06 pllim

It looks like @mhvk is not interested in small improvements if the API is fundamentally bad, but I am unwilling to break the API, so we default to doing nothing.

eerovaher avatar Jun 20 '25 19:06 eerovaher

Hi humans :wave: - this pull request hasn't had any new commits for approximately 4 months. I plan to close this in 30 days if the pull request doesn't have any new commits by then.

In lieu of a stalled pull request, please consider closing this and open an issue instead if a reminder is needed to revisit in the future. Maintainers may also choose to add keep-open label to keep this PR open but it is discouraged unless absolutely necessary.

If this PR still needs to be reviewed, as an author, you can rebase it to reset the clock.

If you believe I commented on this pull request incorrectly, please report this here.

github-actions[bot] avatar Aug 01 '25 05:08 github-actions[bot]

Ping @mhvk

nstarman avatar Aug 01 '25 13:08 nstarman

I'm going to close this pull request as per my previous message. If you think what is being added/fixed here is still important, please remember to open an issue to keep track of it. Thanks!

If this is the first time I am commenting on this issue, or if you believe I closed this issue incorrectly, please report this here.

github-actions[bot] avatar Sep 01 '25 05:09 github-actions[bot]