astropy-APEs icon indicating copy to clipboard operation
astropy-APEs copied to clipboard

APE26: Removing data storage (representations) from coordinate frames

Open jeffjennings opened this issue 1 year ago • 27 comments

Coordinate frames in astropy.coordinates currently store metadata used to construct the frame (i.e. for transforming between frames) and may also store coordinate data itself. This duplicates functionality with SkyCoord, which acts as a container for both coordinate data and reference frame information. We propose to change the frame classes such that they only store metadata and never coordinate data. This would make the implementation more modular, remove ambiguity for users from having nearly duplicate functionality with slightly different APIs, and better satisfy the principle of Separation of Concerns.

Authors of this APE (alphabetical): Jeff Jennings @jeffjennings, Adrian Price-Whelan @adrn, Nathaniel Starkman @nstarman, Marten van Kerkwijk @mhvk

jeffjennings avatar Nov 04 '24 17:11 jeffjennings

Pinging @eerovaher @eteq @StuartLittlefair @Cadair @ayshih @taldcroft as other "stakeholders" who may want to take a look at this and review!

adrn avatar Nov 04 '24 18:11 adrn

Yes, yes and a thousand times yes.

The devil will be in the implementation details but this is a thoroughly good idea.

StuartLittlefair avatar Nov 04 '24 20:11 StuartLittlefair

Does anyone else have thoughts before we ask the @astropy/coordinators to review this? cc @eteq @Cadair @ayshih @taldcroft!

adrn avatar Nov 26 '24 19:11 adrn

I have been traveling the last several weeks and haven't had a chance to review this in detail, but I will try to do so this week. I am fairly sure I will have Opinions :tm: (but hopefully positive ones?)

eteq avatar Dec 02 '24 20:12 eteq

@eteq OK cool. It'd be good to get your feedback ASAP, because @jeffjennings is starting to work through some of the changes we discuss here (in draft stages, but still, it is somewhat in progress).

adrn avatar Dec 05 '24 15:12 adrn

I have two main thoughts about some effects from this change, I think they can both be managed, but probably worth having a think about.

  1. The asdf serialisation is going to substantially different from now, managing that transition could be complex. - @braingram
  2. Many SunPy frames have an observer attribute which is a realised frame holding the position of the observer. This could obviously become a SkyCoord, but I worry that adds some more complexity, and will for a start make the repr even more obnoxious than it is currently.

Cadair avatar Dec 19 '24 10:12 Cadair

1. The asdf serialisation is going to substantially different from now, managing that transition could be complex. - @braingram

Thanks for pinging me about this change. If I'm following the end result will be that coordinate frames will no longer have data.

Reading old files with data when coordinate frames no longer support data

At that point when a user opens an ASDF file with a serialized coordinate frame (which might contain data) the converter that handles the deserialization could see that the file contains data and generate a SkyCoord instead (possibly with a warning that the file should be updated).

Writing files when coordinate frames no longer support data

This should be fine. Technically we could keep the same schema and just never write data (it's not required) but it might be a good opportunity to include some other schema cleanup (at the expense of some schema churn since many schemas reference the base coordinate frame schema).

Implementation details

The details about how this is implemented may impact ASDF serialization. Specifically the first item under implementation:

  1. Splitting the frame classes into two hierarchies: ones with and without data.

The converter(s) would need to be updated to support these new classes. This isn't a problem but is work that wouldn't be needed if instead the existing classes were updated (deprecating and then removing the data feature).

  1. Deprecating the legacy with-data frame classes.

At this point asdf-astropy could be updated to not use the legacy classes.

braingram avatar Dec 19 '24 13:12 braingram

Hi @ayshih and @Cadair, we're starting to finalize this APE and are wondering if you anticipate any major concerns with respect to SunPy (as well as any suggestions you have, e.g. for addressing Stuart's point 2 above). The first phase of the APE wouldn't introduce any breaking changes, the second phase (timeline flexible) would add deprecation warnings, and the final phase (several major astropy versions away) would require changes to SunPy. Thanks!

jeffjennings avatar Feb 07 '25 16:02 jeffjennings

Thanks! Any thoughts on performance impact?

pllim avatar Feb 27 '25 17:02 pllim

There should be no performance impact on SkyCoord. Small performance improvement on byte representations e.g. file schemas related to the new frame classes since they don't carry data. Memory improvements for the new Coordinate class since it doesn't do cacheing.

nstarman avatar Feb 27 '25 19:02 nstarman

Sorry if I missed it, but I'm still confused about what I see as the two biggest user-facing change here (i.e. the first bullet point in my review):

  1. that when you transform a SkyCoord with frame attributes to a different frame, it behaves differently than transforming a low-level class, and both are useful in some contexts. (can expand more on this if needed)
  2. that the initializing a low-level class with data is faster than a SkyCoord, by design, because there's less parsing.

Those were the main reasons for having low-level frame objects separate from SkyCoord, and both required low-level classes to have data. In this proposal, how does one do that?

eteq avatar Mar 26 '25 18:03 eteq

...when you transform a SkyCoord with frame attributes to a different frame, it behaves differently than transforming a low-level class, and both are useful in some contexts. (can expand more on this if needed)

If SkyCoord had a method that would remove all the stored frame attributes expect the ones needed for its current frame then that should allow fully replicating the behavior of the current frame instances.

eerovaher avatar Mar 26 '25 18:03 eerovaher

@eteq It looks like we should add more verbiage related to Coordinate, but if you scroll down to lines 208-210, there's a new Coordinate class which addresses both your points. It's the concrete class of BaseCoordinate, like SkyCoord, without any of the bells and whistles like cacheing or input parsing. It's the new form of the low-level class, but with the separation between reference frame and coordinate data.

nstarman avatar Mar 27 '25 01:03 nstarman

Sorry for taking so long to chime in! I'm no stranger to the subtleties of frames with data versus SkyCoord (e.g., astropy/astropy#10475), so as a developer I'm definitely sympathetic to the motivation behind this APE. However, I'm skeptical that there are sufficient user-facing benefits for this change to be worth the new development effort. (For example, I'd be enthusiastic if this change were in tandem with one of the major enhancements I suggested at last year's coordination meeting.)

To be clear, writing the new code in astropy core would probably not itself be a huge development effort, although I certainly would not look forward to verification and documentation. With my SunPy hat on, I don't anticipate any technical showstoppers for continuing to extend the astropy coordinates framework, but I foresee significant headaches to supporting multiple astropy releases across the transitions/deprecations. The headaches could then be exponentiated for those packages that extend off of sunpy.

I also share the concern raised by @eteq just above. There's the possibility having to support the various use cases with this new class organization might actually end up simply pushing complexity elsewhere in the code (e.g., having to frequently call the attribute-stripping method suggested by @eerovaher).

Just checking: Is the plan to continue to have SkyCoord be able to store frame attributes that are not used for the current frame? The APE reads as if SkyCoord stores only frame and data.

ayshih avatar Mar 27 '25 20:03 ayshih

Just checking: Is the plan to continue to have SkyCoord be able to store frame attributes that are not used for the current frame? The APE reads as if SkyCoord stores only frame and data.

The plan means SkyCoord will continue to look effectively the same from a user perspective, so, yes, it would still carry attributes not associated with the frame. As @nstarman noted, however, the new Coordinate class, however, would not - it would just contain the frame with its attributes and the representation.

Aside: I was at the last meeting, but fear I only vaguely recall the enhancements you mentioned suggesting. For my benefit, as well as of those who didn't attend, is there a few-line summary? It might be relevant mostly if the scheme proposed here helps or hinders those suggestions...

mhvk avatar Mar 27 '25 21:03 mhvk

(For example, I'd be enthusiastic if this change were in tandem with one of the major enhancements I suggested at last year's coordination meeting.)

I'm also interested to hear what those were. I was not at the meeting.

However, I'm skeptical that there are sufficient user-facing benefits for this change to be worth the new development effort.

I think this APE opens up an arena for a wide-ranging set of user-facing benefits. Ones possible when we simultaneously simplify, make more robust, and speed up the code! For example:

  1. Frames are much simpler objects. It'll be much easier to define your own. Even dynamically. See this file for how an analogous implementation of this APE in a jax-compatible library makes it trivial to extend the framework.
  2. Frames can have defined memory footprints since they no longer store data. This is faster, more memory efficient, and more compatible with schematization.
  3. Since frames don't store data it's easier to attach them to custom objects and have that object duck-type BaseCoordinate. Now users can define their own coordinate-like objects that do exactly what they need. I don't think it's in scope for this APE, but it would be great in followups to ensure that a BaseCoordinate duck type can be used in most places in Astropy. That's a huge opening up of the field for users. A major user-facing benefit. Only possible with this APE.
  4. [Think of more things here if requested]

With my SunPy hat on, I don't anticipate any technical showstoppers for continuing to extend the astropy coordinates framework, but I foresee significant headaches to supporting multiple astropy releases across the transitions/deprecations

👍. But given the outlined plan I'm having trouble anticipating the same set of headaches. Step 1 should see 0 API changes for people using ICRS, etc., or SkyCoord. The only new thing is ICRSFrame. But if a function needs data then it's trivial to put a check

def function(coord: Any):
    if not isinstance(coord, BaseCoordinate):  # BaseCoordinate  covers all of SkyCoord, Coordinate, ICRS, etc
        raise ValueError("oops. this doesn't have data. did you pass one of the new BaseFrame objects?")

Step 2 has no effect on user API.

Step 3a adds a deprecation warning, which users will see whenever they define an old-style ICRS object. Not sure how that would affect function. Step 3b swaps for SkyCoord, so there's no deprecation warning and function should continue to work without change step 3c no more old-style ICRS object. No change to function.

The largest change for downstream libraries is redefinition, like in this APE, of custom coordinate classes. But it should just be a mixin, like in this APE. No functions consuming coordinate objects need to change. And users won't need to change anything until the deprecation warnings happen, in which case the migration is a simple

obj = CustomCoordinateFrameClass(data, **params)

to

obj = (SkyCoord or Coordinate)(data, frame= CustomCoordinateFrameClass(**params))

where the choice of Coordinate would be closes to the old code, but SkyCoord should generally work as well.

I also share the concern raised by @eteq just above. There's the possibility having to support the various use cases with this new class organization might actually end up simply pushing complexity elsewhere in the code (e.g., having to frequently call the attribute-stripping method suggested by @eerovaher).

See https://github.com/astropy/astropy-APEs/pull/112#issuecomment-2756186671. SkyCoord isn't changing. no-cache and simple-parse coordinates will continue to exist with Coordinate.

It sounds like there should be some verbiage added to explain what is shown in the Implementation code examples! Suggestions appreciated!

nstarman avatar Mar 28 '25 17:03 nstarman

Aside: I was at the last meeting, but fear I only vaguely recall the enhancements you mentioned suggesting. For my benefit, as well as of those who didn't attend, is there a few-line summary? It might be relevant mostly if the scheme proposed here helps or hinders those suggestions...

They certainly aren't fleshed-out ideas yet, they would require major changes:

  • Separately define true/actual coordinates from apparent coordinates -> add support for object propagation in general
  • Decouple origin location and axes orientation a la SPICE -> add support for transforming vector fields

The new scheme proposed here shouldn't hinder any of the above, and it's quite possible that it'd be helpful, if for no other reason than that the internals would be less duplicated/tangled. My point is that it sure would be nice if we were aiming for something grand at the same time as making these changes. For example, if people were on board with the second bullet, instead of merely having Coordinate be data plus a data-less frame, we would go ahead and make it data plus frame origin plus frame orientation to avoid a future deprecation cycle.

ayshih avatar Apr 02 '25 20:04 ayshih

  1. Frames are much simpler objects. It'll be much easier to define your own. Even dynamically.

Hmm, I'm not sure I follow. Subclassing BaseCoordinateFrame is already simple. The heavy lifting is in defining the transformations between frames, and you'll still have to do that regardless of the scheme.

  1. Frames can have defined memory footprints since they no longer store data. This is faster, more memory efficient, and more compatible with schematization.

True, but I don't feel like those benefits would ever be appreciable to users. I think the slog in virtually all use cases is tied to working with coordinates, not with data-less frames.

  1. Since frames don't store data it's easier to attach them to custom objects and have that object duck-type BaseCoordinate.

Yup, I agree, but as I said in my previous post, I wish we were building toward something now instead of a hypothetical future.

👍. But given the outlined plan I'm having trouble anticipating the same set of headaches.

You may be right that it'll not be as painful as I fear. I'll mull it over more.

ayshih avatar Apr 02 '25 20:04 ayshih

With my SunPy hat on, I don't anticipate any technical showstoppers for continuing to extend the astropy coordinates framework

I just realized that there is an approach that we use in SunPy that I'm not immediately sure how we'd adapt to this new scheme. Some of our frames define frame-specific methods/properties that take advantage of both (1) having access to the coordinate data and (2) being exposed as methods/properties callable on the SkyCoord. As I understand the new scheme, that will no longer be possible?

ayshih avatar Apr 02 '25 20:04 ayshih

@ayshih - :+1: to your general point of trying to think a bit bigger at the same time!

  • Decouple origin location and axes orientation a la SPICE -> add support for transforming vector fields

That would be interesting. It might also make something like SkyOffsetFrame a little easier. Would there ever be a reason not to have both, though? I.e., if we had origin and orientation separately, would it not be fine for Frame to just be a container of those two? (Sorry, too little thought here!)

mhvk avatar Apr 02 '25 21:04 mhvk

Subclassing BaseCoordinateFrame is already simple. The heavy lifting is in defining the transformations between frames, and you'll still have to do that regardless of the scheme.

How would I dynamically define a time-dependent cluster-centric reference frame? The transformations are mathematical "heavy lifting" but from a code perspective are easy to work with dynamically since Astropy's registration system is pretty good. Subclassing BaseCoordinateFrame is not simple from a dynamical perspective. Gotta mess around with type and custom init methods. The proposed dataclass structure is simple to subclass.

Yup, I agree, but as I said in my previous post, I wish we were building toward something now instead of a hypothetical future.

Not hypothetical:

  • https://github.com/GalacticDynamics/coordinax
  • https://github.com/GalOrrery/interpolated-coordinates

The only reason there aren't more ducktypes is that we don't support them. People don't use what they can't make. Build it and they shall come. We work hard to make QTable mixins for so many objects, and now we're working hard to expand Quantity (https://github.com/astropy/quantity-2.0, https://github.com/quantity-dev/metrology-apis). And it's because duck typing means users can use our powerful systems but customized to their needs in ways we haven't anticipated.

Decouple origin location and axes orientation a la SPICE -> add support for transforming vector fields

👍. Great idea. Not sure it fits into this APE, which actually has a remarkably tight focus. We are not changing the frame mechanics, just separating out the data storage. Very little should break across Astropy in any kind of catastrophic way. Fundamentally changing the framework as this suggests is a much more challenging change. Sounds quite worthwhile, but I don't think tackling it in separate APEs actually makes doing it harder since the same deprecations will be involved with SkyCoord and the old-style coordinate classes as well.

nstarman avatar Apr 03 '25 03:04 nstarman

Just to add a comment to the discussion between @ayshih and @nstarman. I think we should not underestimate the benefits to improving the simplicity of the coordinates code base. Personally I have always tried to avoid work which involves the internals of SkyCoord because the existing code base is so complex. If the proposed APE improves this, and I believe it will, then I believe it offers us significant advantages in terms of maintainability, the ability to add new features, and the ability to encourage others to contribute.

As an unrelated comment, there is an upcoming funding opportunity in the UK to provide support to existing scientific software packages, so I may be able to provide meaningful support to implementation of this APE, if approved.

StuartLittlefair avatar Apr 03 '25 07:04 StuartLittlefair

Thank you all for your comments. At this point we will leave this PR open and complete a prototype implementation, and return when that is ready and can be demoed. In the interim, we will not address further comments left here.

Signed, APE 26 authors (@adrn, @jeffjennings, @mhvk, @nstarman)

jeffjennings avatar May 07 '25 15:05 jeffjennings

and return when that is ready and can be demoed

In the light of this development, CoCo will hold off on further reviews for now. Thank you.

pllim avatar May 09 '25 14:05 pllim

Possible updates from Astropy Coordination Meeting 2025:

  • ICRS(ra, dec) returns a frame now; proposal is to keep it around but returning a SkyCoord in the future (on the same deprecation timescale) cc @keflavich
  • Clarify more explicitly that your planned deprecation is at least 2 years, not just 1 year.

pllim avatar Jul 08 '25 03:07 pllim

Possible updates from Astropy Coordination Meeting 2025:

* `ICRS(ra, dec)` returns a frame now; proposal is to keep it around but returning a `SkyCoord` in the future (on the same deprecation timescale) cc @keflavich

Our proposed deprecation stages for ICRS and all other frames are:

  • keep current API, but internally call the new data-less equivalent class (e.g., ICRSFrame for ICRS), while emitting a deprecation warning when instantiated.
  • still warn, but return a Coordinate, not an instance of its class type (by overriding __new__)
  • remove the legacy frame classes after a suitable deprecation period

Keeping ICRS would unfortunately generate much user confusion during and after the full deprecation cycle, as there would now be ICRS and ICRSFrame. The docstrings and docs would have to explain the difference, and even then there would be confusion. An extensive deprecation cycle for these changes (comment quoted immediately below) has already been proposed, and a search-replace for ICRS to SkyCoord in downstream code would not be prohibitive.

* Clarify more explicitly that your planned deprecation is at least 2 years, not just 1 year.

jeffjennings avatar Oct 30 '25 17:10 jeffjennings

Hi all, we now have a prototype and demo available for review. Please provide any comments within 2 weeks, by November 14.

We recommend reading through the demo notebook and then taking a look at the prototype implementation as needed (note that the implementation is not exhaustive, e.g., SkyCoord is not fully populated - there's just enough to demonstrate the new framework).

Signed, APE 26 authors (@adrn, @jeffjennings, @mhvk, @nstarman)

jeffjennings avatar Oct 30 '25 17:10 jeffjennings