pystac icon indicating copy to clipboard operation
pystac copied to clipboard

Collection class should not inherit from Catalog class

Open philvarner opened this issue 4 years ago • 13 comments

This is a confusing part of the spec (which has changed recently) -- there is no longer a subtying relationship between the types Collection and Catalog. They're two independent types that just happen to share some fields of the same name.

pystac shouldn't conflate them in the implementation.

philvarner avatar Mar 29 '21 16:03 philvarner

Related: These tests https://github.com/stac-utils/pystac/issues/292

philvarner avatar Mar 29 '21 16:03 philvarner

We said in calls, tooling could still implement an inheritance relationship though.

m-mohr avatar Apr 23 '21 13:04 m-mohr

I'm not sure it's worth it to remove this inheritance from the implementation. Considering they can both be roots, they operate in mostly the same way, the inheritance relationship is super useful. I'm not sure it would be worth it to create some CatalogCollectionCommonBase type and have that scattered around the library to have an empty Catalog implementation. I forget what the key difference was that made a collection not able to be a catalog - it was something about the number of item/child links? If it's a minimal distinction, I'd rather handle it as a special case rather than change the natural inheritance structure of PySTAC to accommodate the breakage of inheritance in the spec.

lossyrob avatar Apr 27 '21 00:04 lossyrob

The primary reason was the type field, which has different values.

There are other minor reasons like the requirement for item/child links or different values for stac_extensions.

m-mohr avatar Apr 27 '21 08:04 m-mohr

One underlying issue is a recently-developed irrational hatred for subtyping inheritance.

But, the conceptual problem is that it violates the Liskov Substitution Principle for subtyping inheritance. A Collection can't be arbitrarily used anywhere that a Catalog can be used -- they're conceptually unrelated objects that just happen to have overlap in their field names.

I'd be glad to make the change, but since I haven't contributed to this component before, I wanted to discuss before doing the work (which should be minimal).

philvarner avatar Apr 27 '21 14:04 philvarner

One underlying issue is a recently-developed irrational hatred for subtyping inheritance.

Who developed that?

But, the conceptual problem is that it violates the Liskov Substitution Principle for subtyping inheritance. A Collection can't be arbitrarily used anywhere that a Catalog can be used -- they're conceptually unrelated objects that just happen to have overlap in their field names.

I think I agree, but which are the places where a Catalog can't be substituted with a Collection?

m-mohr avatar Apr 27 '21 14:04 m-mohr

One underlying issue is a recently-developed irrational hatred for subtyping inheritance.

Who developed that?

Me! Sorry, I should have been more clear that I was talking about my own irrational hatred for it.

philvarner avatar Apr 27 '21 15:04 philvarner

I'm not sure there's enough to motivate a change like this here, but if you want to take a crack at it to prove it's a minimal change and show that there's a way to modify it in a non-disruptive way (which for someone who knows how much functionality Catalogs and Collections share, I am doubtful), I'd suggest proposing something against the #309 branch, as it has full type annotations and mypy. The tests aren't yet passing on it yet, that should be solved over this week, but it would be better to tinker around with the typed codebase with something like Pylance to check (which does check for breakage of the Liskov Substitution Principle) than the current main.

lossyrob avatar Apr 27 '21 15:04 lossyrob

Also, just to note:

they're conceptually unrelated objects that just happen to have overlap in their field names.

I can see where this opinion comes from, but I also do not agree with the strength of the stance that's implied here. Conceptually, they are related - that's why a Collection was a specific implementation of a Catalog with additional functionality to begin with. I know his has been argued ad nauseum in something that resembles the battle for functional folk to eradicate OO concepts from the world or whatever (I regret the days I was even resembling being on the side of that bike shedding war), but there's not many concepts that fit the description "things that can be the root of a STAC, and can hold child Collections or Catalogs as well as Items", and the two things that do fit that description are clearly related. From a practical standpoint, it is useful to have the inheritance. If there's some practical reason to separate the inheritance, that isn't rooted in ideology or coding tastes, I'm happy to weigh the pros vs cons of removing what is a useful programmatic concept to introduce more useful programmatic concepts.

lossyrob avatar Apr 27 '21 15:04 lossyrob

Ah, great, I'll do that. I've been wanting to explore more about the static typechecking options in Python.

On Tue, Apr 27, 2021 at 11:25 AM Rob Emanuele @.***> wrote:

I'm not sure there's enough to motivate a change like this here, but if you want to take a crack at it to prove it's a minimal change and show that there's a way to modify it in a non-disruptive way (which for someone who knows how much functionality Catalogs and Collections share, I am doubtful), I'd suggest proposing something against the #309 https://github.com/stac-utils/pystac/pull/309 branch, as it has full type annotations and mypy. The tests aren't yet passing on it yet, that should be solved over this week, but it would be better to tinker around with the typed codebase with something like Pylance to check (which does check for breakage of the Liskov Substitution Principle) than the current main.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/stac-utils/pystac/issues/296#issuecomment-827697302, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAA6QRGSKAMVHZTGNWIWIF3TK3JPHANCNFSM4Z737N7Q .

philvarner avatar Apr 27 '21 15:04 philvarner

I know his has been argued ad nauseum in something that resembles the battle for functional folk to eradicate OO concepts from the world or whatever

Hehe, I'm not sure that goes towards me being radical about inheritance in the discussion (and I hate that there's no inheritance any longer), but I'm actually strictly on the OO side and not on the functional side. ;-) Maybe to give some outside of Python point of view, all the JS implementations will also still implement inheritance for practical reasons.

I'm wondering, does Pylance fail in case the Collection class returns "Collection" instead of "Catalog" although it inherits from Catalog and Catalogs are always expected to return "Catalog" as type?

m-mohr avatar Apr 27 '21 15:04 m-mohr

Also, just to note:

they're conceptually unrelated objects that just happen to have overlap in their field names.

I can see where this opinion comes from, but I also do not agree with the strength of the stance that's implied here. Conceptually, they are related - that's why a Collection was a specific implementation of a Catalog with additional functionality to begin with. I know his has been argued ad nauseum in something that resembles the battle for functional folk to eradicate OO concepts from the world or whatever (I regret the days I was even resembling being on the side of that bike shedding war), but there's not many concepts that fit the description "things that can be the root of a STAC, and can hold child Collections or Catalogs as well as Items", and the two things that do fit that description are clearly related. From a practical standpoint, it is useful to have the inheritance. If there's some practical reason to separate the inheritance, that isn't rooted in ideology or coding tastes, I'm happy to weigh the pros vs cons of removing what is a useful programmatic concept to introduce more useful programmatic concepts.

Mine mostly comes from having been involved with several codebases that became overly-complicated and difficult to maintain because the inheritance hierarchy, for what amounted to basically just saving a few lines of duplicate code. Maybe this is the best way to structure it in a language like Python that doesn't have a typeclass mechanism? I'm still trying to work out a lot of these design principles myself, having done Java for a long time and then more recently trying to incorporate the Python & Rust & Scala & Haskell concepts in.

philvarner avatar Apr 27 '21 15:04 philvarner

Makes sense. Agree that inheritance isn't always (or often) the best solution, but does have it's place, especially in Python development. For sure it's always good to be trying things out and transferring concepts around where they're useful. Since leaning on the Pylance type checking functionality with full type annotations I'm enjoying some of the type-driven development I miss from Scala, though trying to keep Python code still looking like Python and not leaning too Haskell-y :-)

lossyrob avatar Apr 27 '21 16:04 lossyrob