pystac icon indicating copy to clipboard operation
pystac copied to clipboard

normalise_hrefs causing unwanted Item asset href changes.

Open MitchellPaff opened this issue 2 years ago • 6 comments

Hello. I am seeing some weird behaviour that I hope you can help me with.

I've made an example in this repository here. In this example I am attempting to add a STAC catalog 'catalog_1' as a child to the root catalog.json.

If you clone that repo and run main.py, you will see the files are modified and the new child and parent relationship is added to the links object of each STAC file successfully.

However if you look at line 27 of the modified Item STAC file '/example/catalog_1/collection_1/item_1/item_1.json'. You will see that the href to the asset has been updated and now points to a non-existent asset outside of this directory.

Any help on this issue would be greatly appreciated. Thanks.

MitchellPaff avatar May 02 '22 10:05 MitchellPaff

Thanks for the report and the excellent reproducible example @MitchellPaff ! I was able to reproduce this using your example and I'll try to have a fix in the 1.5.0 release that we're putting out soon.

duckontheweb avatar May 12 '22 10:05 duckontheweb

So, in this case the root cause is an invalid self link in example/catalog_1/collection_1/item_1/item_1.json. The self link in that file is:

{
  "rel": "self",
  "href": "feature.json",
  "type": "application/json"
}

Per the STAC spec docs on Relation Types self links must be an absolute URL to the Item. When PySTAC resolves the link to that Item from the parent collection, it calls Item.set_self_href on the Item. As part of this, it loops over all of the Item's Assets and updates the href property to be relative to the Item's new self HREF, if it is not already absolute. The first thing it does is get an absolute HREF for the Asset based on the Item's self HREF, which is where the problem is coming in here.

The Item has a self HREF of feature.json and PySTAC interprets this as an absolute HREF (e.g. /feature.json on my Mac). PySTAC then constructs an absolute HREF for the Asset based on the existing relative HREF (in this case asset_1). This absolute HREF ends up being something like /asset_1. It then gets the relative HREF to the asset from the Item's new self HREF that it got from the item link here. On my system, that path is something like /Users/my_user/Code/stac_example/example/catalog_1/collection_1/item_1/item1.json, so the relative path from that Item to what PySTAC thinks is the Asset HREF is something like ../../../../../../../../asset_1.

To be honest, I'll need to think a bit more about what the right course of action is. On the one hand, PySTAC is correctly handling that Item's self HREF according to the spec. On the other hand, this kind of issue is really hard to debug and it would be nice if we at least had some mechanism for reporting when the self HREF is clearly incorrect. I'm just not sure how to detect that reliably...

duckontheweb avatar May 13 '22 21:05 duckontheweb

@gadomski @lossyrob Do you have any opinions on what the best course of action would be here?

duckontheweb avatar Jun 08 '22 10:06 duckontheweb

PySTAC interprets this as an absolute HREF (e.g. /feature.json on my Mac)

I think this is incorrect. When creating hrefs, a relative self href should either:

  1. cause an error, or
  2. become absolute via make_absolute_path (which uses the current working directory via abspath), probably with a warning

This goes back to the question of "is PySTAC a strict implementation of the specification, or a tool to work with the specification in a user-friendly way"? If it's a strict implementation, option 1; otherwise, option 2.

Interested in @philvarner's thoughts as well.

gadomski avatar Jun 08 '22 17:06 gadomski

This goes back to the question of "is PySTAC a strict implementation of the specification, or a tool to work with the specification in a user-friendly way"? If it's a strict implementation, option 1; otherwise, option 2.

There are a few places in the code (e.g. accepting single lists as bboxes in Collections extents) that suggest to me that we are leaning towards the latter approach. I'll implement approach 2. and see if there are any unintended consequences that we need to work out.

duckontheweb avatar Jun 24 '22 13:06 duckontheweb

I think I'll need a bit more time to put together a solution and some tests to make sure we're actually getting the behavior we want out of this. I'm going to move this into a future release for now so we don't hold up 1.5.

duckontheweb avatar Jul 18 '22 17:07 duckontheweb

Thanks again for the excellent reproducible example! I used it to confirm that https://github.com/stac-utils/pystac/pull/984 fixes the problem.

gadomski avatar Feb 14 '23 16:02 gadomski