pystac
pystac copied to clipboard
normalise_hrefs causing unwanted Item asset href changes.
Hello. I am seeing some weird behaviour that I hope you can help me with.
I've made an example in this repository here. In this example I am attempting to add a STAC catalog 'catalog_1' as a child to the root catalog.json.
If you clone that repo and run main.py, you will see the files are modified and the new child and parent relationship is added to the links object of each STAC file successfully.
However if you look at line 27 of the modified Item STAC file '/example/catalog_1/collection_1/item_1/item_1.json'. You will see that the href to the asset has been updated and now points to a non-existent asset outside of this directory.
Any help on this issue would be greatly appreciated. Thanks.
Thanks for the report and the excellent reproducible example @MitchellPaff ! I was able to reproduce this using your example and I'll try to have a fix in the 1.5.0 release that we're putting out soon.
So, in this case the root cause is an invalid self
link in example/catalog_1/collection_1/item_1/item_1.json
. The self link in that file is:
{
"rel": "self",
"href": "feature.json",
"type": "application/json"
}
Per the STAC spec docs on Relation Types self
links must be an absolute URL to the Item. When PySTAC resolves the link to that Item from the parent collection, it calls Item.set_self_href
on the Item
. As part of this, it loops over all of the Item's Assets and updates the href
property to be relative to the Item's new self
HREF, if it is not already absolute. The first thing it does is get an absolute HREF for the Asset based on the Item's self
HREF, which is where the problem is coming in here.
The Item has a self
HREF of feature.json
and PySTAC interprets this as an absolute HREF (e.g. /feature.json
on my Mac). PySTAC then constructs an absolute HREF for the Asset based on the existing relative HREF (in this case asset_1
). This absolute HREF ends up being something like /asset_1
. It then gets the relative HREF to the asset from the Item's new self HREF that it got from the item
link here. On my system, that path is something like /Users/my_user/Code/stac_example/example/catalog_1/collection_1/item_1/item1.json
, so the relative path from that Item to what PySTAC thinks is the Asset HREF is something like ../../../../../../../../asset_1
.
To be honest, I'll need to think a bit more about what the right course of action is. On the one hand, PySTAC is correctly handling that Item's self
HREF according to the spec. On the other hand, this kind of issue is really hard to debug and it would be nice if we at least had some mechanism for reporting when the self
HREF is clearly incorrect. I'm just not sure how to detect that reliably...
@gadomski @lossyrob Do you have any opinions on what the best course of action would be here?
PySTAC interprets this as an absolute HREF (e.g. /feature.json on my Mac)
I think this is incorrect. When creating hrefs
, a relative self
href should either:
- cause an error, or
- become absolute via
make_absolute_path
(which uses the current working directory via abspath), probably with a warning
This goes back to the question of "is PySTAC a strict implementation of the specification, or a tool to work with the specification in a user-friendly way"? If it's a strict implementation, option 1; otherwise, option 2.
Interested in @philvarner's thoughts as well.
This goes back to the question of "is PySTAC a strict implementation of the specification, or a tool to work with the specification in a user-friendly way"? If it's a strict implementation, option 1; otherwise, option 2.
There are a few places in the code (e.g. accepting single lists as bboxes
in Collections extents) that suggest to me that we are leaning towards the latter approach. I'll implement approach 2. and see if there are any unintended consequences that we need to work out.
I think I'll need a bit more time to put together a solution and some tests to make sure we're actually getting the behavior we want out of this. I'm going to move this into a future release for now so we don't hold up 1.5.
Thanks again for the excellent reproducible example! I used it to confirm that https://github.com/stac-utils/pystac/pull/984 fixes the problem.