
User transforms on item documents

Open Kirill888 opened this issue 2 years ago • 6 comments

I would like to inject custom code to transform item documents in some way as they travel from database to the user. I feel like this would be a useful feature to have regardless of the db backend.

My specific use case is to convert asset hrefs from relative paths to absolute ones, based on the original self link recorded in the database. I would like to keep storing relative links for assets in the database, but since the item's self link changes when it is accessed over the API, those hrefs become invalid. Recording them in absolute form would solve this specific problem, but it makes data relocation more involved.

Alternatively, an option to "make asset href absolute using original self link" would be useful and would avoid the need to write custom code.

In pgstac this could be done inside this function, for example:

https://github.com/stac-utils/stac-fastapi/blob/162a1a2c324b4c2bfe3451f7ae19d7840a0e0452/stac_fastapi/pgstac/stac_fastapi/pgstac/core.py#L187-L191

Kirill888 avatar May 18 '22 02:05 Kirill888

Another use case for this feature would be to convert s3:// asset hrefs to s3 signed https:// urls dynamically for users with particular authorizations, or to dynamically inject additional s3 signed url assets conforming to the alternate assets STAC Extension.
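
A minimal sketch of the href-rewriting part of this idea, assuming items are plain dicts. `sign_s3_assets` and `fake_signer` are hypothetical names, not part of stac-fastapi; a real signer would probably wrap something like boto3's `generate_presigned_url`, and the authorization check is left out entirely.

```python
from copy import deepcopy
from typing import Any, Callable, Dict


def sign_s3_assets(item: Dict[str, Any], sign_url: Callable[[str], str]) -> Dict[str, Any]:
    """Return a copy of item with s3:// asset hrefs replaced by signed URLs.

    sign_url is a caller-supplied callable (hypothetical here) that maps an
    s3:// href to a signed https:// URL.
    """
    patched = deepcopy(item)  # leave the stored document untouched
    for asset in patched.get("assets", {}).values():
        href = asset.get("href", "")
        if href.startswith("s3://"):
            asset["href"] = sign_url(href)
    return patched


def fake_signer(href: str) -> str:
    """Stand-in signer for illustration only; it does not produce a valid signature."""
    bucket, _, key = href[len("s3://"):].partition("/")
    return f"https://{bucket}.s3.amazonaws.com/{key}?X-Signature=dummy"


item = {
    "assets": {
        "a": {"href": "s3://bucket/path/a.tif"},
        "b": {"href": "https://example.com/b.tif"},
    }
}
signed = sign_s3_assets(item, fake_signer)
```

Passing the signer in as an argument keeps the transform independent of any particular cloud SDK, so the same function can be called from whichever hook or overridden method ends up applying it.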

CloudNiner avatar Jun 02 '22 15:06 CloudNiner

For my use case, I ended up just overriding the CoreCrudClient.get_item() method since we only wanted to add the extra signed urls on the Get Item endpoint. I don't think this solution would scale all that well as it would need to be added to each individual method and could be conflated with potentially unrelated logic.
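
Roughly, the override pattern looks like the sketch below. To keep it runnable on its own, `BaseClient` is a stand-in for the real backend client (it is not `CoreCrudClient`), and the `get_item(item_id, collection_id, **kwargs)` signature is an assumption that should be checked against the installed stac-fastapi version. `TransformingClient` and `add_marker` are hypothetical names.

```python
import asyncio
from typing import Any, Callable, Dict

ItemDict = Dict[str, Any]


class BaseClient:
    """Stand-in for a backend client such as CoreCrudClient (not the real class)."""

    async def get_item(self, item_id: str, collection_id: str, **kwargs: Any) -> ItemDict:
        # A real backend would fetch the item from the database here.
        return {"id": item_id, "collection": collection_id, "assets": {}, "links": []}


class TransformingClient(BaseClient):
    """Applies a user-supplied transform to the result of get_item only."""

    def __init__(self, transform: Callable[[ItemDict], ItemDict]) -> None:
        self.transform = transform

    async def get_item(self, item_id: str, collection_id: str, **kwargs: Any) -> ItemDict:
        item = await super().get_item(item_id, collection_id, **kwargs)
        return self.transform(item)


def add_marker(item: ItemDict) -> ItemDict:
    """Example transform: tag the item so the effect is visible."""
    item["properties"] = {"transformed": True}
    return item


client = TransformingClient(add_marker)
result = asyncio.run(client.get_item("item-1", "collection-1"))
```

Only `get_item` is covered here; search and collection-items responses would each need the same treatment, which is the scaling concern.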

Another solution for these types of problems could be a FastAPI middleware. That didn't work quite as well for my use case, because it was difficult to determine which route was being handled on any given invocation of the middleware function. I had the same problem when I dropped down to Starlette's ASGI middleware and introspected the provided Scope.

CloudNiner avatar Jun 03 '22 13:06 CloudNiner

My solution was to monkey-patch stac_fastapi.pgstac.core.Item. It's very little code, but there is no way to detect whether an item has already been patched, so user_hook might be called on an already patched item. That is not a problem in my case, though.

def install_item_hook(user_hook):
    """Patch pgstac to feed data through user_hook."""
    # pylint: disable=import-outside-toplevel
    import stac_fastapi.pgstac.core
    from stac_fastapi.types.stac import Item

    def _item_hook(*args, **kwargs):
        return user_hook(Item(*args, **kwargs))

    stac_fastapi.pgstac.core.Item = _item_hook

And here is the hook I needed:

def make_asset_links_absolute(item):
    """Patch assets[*].href to be absolute links."""
    # note this can be called on a patched item also
    self_link = None
    for link in item["links"]:
        if link["rel"] == "self":
            self_link = link["href"]
            break
    if self_link is None:
        return item

    # assumes self link points to json
    prefix = "/".join(self_link.split("/")[:-1])
    for asset in item["assets"].values():
        href = asset["href"]
        if ":" not in href:
            asset["href"] = f"{prefix}/{href}"

    return item

install_item_hook(make_asset_links_absolute)
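
A possible variant of the hook above, sketched with the standard library's `urllib.parse.urljoin` instead of manual string splitting. `absolute_asset_hrefs` is a hypothetical name, and the sketch assumes the self link uses http or https: as far as I know, `urljoin` only resolves relative references against schemes it treats as hierarchical, so an `s3://` self link would leave relative hrefs unchanged.

```python
from urllib.parse import urljoin


def absolute_asset_hrefs(item):
    """Resolve relative asset hrefs against the item's self link."""
    self_link = next(
        (link["href"] for link in item.get("links", []) if link.get("rel") == "self"),
        None,
    )
    if self_link is None:
        return item

    for asset in item.get("assets", {}).values():
        # urljoin returns hrefs that already carry their own scheme unchanged,
        # so calling this on an already patched item is harmless.
        asset["href"] = urljoin(self_link, asset["href"])
    return item


example = {
    "links": [{"rel": "self", "href": "https://example.com/data/item.json"}],
    "assets": {
        "a": {"href": "b1.tif"},
        "b": {"href": "s3://bucket/x.tif"},
        "c": {"href": "../other/b2.tif"},
    },
}
patched = absolute_asset_hrefs(example)
```

Unlike the prefix-based version, this also resolves `../` segments, and it tolerates items that have no `assets` or `links` key.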

Kirill888 avatar Jun 03 '22 13:06 Kirill888

As you mention, injecting custom transforms like this at the API level is difficult to do reliably without writing lots of custom code for each endpoint. I agree that the best approach for this use case is to subclass the appropriate backend and override methods accordingly.

I think adjusting asset hrefs and links at the API level is more feasible, and it is related to #191.

geospatial-jeff avatar Aug 04 '22 14:08 geospatial-jeff

> I agree that the best approach for this use case is to subclass the appropriate backend and override methods accordingly.

@geospatial-jeff I should probably have gone with that approach. It looks like there is not a huge surface area to cover; it's just a bit tricky to find exact information on the backend interface from the docs alone. Now that I'm more familiar with the internals of this codebase, I would approach this differently.

Kirill888 avatar Aug 05 '22 02:08 Kirill888

This is how it is done in Microsoft Planetary Computer: https://github.com/microsoft/planetary-computer-apis/blob/b0471ea9f5e84268294b48bc22432dba93907331/pcstac/pcstac/client.py#L217

drnextgis avatar Oct 14 '23 07:10 drnextgis