pgstac

Dynamic queryables

Open emmanuelmathot opened this issue 2 years ago • 3 comments

Is there already any stored function that would update the queryables table with the actual possible values from items or from collection summaries? We already use pgstac.missing_queryables() as described in https://stac-utils.github.io/pgstac/pgstac/#queryable-metadata but it seems to extract the definition from the extension json. Thx.

emmanuelmathot, Sep 05 '23 14:09

@emmanuelmathot the missing_queryables function will load a sample of data rows (controlled by the table sample argument which is an approximate percent of rows to look at). For any property names that exist in the loaded extensions, it will get the definition from there. If the property does not exist in any extension json, it will just make a very basic definition based on the json type of the property.
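The fallback described above can be sketched in plain Python. This is an illustration of the described behaviour only, not pgstac's actual code: the helper names `basic_definition` and `resolve_definition` are hypothetical, and the shape of the generated definition (`title` plus `type`) is an assumption.

```python
from typing import Any


def basic_definition(name: str, value: Any) -> dict:
    """Build a minimal definition from the JSON type of a sampled value.

    Hypothetical helper: the output shape is an assumption, not pgstac's.
    """
    # bool must be tested before int, because bool is a subclass of int.
    if isinstance(value, bool):
        json_type = "boolean"
    elif isinstance(value, (int, float)):
        json_type = "number"
    elif isinstance(value, str):
        json_type = "string"
    elif isinstance(value, list):
        json_type = "array"
    elif isinstance(value, dict):
        json_type = "object"
    else:
        json_type = "null"
    return {"title": name, "type": json_type}


def resolve_definition(name: str, value: Any, extension_properties: dict) -> dict:
    """Prefer a definition from a loaded extension, else fall back to the JSON type."""
    if name in extension_properties:
        return extension_properties[name]
    return basic_definition(name, value)
```

With the sat extension loaded, `sat:orbit_state` would resolve to the extension's own definition (including its full enum), while an unknown property such as a custom boolean flag would only get a bare `type`.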

bitner, Sep 06 '23 21:09

Yes indeed, that is what I understood from using the function and reviewing its code. My question was about a potential function or stored procedure, in either pgstac or pypgstac, that would generate the definitions from the values actually present in the items, especially for enums. For instance, the sat:orbit_state field from the sat extension defines the following possible values: ascending, descending, geostationary. If the collection only has items with ascending and descending, this function would generate an enum with only those values.
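A minimal sketch of the requested behaviour, assuming items are available as plain STAC item dicts and the property holds string values. The function name `observed_enum` is hypothetical; nothing like it is confirmed to exist in pgstac or pypgstac.

```python
from typing import Iterable


def observed_enum(items: Iterable[dict], prop: str) -> dict:
    """Build an enum definition from the values actually present in the items."""
    values = {
        item["properties"][prop]
        for item in items
        if prop in item.get("properties", {})
    }
    # Sort so the generated definition is stable between runs.
    return {"type": "string", "enum": sorted(values)}


items = [
    {"properties": {"sat:orbit_state": "ascending"}},
    {"properties": {"sat:orbit_state": "descending"}},
    {"properties": {"sat:orbit_state": "ascending"}},
]
definition = observed_enum(items, "sat:orbit_state")
```

Here `definition["enum"]` contains only ascending and descending, since geostationary never occurs in the items.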

emmanuelmathot, Sep 07 '23 09:09

It is definitely possible to do this from a sample of rows, but, in this case, where the choices are defined by an extension, it would be better (I think) to add the sat extension json into the stac_extensions table. If it is there, the missing_queryables function should pick up the enum along with any other specific definitions directly from the extension definition.

Anything we would do to auto-create enums would either require a full table scan, which could be very slow, or, when using only a sample of rows, run the risk that the enum is missing some values.
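The trade-off can be illustrated with a small sketch. It uses in-memory dicts rather than a database, and a deterministic every-tenth-item slice stands in for a random row sample purely so the result is reproducible; the data and the helper `enum_from` are made up for illustration.

```python
def enum_from(items, prop):
    """Collect the distinct values of a property, sorted for stable output."""
    return sorted({i["properties"][prop] for i in items if prop in i["properties"]})


# 98 common values followed by 2 rare ones (hypothetical data).
items = [{"properties": {"sat:orbit_state": "ascending"}} for _ in range(98)]
items += [{"properties": {"sat:orbit_state": "descending"}} for _ in range(2)]

# Full scan: sees every row, so it finds both values.
full = enum_from(items, "sat:orbit_state")

# 10% sample (every tenth item, indices 0..90): never reaches the rare rows.
sample = enum_from(items[::10], "sat:orbit_state")
```

The full scan yields both values, while the sample yields only ascending, so an enum built from the sample would reject valid queries for descending.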

I'm certainly open to a PR that adds this in, but it is probably not going to be something high on my priority list to work on.

bitner, Sep 07 '23 14:09