
Define and Store metadata in STAC

ymoisan opened this issue 6 years ago • 7 comments

Currently, training is performed on a list of GeoTIFF input images using reference data in GeoPackage files. That list of inputs is stored in CSV files. For the results, we store only the weights of our model (a .pth file).

To make our models interoperable, we need to write out the model definition together with its weights; those items are our final shareable outputs. Also, if we want to implement checks on whether a particular dataset is amenable to inference with a given model, we need to store all the inputs somewhere.

Initially we thought of using HDF to store both the inputs to and outputs of our models. It now appears one of the STAC extensions might be a more logical approach, as STAC is much more web-friendly than HDF.

ymoisan avatar Feb 05 '19 19:02 ymoisan

Mandatory information to store with the model, for re-usability (see the sketch after these lists for how the fields might be serialized):

  • Weights (.pth)
  • Model definition (e.g. Unet model)
  • Task type (e.g. classification or semantic segmentation)
  • Number of classes and their definitions (e.g. 1: vegetation, 2: lake, 3: building, etc.)
  • Number of bands used for training and their definitions (e.g. 4 bands: R-G-B-NIR);
    • The definition should describe the source of each band:
      • Sensor type (e.g. satellite, LiDAR, aerial photo, radar, etc.)
      • Acquisition date
      • Wavelength (if applicable)
      • Preprocessing (if applicable)
  • Spatial resolution at which the training was conducted
  • Geographic location where the training/validation and tests were conducted (e.g. bounding box or footprint, maybe?)

Optional information to store:

  • Training and validation accuracy
  • Training parameters (e.g. learning rate, number of epochs, class weights, etc.)
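For discussion only, here is a rough sketch of how these fields might be serialized as properties of a STAC-like Item. Every key name below (including the `dl:` prefix) is a placeholder I made up for illustration, not an agreed convention, and all values are dummy examples:

```python
# Rough sketch only: placeholder property names for the metadata listed above.
import json

model_item = {
    "id": "unet-rgbnir-example",  # hypothetical identifier
    "assets": {
        "weights": {"href": "model.pth", "type": "application/octet-stream"},
        "definition": {"href": "unet.py", "type": "text/x-python"},
    },
    "properties": {
        "dl:task": "semantic-segmentation",
        "dl:classes": {"1": "vegetation", "2": "lake", "3": "building"},
        "dl:bands": [
            {"name": "R", "sensor": "aerial photo", "acquired": "2018-06-01"},
            {"name": "G", "sensor": "aerial photo", "acquired": "2018-06-01"},
            {"name": "B", "sensor": "aerial photo", "acquired": "2018-06-01"},
            {"name": "NIR", "sensor": "aerial photo", "acquired": "2018-06-01"},
        ],
        "dl:spatial_resolution_m": 0.2,
        "dl:training_extent": [-75.8, 45.3, -75.5, 45.5],  # lon/lat bbox (W, S, E, N)
        # Optional fields
        "dl:validation_accuracy": 0.87,
        "dl:training_parameters": {"learning_rate": 0.001, "epochs": 100},
    },
}

print(json.dumps(model_item, indent=2))
```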

mpelchat04 avatar Feb 06 '19 15:02 mpelchat04

A nice way of validating whether inputs are applicable to a given model is to implement the check as a decorator: see "input validation" in A comprehensive guide to putting a machine learning model in production using Flask, Docker, and Kubernetes.
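For illustration, a minimal sketch of such a decorator (nothing here is geo-deep-learning code; the `band_names` attribute on the input object and the band list are assumptions):

```python
import functools

def validate_inputs(required_bands):
    """Reject inference calls whose input raster lacks the bands the model was trained on."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(raster, *args, **kwargs):
            # `raster.band_names` is an assumed attribute of whatever wraps our GeoTIFF inputs.
            missing = [b for b in required_bands if b not in raster.band_names]
            if missing:
                raise ValueError(f"Input raster is missing required bands: {missing}")
            return func(raster, *args, **kwargs)
        return wrapper
    return decorator

@validate_inputs(required_bands=["R", "G", "B", "NIR"])
def run_inference(raster, model):
    return model(raster)
```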

ymoisan avatar Feb 07 '19 21:02 ymoisan

If we wanted to devise some kind of standard for model interoperability around HDF5, we would likely come up with an HDF5 product definition. Interesting excerpts from [HDF Product Designer](https://wiki.earthdata.nasa.gov/display/HPD/HDF+Product+Designer):

> The Hierarchical Data Format (HDF5) provides a flexible container that supports groups and datasets, each of which can have attributes. In many ways, HDF5 is similar to a directory structure in a file and, like directory structures, the same data can be structured and annotated in many ways. This flexibility empowers HDF5 users to arrange data in ways that make sense to them. However, it can make it difficult to share data ... Many communities have successfully addressed this problem by creating conventional structures and annotations for data in HDF5. This approach depends on data files (e.g., products) that carefully follow these conventions.
>
> A HDF5 product is the content that should exist in a single HDF5 file. This content is defined by the HDF5 objects (groups, attributes, datasets), their names, the hierarchies they create (links and references), and attribute values. Dataset values are typically not stored in such files (unless they qualify as metadata) thus this software cannot be used as a data server. Once completed, a HDF5 product is replicated in many files (commonly on the order of tens of thousands or more) and filled with real data.
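To make the product-definition idea concrete, here is a minimal sketch (assuming h5py; every group and attribute name is illustrative only, not an agreed convention) of what such a conventional layout could look like:

```python
# Minimal sketch with h5py; group/attribute names are illustrative only.
import h5py
import numpy as np

with h5py.File("model_product.h5", "w") as f:
    model = f.create_group("model")
    model.attrs["task"] = "semantic-segmentation"
    model.attrs["architecture"] = "unet"
    model.attrs["classes"] = "1:vegetation;2:lake;3:building"

    bands = f.create_group("inputs/bands")
    bands.attrs["names"] = "R,G,B,NIR"
    bands.attrs["spatial_resolution_m"] = 0.2

    # The weights would normally be datasets holding the real tensors; a dummy
    # array stands in here just to show where they would live in the hierarchy.
    f.create_dataset("model/weights/encoder.conv1",
                     data=np.zeros((64, 4, 3, 3), dtype=np.float32))
```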

How would the use of HDF5 help us in forming totally independent DL containers that would contain all the information needed for interoperability? Could we implement something in relation to "standardised environments" as per OGC Testbed 14?

ymoisan avatar Feb 08 '19 19:02 ymoisan

How well does HDF5 play with Big Data infrastructures and OGC services like WCS? Could the H5Server be useful?

ymoisan avatar Feb 08 '19 21:02 ymoisan

Could we integrate STAC fields?

ymoisan avatar Apr 05 '19 17:04 ymoisan

deepdish? torch hdf5?
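If deepdish turned out to be suitable, saving a state dict plus metadata to HDF5 might look roughly like this (a sketch assuming deepdish's io.save/io.load and a trained torch.nn.Module named `model`; nothing here is committed code):

```python
# Sketch only: deepdish serializes nested Python dicts (including NumPy arrays) to HDF5.
import deepdish as dd

# `model` is assumed to be a trained torch.nn.Module.
payload = {
    "weights": {k: v.cpu().numpy() for k, v in model.state_dict().items()},
    "metadata": {
        "task": "semantic-segmentation",
        "bands": ["R", "G", "B", "NIR"],
        "classes": {"1": "vegetation", "2": "lake", "3": "building"},
    },
}
dd.io.save("model_with_metadata.h5", payload)
restored = dd.io.load("model_with_metadata.h5")
```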

ymoisan avatar Jul 16 '19 16:07 ymoisan

EO profile of STAC includes items such as sun azimuth and elevation: https://github.com/radiantearth/stac-spec/blob/master/extensions/eo/schema.json. Type 20170831_162740_ssc1d1 in your browser search bar and you'll end up here:

[screenshot of the resulting STAC Item metadata]

All we need is there...

I suggest we investigate creating STAC Items of the label extension type. Note: models per se are not STAC Items for now. I think there is an opportunity for us to think about how we could make that happen.
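To seed that investigation, a bare-bones sketch of what a label-extension Item for one of our GeoPackage label layers might look like; all values are placeholders and the exact required fields should be checked against the extension spec:

```python
# Placeholder values throughout; consult the label extension spec for required fields.
label_item = {
    "type": "Feature",
    "stac_version": "0.8.0",
    "id": "training-labels-example",
    "bbox": [-75.8, 45.3, -75.5, 45.5],
    "geometry": {
        "type": "Polygon",
        "coordinates": [[[-75.8, 45.3], [-75.5, 45.3], [-75.5, 45.5],
                         [-75.8, 45.5], [-75.8, 45.3]]],
    },
    "properties": {
        "datetime": "2018-06-01T00:00:00Z",
        "label:description": "Vegetation / lake / building polygons used for training",
        "label:type": "vector",
        "label:classes": [{"name": "class",
                           "classes": ["vegetation", "lake", "building"]}],
        "label:tasks": ["segmentation"],
    },
    "assets": {
        "labels": {"href": "labels.gpkg", "type": "application/geopackage+sqlite3"}
    },
    "links": [],
}
```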

ymoisan avatar Aug 01 '19 20:08 ymoisan