pystac icon indicating copy to clipboard operation
pystac copied to clipboard

Add File Info Extension v1.0.0 -> v2.0.0 migration

Open duckontheweb opened this issue 3 years ago • 7 comments

Support for v2.0.0 of the File Info Extension was added in #442, but the migrate hook was not updated to include a migration from v1.0.0 to the new version. That hook should be updated to include the migration.

Based on my understanding of the changes between versions, the following fields should be migrated to Raster Bands Object fields (dropping the file: prefix as well):

  • file:bits_per_sample
  • file:data_type
  • file:nodata
  • file:unit

@emmanuelmathot @matthewhanson @m-mohr Feel free to correct this if I got anything wrong.

duckontheweb avatar Jun 23 '21 20:06 duckontheweb

I am not sure we can easily transfer those information that were describing the file to one or more raster bands. There are several cases that should be supported and thus several implementation rules must be defined. A basic rule could be to apply the file fields to all raster bands corresponding fields. If the assets of the original item does not describe any raster band, is a default raster band created?

emmanuelmathot avatar Jun 23 '21 20:06 emmanuelmathot

A basic rule could be to apply the file fields to all raster bands corresponding fields.

I was thinking that if raster:bands is defined we would check if the field is already defined in any of the Raster Band Objects. If it is, we would take no action since we would not want to overwrite a value defined by the Raster Extension. If it is not defined, then we would populate that field with the file:* value.

If the assets of the original item does not describe any raster band, is a default raster band created?

Yeah, this is a tricky. I was thinking that we should create a default Raster Band Object so that we don't lose the information. However, that could be misleading if an asset has more than 1 band but we create only a single Raster Band Object. It also doesn't seem possible to discover the number of Raster Band Objects that should be created without opening the file, which is not something we want to do here. What is your opinion on this @emmanuelmathot?

duckontheweb avatar Jun 24 '21 02:06 duckontheweb

That is really tricky. We should avoid add raster bands to non raster assets. I would create a default raster band so that we do not loose the information . But I would add some kind of filters to be sure that the asset is a potential raster (e.g. mime-type, file extension)

emmanuelmathot avatar Jun 24 '21 05:06 emmanuelmathot

We should avoid add raster bands to non raster assets. I would create a default raster band so that we do not loose the information . But I would add some kind of filters to be sure that the asset is a potential raster (e.g. mime-type, file extension)

Are there legitimate cases in File Info Extension v1.0.0 where someone would have used these fields for something besides a raster? It seems like these fields were always implicitly intended for raster files and that we just made that more explicit by splitting the extensions (I may not be aware of the full evolution of these, though, so this could be off-base). If this is true, we may just want to give the publisher the benefit of the doubt and always create the Raster Band rather than trying to filter on all of the possible MIME types and extensions. However, if there are other legitimate uses this may be our only viable option.

cc @m-mohr

duckontheweb avatar Jun 28 '21 18:06 duckontheweb

I guess I could construct use cases, but I doubt someone has actually implemented file for anything other than raster.

I'm not sure though how to convert to raster:bands. Is it always the case that a global e.g. file:nodata, can be converted to a "single band" raster:bands containing file:nodata? We are assuming it is one band, but that may not be true.

m-mohr avatar Jun 28 '21 23:06 m-mohr

Given some of the uncertainties around handling the case where no Raster Band exist it might be best not to attempt any migration in that case.

In that case, the logic would be to migrate file:bits_per_sample, file:data_type, file:nodata, file:unit as follows:

  1. If no Raster Bands already exist on the object, do nothing
  2. If Raster Bands do exist, then for each of these properties: i) If any of the Raster Band objects contain the field already, do nothing ii) If none of the Raster Band object contain the field, add it to each existing Raster Band

@m-mohr @emmanuelmathot How do you feel about this approach?

duckontheweb avatar Aug 17 '21 18:08 duckontheweb

Maybe additionally: If no Raster Bands already exist, create as many as eo:bands exist and add the field to each of them.

But maybe I may not even try to migrate it in JS and just handle leave implementation on v1.0.0.

m-mohr avatar Aug 17 '21 20:08 m-mohr

I am wondering if this is still necessary given its age? I would think that most providers have probably moved to the newer version over the last 2 years.

jsignell avatar Jun 01 '23 19:06 jsignell

Let's keep the issue open but not roadmap it. IMO dropping support for an old extension version should be avoided if possible, at least after v1 of that extension.

I'm soft-banking on our (as of yet undeveloped) solution for #448 being so good that it makes issues like this trivial 🤞🏼.

gadomski avatar Jun 01 '23 20:06 gadomski

FYI, some implementations still use v1 as some removed v1 functionality can't easily be mapped to other extensions (e.g. global no data value). But that also means the migration path is not trivial.

m-mohr avatar Jun 01 '23 22:06 m-mohr

With the bands PR this can now indeed be solved :-)

m-mohr avatar Sep 28 '23 18:09 m-mohr

#1265 makes it so if you load an object with any of those removed fields you get a warning. Not sure if that is enough to close this ticket, but maybe it is enough until raster bands is fully baked.

jsignell avatar Oct 17 '23 18:10 jsignell

I think it's good enough. If folks start complaining about the migration w/ warnings, we can open a ticket to actually move things over once raster bands is baked. But that's fine as a new issue if it arises, IMO.

gadomski avatar Oct 17 '23 19:10 gadomski