Add function to create CF-compliant arrays/metadata for HDF5 stacks

Open scottstanie opened this issue 2 years ago • 2 comments

Description of proposed changes

This is the start of the implementation described in https://github.com/insarlab/MintPy/discussions/1016, which aims to make MintPy HDF5 stacks readable by GDAL/xarray/QGIS.

I still need to think of a way to incorporate this smoothly into the prep_ scripts without disrupting how they currently work.

As a side note:

Maybe we could use "datetime" instead? Neither "date" nor "time" feels accurate, given that we handle both spaceborne and airborne data

For now I still have time as the possible attribute for the date/datetime stacks, simply because that's what the CF conventions suggest. Their examples generally contain dates/datetimes as well, but they use time as the standard variable name (even with daily granularity). Also, the time units are currently units=f"days since {str(date_arr[0])}", but this could be seconds for intra-day stacks (e.g. for @taliboliver's deltaX data). I don't know what modifications he made, though; I've only seen that there's usually a date dataset.
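To make the units choice concrete, here is a minimal sketch of how a list of acquisition dates could be turned into a CF-style time coordinate. The helper name encode_cf_time, its unit argument, and the returned attribute dictionary are assumptions for illustration only and are not taken from the PR; the sketch also assumes plain YYYYMMDD strings as input.

```python
from datetime import datetime


def encode_cf_time(date_list, unit="days"):
    """Encode YYYYMMDD strings as numeric offsets plus CF-style attributes.

    Hypothetical helper for illustration; not part of MintPy.
    """
    seconds_per_unit = {"days": 86400.0, "seconds": 1.0}[unit]
    dates = [datetime.strptime(d, "%Y%m%d") for d in date_list]
    ref = dates[0]  # reference epoch = first acquisition, as in the text above
    offsets = [(d - ref).total_seconds() / seconds_per_unit for d in dates]
    attrs = {
        "standard_name": "time",
        "units": f"{unit} since {str(ref)}",
    }
    return offsets, attrs


offsets, attrs = encode_cf_time(["20230101", "20230113", "20230125"])
print(offsets)          # [0.0, 12.0, 24.0]
print(attrs["units"])   # days since 2023-01-01 00:00:00
```

Passing unit="seconds" keeps the same reference epoch but stores the offsets in seconds, which is the variant an intra-day stack would likely need.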

Reminders

  • [ ] Fix #xxxx
  • [ ] Pass Pre-commit check (green)
  • [ ] Pass Codacy code review (green)
  • [ ] Pass Circle CI test (green)
  • [ ] Make sure that your code follows our style. Use the other functions/files as a basis.
  • [ ] If modifying functionality, describe changes to function behavior and arguments in a comment below the function declaration.
  • [ ] If adding new functionality, add a detailed description to the documentation and/or an example.

scottstanie avatar Aug 18 '23 23:08 scottstanie

It's currently failing one of the integration tests, in readfile.read_hdf5_file. It would be good to add a unit test for this, since the error says that data is never assigned. There are only checks for 2D and higher dimensions:

        # 2D dataset
        if ds.ndim == 2:
            ...
            data = ...

so it's likely that one of the 1D or 0D datasets is tripping it up (but I think that should be handled separately in read_hdf5_file, since there shouldn't be any code path that leaves a variable undefined).
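One way to guarantee that data is always assigned is to dispatch on ndim with an explicit final else that raises. The following is only a sketch of that pattern, not MintPy's actual read_hdf5_file: the function name read_subset, the box layout (x0, y0, x1, y1), and the upper limit of 4 dimensions are all assumptions made for this example. It works on any object that exposes .ndim, .shape, and NumPy-style indexing, such as an h5py dataset or a NumPy array.

```python
def read_subset(ds, box=None):
    """Read `ds`, optionally cropping the last two axes to `box`.

    Hypothetical sketch; not MintPy's actual read_hdf5_file.
    box = (x0, y0, x1, y1) in pixel coordinates.
    """
    if ds.ndim == 0:
        # scalar dataset: read the single value
        data = ds[()]
    elif ds.ndim == 1:
        # 1D dataset (e.g. a date list): read everything, box does not apply
        data = ds[:]
    elif 2 <= ds.ndim <= 4:
        # 2D and higher: crop the last two axes, keep leading axes whole
        length, width = ds.shape[-2:]
        x0, y0, x1, y1 = box if box is not None else (0, 0, width, length)
        leading = (slice(None),) * (ds.ndim - 2)
        data = ds[leading + (slice(y0, y1), slice(x0, x1))]
    else:
        # fail loudly instead of falling through with `data` undefined
        raise ValueError(f"unsupported dataset dimension: {ds.ndim}")
    return data
```

A unit test for this pattern would call the reader once per dimensionality (0D through the highest supported one) and check that each call either returns data or raises a clear error.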

scottstanie avatar Aug 19 '23 18:08 scottstanie

Thank you @scottstanie for this exciting PR!!!

I am trying to review this PR, but it does not seem easy to see what data and metadata have been added in the new format. Here are two questions for you:

  1. I assume the San Francisco Bay dataset from ARIA can be used to test this new capability, right?
  2. To document this new format, could you provide a minimal example for creating and examining the old/new format of the HDF5 file? I was relying on info.py, but it does not seem to work with the new format yet.

Also, it currently has units=f"days since {str(date_arr[0])}" as the time units, but this can be seconds for intra-day stacks (e.g. for @taliboliver 's deltaX data). But i don't know what modifications he made; i've only seen that there's a date dataset usually.

For the intra-day stacks, @taliboliver uses the YYYYMMDDTHHMM format in the date dataset, instead of the usual YYYYMMDD format, and modified the code throughout mintpy to automatically identify this difference while reading it.
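As a rough illustration of how the two date formats could drive the units choice, the sketch below parses either YYYYMMDD or YYYYMMDDTHHMM strings and picks "seconds" whenever any entry carries a time of day. The helper names parse_stack_date and pick_time_unit are hypothetical, and this is not the detection logic actually used in MintPy or in @taliboliver's changes.

```python
from datetime import datetime


def parse_stack_date(date_str):
    """Parse YYYYMMDD or YYYYMMDDTHHMM (hypothetical helper, not MintPy API)."""
    fmt = "%Y%m%dT%H%M" if "T" in date_str else "%Y%m%d"
    return datetime.strptime(date_str, fmt)


def pick_time_unit(date_list):
    """Use seconds if any date carries a time of day, otherwise days."""
    return "seconds" if any("T" in d for d in date_list) else "days"


print(parse_stack_date("20230818T2308"))                   # 2023-08-18 23:08:00
print(pick_time_unit(["20230818", "20230830"]))            # days
print(pick_time_unit(["20230818T0900", "20230818T1500"]))  # seconds
```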

I am not yet familiar enough with the changes in this PR to help with the choice here. Having the above two questions answered would help.

yunjunz avatar Aug 31 '23 10:08 yunjunz