ascat Rewrite cell/swath xarray readers as MultiFileHandlers

Open claytharrison opened this issue 10 months ago • 1 comments

This pull request aims to reimplement the reading/merging logic for swath and cell files in the structure established by MultiFileHandler/ChronFiles/etc in the file_handling module.

As of this commit, readers for cell files are implemented (RaggedArray and OrthoMulti). The most basic usage goes something like:

from ascat.read_native.cell_collection import RaggedArrayFiles, OrthoMultiArrayFiles
contiguous_ra_source = "/path/to/contiguous/sig0_12.5/metop_a"
indexed_ra_source = "/path/to/indexed/sig0_12.5/metop_a"
multisat_ra_source = "/path/to/indexed/sig0_12.5/"
orthomulti_source = "/path/to/era5_land_2023/"
orthomulti_grid = "/path/to/era5_land_2023/grid.nc"

# amazon chunk
# you can also query by list of location_id, cell number, or lon/lat coords
bbox = (-7, -4, -69, -65)

contiguous_ra_files = RaggedArrayFiles(contiguous_ra_source, product_id="sig0_12.5") 
indexed_ra_files = RaggedArrayFiles(indexed_ra_source, product_id="sig0_12.5")

# right now we just use the "all_sats" parameter to indicate if the files are nested within metop_a/metop_b/metop_c directories underneath
# the root dir. This is of course not general or ideal.
multisat_ra_files = RaggedArrayFiles(multisat_ra_source, product_id="sig0_12.5", all_sats=True)

# for orthomulti right now you just pass the grid file path as an argument and it will generate a pygeogrids object from that.
# the product_id doesn't do anything in this case.
orthomulti_files = OrthoMultiArrayFiles(orthomulti_source, product_id="this_doesnt_matter_in_this_case", grid=orthomulti_grid)

# extract the data

contiguous_ra_ds = contiguous_ra_files.extract(bbox=bbox)
indexed_ra_ds = indexed_ra_files.extract(bbox=bbox)
# ^ these two should be the same, since contiguous RAs are converted to indexed before merging

multisat_ra_ds = multisat_ra_files.extract(bbox=bbox)

orthomulti_ds = orthomulti_files.extract(bbox=bbox)
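The comment above about contiguous ragged arrays being converted to indexed before merging can be illustrated with a small, library-independent sketch. It is not the code in this pull request; the function names `contiguous_to_indexed` and `merge_indexed` and the sample ids are made up for illustration, and it assumes the contiguous layout stores one observation count per location while the indexed layout stores one location position per observation.

```python
def contiguous_to_indexed(row_size):
    """Expand per-location observation counts into a per-observation index.

    row_size[i] is the number of consecutive observations belonging to
    location i (contiguous layout). The result holds, for every
    observation, the position of its location (indexed layout).
    """
    index = []
    for loc, count in enumerate(row_size):
        index.extend([loc] * count)
    return index


def merge_indexed(ids_a, index_a, ids_b, index_b):
    """Merge two indexed ragged arrays onto one shared location list."""
    merged_ids = sorted(set(ids_a) | set(ids_b))
    position = {loc_id: i for i, loc_id in enumerate(merged_ids)}
    # Re-point every observation at its location's slot in the merged list.
    merged_index = [position[ids_a[i]] for i in index_a]
    merged_index += [position[ids_b[i]] for i in index_b]
    return merged_ids, merged_index


# Hypothetical location ids and counts, for illustration only.
ids_a, row_size_a = [10, 30], [2, 1]
ids_b, row_size_b = [20, 30], [1, 2]

merged_ids, merged_index = merge_indexed(
    ids_a, contiguous_to_indexed(row_size_a),
    ids_b, contiguous_to_indexed(row_size_b),
)
print(merged_ids)    # [10, 20, 30]
print(merged_index)  # [0, 0, 2, 1, 2, 2]
```

Once both inputs are in indexed form, merging only requires concatenating the observations and remapping the index, which is presumably why the conversion happens first.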

To do:

~Add swath file reader~ Finish swath file reader
Find a robust method of handling product-specific information like grids, etc., including a way for users to provide that themselves. For the cell reader we only really need to pass the grid, but for the swath reader this will get more complicated.
Add the ability to write out according to a different cell scheme (any cell scheme).
Try integration with the regrid applications and make sure that still works nicely.
Rename things better.
Whatever else is missing compared to the old version.

Apr 24 '24 12:04 claytharrison

I added a basic Swath reader but nothing for handling specific products yet. For now you can steal the information for a given product from xarray_io.py.

It tries to implement a spatial filter for the results of the time-based file search, to relatively quickly exclude unnecessary swath files from reading and merging. The concept was graciously stolen from a script of Pavan's. It seems like it works but I haven't done proper testing yet.

Usage should go something like this:

from datetime import datetime

from ascat.read_native.swath_collection import SwathFile
from ascat.read_native.swath_collection import SwathGridFiles
from fibgrid.realization import FibGrid

swath_path = "tests/ascat_test_data/hsaf/h129/swaths"
grid = FibGrid(6.25)
sf = SwathGridFiles(
    swath_path,
    cls=SwathFile,
    fn_templ="W_IT-HSAF-ROME,SAT,SSM-ASCAT-METOP{sat}-6.25-H129_C_LIIB_{date}_{placeholder}_{placeholder1}____.nc",
    sf_templ={"year_folder": "{year}"},
    grid=grid,
    fn_read_fmt=lambda timestamp: {
        "date": timestamp.strftime("%Y%m%d*"),
        "sat": "[ABC]",
        "placeholder": "*",
        "placeholder1": "*"
    },
    sf_read_fmt=lambda timestamp: {
        "year_folder": {
            "year": f"{timestamp.year}"
        },
    },
)
files = sf.search_period(
    datetime(2021, 1, 15),
    datetime(2021, 1, 30),
    date_field_fmt="%Y%m%d%H%M%S"
)
bbox = (-90, -4, -70, 20)

merged_ds = sf.extract(
    datetime(2021, 1, 15),
    datetime(2021, 1, 30),
    bbox=bbox,
    date_field_fmt="%Y%m%d%H%M%S"
)
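The spatial pre-filter described above (dropping swath files that cannot contribute to the requested area before anything is read or merged) can be sketched roughly as follows. This is only an illustration of the idea, not the implementation in this pull request: the helper names, the file names, and the per-file extents are hypothetical, and the bounding boxes are assumed to be ordered (latmin, latmax, lonmin, lonmax). The sketch also ignores extents that cross the antimeridian.

```python
def bboxes_intersect(a, b):
    """Return True if two (latmin, latmax, lonmin, lonmax) boxes overlap."""
    a_latmin, a_latmax, a_lonmin, a_lonmax = a
    b_latmin, b_latmax, b_lonmin, b_lonmax = b
    return (
        a_latmin <= b_latmax and b_latmin <= a_latmax
        and a_lonmin <= b_lonmax and b_lonmin <= a_lonmax
    )


def filter_by_bbox(file_extents, query_bbox):
    """Keep only the files whose extent overlaps the query box."""
    return [
        name for name, extent in file_extents.items()
        if bboxes_intersect(extent, query_bbox)
    ]


# Hypothetical result of a time-based search, with a made-up extent per file.
file_extents = {
    "swath_a.nc": (-10, 0, -75, -60),  # overlaps the query box
    "swath_b.nc": (40, 50, 0, 10),     # far away, can be skipped
}
query_bbox = (-7, -4, -69, -65)

print(filter_by_bbox(file_extents, query_bbox))  # ['swath_a.nc']
```

Only the files that survive this cheap test would then need to be opened and merged.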

Apr 25 '24 13:04 claytharrison
