ascat
Rewrite cell/swath xarray readers as MultiFileHandlers
This pull request aims to reimplement the reading/merging logic for swath and cell files in the structure established by `MultiFileHandler`/`ChronFiles`/etc. in the `file_handling` module.
As of this commit, readers for cell files are implemented (RaggedArray and OrthoMulti). The most basic method of operation goes something like:

    from ascat.read_native.cell_collection import RaggedArrayFiles, OrthoMultiArrayFiles

    contiguous_ra_source = "/path/to/contiguous/sig0_12.5/metop_a"
    indexed_ra_source = "/path/to/indexed/sig0_12.5/metop_a"
    multisat_ra_source = "/path/to/indexed/sig0_12.5/"
    orthomulti_source = "/path/to/era5_land_2023/"
    orthomulti_grid = "/path/to/era5_land_2023/grid.nc"

    # amazon chunk
    # you can also query by list of location_id, cell number, or lon/lat coords
    bbox = (-7, -4, -69, -65)

    contiguous_ra_files = RaggedArrayFiles(contiguous_ra_source, product_id="sig0_12.5")
    indexed_ra_files = RaggedArrayFiles(indexed_ra_source, product_id="sig0_12.5")

    # right now we just use the "all_sats" parameter to indicate if the files are
    # nested within metop_a/metop_b/metop_c directories underneath the root dir.
    # This is of course not general or ideal.
    multisat_ra_files = RaggedArrayFiles(multisat_ra_source, product_id="sig0_12.5", all_sats=True)

    # for orthomulti right now you just pass the grid file path as an argument and
    # it will generate a pygeogrids object from that.
    # the product_id doesn't do anything in this case.
    orthomulti_files = OrthoMultiArrayFiles(orthomulti_source, product_id="this_doesnt_matter_in_this_case", grid=orthomulti_grid)

    # extract the data
    contiguous_ra_ds = contiguous_ra_files.extract(bbox=bbox)
    indexed_ra_ds = indexed_ra_files.extract(bbox=bbox)
    # ^ these two should be the same, since contiguous RAs are converted to indexed before merging
    multisat_ra_ds = multisat_ra_files.extract(bbox=bbox)
    orthomulti_ds = orthomulti_files.extract(bbox=bbox)
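For anyone unfamiliar with why the contiguous and indexed results should match: the two ragged array layouts carry the same information, and going from one to the other is a small index computation. The snippet below is only a generic NumPy sketch of that idea, not the code in this PR; the variable names and values (`row_size`, `location_index`, `sig0`) are made up for illustration.

```python
import numpy as np

# Hypothetical contiguous ragged array: three locations holding 2, 0 and 3
# observations, stored back to back in one flat observation array.
row_size = np.array([2, 0, 3])
sig0 = np.array([-8.1, -7.9, -12.4, -12.0, -11.7])

# Indexed layout: one entry per observation, pointing at the position of
# the location that observation belongs to.
location_index = np.repeat(np.arange(row_size.size), row_size)

# Going back: count how many observations point at each location.
recovered_row_size = np.bincount(location_index, minlength=row_size.size)

print(location_index)      # [0 0 2 2 2]
print(recovered_row_size)  # [2 0 3]
```

The observation values themselves are untouched by the conversion, which is why merging after converting to indexed should give the same dataset for both sources.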
To do:
- ~Add swath file reader~ Finish swath file reader
- Find a robust method of handling product-specific information like grids, etc., including a way for users to provide that themselves. For the cell reader we only really need to pass the grid, but for the swath reader this will get more complicated
- Add ability to write out according to different cell scheme (any cell scheme)
- Try integration with regrid applications, make sure that still works nicely.
- Rename things better
- Add whatever else is missing compared to the old version
I added a basic Swath reader but nothing for handling specific products yet. For now you can steal the information for a given product from `xarray_io.py`.
It tries to implement a spatial filter for the results of the time-based file search, to relatively quickly exclude unnecessary swath files from reading and merging. The concept was graciously stolen from a script of Pavan's. It seems like it works but I haven't done proper testing yet.
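To make the spatial filter idea concrete, here is a rough, self-contained sketch of the general approach: given some coordinates per file, drop every file that has no point inside the bounding box. This is not the implementation in this PR. The helper names (`points_in_bbox`, `filter_files`), the file names and the coordinates are all hypothetical, and the bbox order (latmin, latmax, lonmin, lonmax) is an assumption based on the examples in this description.

```python
import numpy as np


def points_in_bbox(lats, lons, bbox):
    """Boolean mask of points inside bbox.

    Assumes bbox = (latmin, latmax, lonmin, lonmax) and a box that does
    not cross the antimeridian.
    """
    latmin, latmax, lonmin, lonmax = bbox
    lats = np.asarray(lats)
    lons = np.asarray(lons)
    return (
        (lats >= latmin) & (lats <= latmax)
        & (lons >= lonmin) & (lons <= lonmax)
    )


def filter_files(file_coords, bbox):
    """Keep only the file names with at least one point inside bbox."""
    return [
        name
        for name, (lats, lons) in file_coords.items()
        if points_in_bbox(lats, lons, bbox).any()
    ]


# Hypothetical per-file coordinates (lats, lons).
file_coords = {
    "swath_a.nc": (np.array([-6.0, -5.5]), np.array([-68.0, -67.5])),
    "swath_b.nc": (np.array([48.0, 48.5]), np.array([16.0, 16.5])),
}

kept = filter_files(file_coords, bbox=(-7, -4, -69, -65))
print(kept)  # ['swath_a.nc']
```

The point of filtering at this stage is that only the surviving files need to be opened in full and merged.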
Using the swath reader should go something like:

    from datetime import datetime

    from ascat.read_native.swath_collection import SwathFile
    from ascat.read_native.swath_collection import SwathGridFiles
    from fibgrid.realization import FibGrid

    swath_path = "tests/ascat_test_data/hsaf/h129/swaths"
    grid = FibGrid(6.25)

    sf = SwathGridFiles(
        swath_path,
        cls=SwathFile,
        fn_templ="W_IT-HSAF-ROME,SAT,SSM-ASCAT-METOP{sat}-6.25-H129_C_LIIB_{date}_{placeholder}_{placeholder1}____.nc",
        sf_templ={"year_folder": "{year}"},
        grid=grid,
        # wildcard values used to build the filename search pattern
        fn_read_fmt=lambda timestamp: {
            "date": timestamp.strftime("%Y%m%d*"),
            "sat": "[ABC]",
            "placeholder": "*",
            "placeholder1": "*",
        },
        # values used to build the subfolder search pattern
        sf_read_fmt=lambda timestamp: {
            "year_folder": {
                "year": f"{timestamp.year}",
            },
        },
    )

    # list the files found for the period
    files = sf.search_period(
        datetime(2021, 1, 15),
        datetime(2021, 1, 30),
        date_field_fmt="%Y%m%d%H%M%S",
    )

    # read and merge the data within the bbox for the same period
    bbox = (-90, -4, -70, 20)
    merged_ds = sf.extract(
        datetime(2021, 1, 15),
        datetime(2021, 1, 30),
        bbox=bbox,
        date_field_fmt="%Y%m%d%H%M%S",
    )
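As a rough illustration of what the `fn_templ` and `fn_read_fmt` arguments above are for: filling the template with the wildcard values gives a glob-style pattern that can be matched against file names. The sketch below only uses the standard library and is an assumption about the general mechanism, not the actual `file_handling` code. The candidate file names are invented, including the contents of the two placeholder fields.

```python
from datetime import datetime
from fnmatch import fnmatchcase

fn_templ = (
    "W_IT-HSAF-ROME,SAT,SSM-ASCAT-METOP{sat}-6.25-H129_C_LIIB_"
    "{date}_{placeholder}_{placeholder1}____.nc"
)


def fn_read_fmt(timestamp):
    """Wildcard values for each template field, as in the example above."""
    return {
        "date": timestamp.strftime("%Y%m%d*"),
        "sat": "[ABC]",
        "placeholder": "*",
        "placeholder1": "*",
    }


# Fill the template to get a glob-style pattern for one day.
pattern = fn_templ.format(**fn_read_fmt(datetime(2021, 1, 15)))

# Hypothetical file names; the placeholder fields are made up.
prefix = "W_IT-HSAF-ROME,SAT,SSM-ASCAT-METOP"
candidates = [
    prefix + "B-6.25-H129_C_LIIB_20210115003000_20210115000000_20210115005959____.nc",
    prefix + "B-6.25-H129_C_LIIB_20210116003000_20210116000000_20210116005959____.nc",
    prefix + "D-6.25-H129_C_LIIB_20210115003000_20210115000000_20210115005959____.nc",
]

matches = [name for name in candidates if fnmatchcase(name, pattern)]
print(pattern)
print(matches)  # only the first candidate matches
```

Here the second candidate is rejected because of its date and the third because `D` is not in `[ABC]`.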