satpy icon indicating copy to clipboard operation
satpy copied to clipboard

Add `get_area_def` to cf reader

Open BENR0 opened this issue 4 years ago • 27 comments

This adds add area definition support for the cf reader. I fixed the tests and added a test area in geos projection as well as some preliminary assertions for testing the area. I think the tests should be improved and extended to cover other areas. Currently the extent assertion is failing because of https://github.com/pytroll/pyresample/issues/355.

  • [x] Closes #1672
  • [ ] Tests added

BENR0 avatar May 27 '21 14:05 BENR0

Pyresample 1.20. 0 is being deployed right now. It looks like our test environment uses pyresample from conda-forge so we'll have to wait for that package to be pull requested, merged, released, and synced to the package repository.

djhoese avatar Jun 04 '21 15:06 djhoese

This is failing, so it'll have to go to the next release

mraspaud avatar Jun 04 '21 19:06 mraspaud

@zxdawn @BENR0 @mraspaud So I just ran this locally with pyresample main and it does actually fail even though the CI jobs here aren't pulling in the right version:

>           assert expected_area.area_extent == actual_area.area_extent
E           AssertionError: assert (339045.5577,... 4803645.4685) == (4803645.4685..., 339045.5577)
E             At index 0 diff: 339045.5577 != 4803645.468500001
E             Use -v to get the full diff

satpy/tests/reader_tests/test_satpy_cf_nc.py:144: AssertionError

Edit: Note this is way better than the results from the old pyresample version:

 >           assert expected_area.area_extent == actual_area.area_extent
E           AssertionError: assert (339045.5577,... 4803645.4685) == (171902444919...3027029152.95)
E             At index 0 diff: 339045.5577 != 171902444919656.84
E             Use -v to get the full diff

djhoese avatar Jun 05 '21 01:06 djhoese

It seems this was due to a switch of x and y in the test setup. Now it passes. One caveat though is that I had to add pytest.approx to the assert statement because the upper right y coordinate was just a tiny bit off from the expected value.

BENR0 avatar May 13 '22 12:05 BENR0

You may need to merge main into this branch. It seems github actions doesn't like the old workflow file (I hope that's the problem at least).

djhoese avatar May 13 '22 14:05 djhoese

Indeed I totally forgot to do that.

BENR0 avatar May 13 '22 15:05 BENR0

Codecov Report

Merging #1695 (2b2048b) into main (ff54a74) will increase coverage by 0.08%. The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main    #1695      +/-   ##
==========================================
+ Coverage   94.05%   94.13%   +0.08%     
==========================================
  Files         290      293       +3     
  Lines       44639    45079     +440     
==========================================
+ Hits        41987    42437     +450     
+ Misses       2652     2642      -10     
Flag Coverage Δ
behaviourtests 4.68% <0.00%> (-0.04%) :arrow_down:
unittests 94.79% <100.00%> (+0.07%) :arrow_up:

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
satpy/readers/satpy_cf_nc.py 97.97% <100.00%> (+0.17%) :arrow_up:
satpy/tests/reader_tests/test_satpy_cf_nc.py 100.00% <100.00%> (ø)
satpy/writers/geotiff.py 93.75% <0.00%> (ø)
satpy/tests/test_dataset.py 100.00% <0.00%> (ø)
satpy/readers/seviri_l1b_native_hdr.py 100.00% <0.00%> (ø)
satpy/tests/reader_tests/test_seviri_l1b_native.py 100.00% <0.00%> (ø)
satpy/readers/mws_l1b.py 98.51% <0.00%> (ø)
satpy/tests/reader_tests/test_mws_l1b_nc.py 100.00% <0.00%> (ø)
satpy/readers/pmw_channels_definitions.py 97.60% <0.00%> (ø)
satpy/tests/test_writers.py 98.99% <0.00%> (+0.02%) :arrow_up:
... and 3 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

codecov[bot] avatar May 13 '22 16:05 codecov[bot]

Coverage Status

Coverage increased (+0.07%) to 94.74% when pulling 2b2048b30598887f3c5aa4a78fcc7a4a2a0c630d on BENR0:feat_add_get_area_def_to_cf_reader into ff54a74be8c5a51cc5347285f680de66f1f9807e on pytroll:main.

coveralls avatar May 13 '22 16:05 coveralls

@djhoese @mraspaud did you have time to look at this?

BENR0 avatar May 24 '22 07:05 BENR0

@BENR0 sorry for the late reply.

It looks good for the area case, but what happens if we are working with swath definitions (so we just have lons and lats)? will the reading crash then?

mraspaud avatar May 24 '22 07:05 mraspaud

I think that depends if this is covered by the implementations in pyresample. I can have a look at it even though I am not that familiar with the code. Maybe @TomLav can help because most of the code is from him.

BENR0 avatar May 24 '22 07:05 BENR0

@BENR0 I just meant to verify that the changes you make do not change the behaviour of the lon/lat case, no need to implement it here. So if you can see that the lon/lat case still works as before, then this is good as it is now

mraspaud avatar May 24 '22 07:05 mraspaud

@mraspaud do I understand correctly that you mean if a netcdf written from a Satpy Scene which only has a swath definition is also read correctly?

BENR0 avatar May 24 '22 07:05 BENR0

I haven't checked recently, but I would expect so.

mraspaud avatar May 24 '22 07:05 mraspaud

Although it might be the same problem as you are fixing here that the geolocation isn't actually read.

mraspaud avatar May 24 '22 07:05 mraspaud

Thanks for tagging me. I am maybe wondering if cf_reader should offer a richer interface to pyresample's from_cf() routine? For example, from_cf() offers the option to request the area_def attached to a specific netCDF variable which is not available from the cf_reader() at present. To specify variable= in from_cf() can useful both if there are several area_defs in a netCDF file (not illegal, but not often) and if the CF encoding is not fully standard (in which case specifying variable= helps from_cf() to locate the area_def). But I do not know if the cf_reader workflow only looks at file-level area_defs or variable-level.

TomLav avatar May 24 '22 08:05 TomLav

Indeed I think that would be something to think about to make it more generic in the future. Currently I think the reader is mainly for reading files written with Satpy which I think does not produce different area_defs in one file. So my opinion is to make this another PR (respectively create an feature issue for it).

BENR0 avatar May 24 '22 09:05 BENR0

Thanks. Your approach is fine with me.

TomLav avatar May 24 '22 12:05 TomLav

@mraspaud regarding the swath definition case: You were right implementing the get_area_def the way I did breaks this case. The reason is that the CF utils in pyresample implemented by @TomLav do not cover that case. Without the get_area_def method implemented in the specific reader Satpy "falls back" to creating the area from the lat/lon grids which is implemented in the FileYAMLReader. For a quick work around I added ValueError to the try except in order to fall back to the yaml reader creating the area def. While this works I think this is basically asking for trouble since ValueError is pretty generic and might lead to more problems that it solves.

My opinion about this is that handling 2D lat/lon coordinate grids should also be implemented in the CF utils in pyresample because CF allows this kind of coordinates and therefore AreaDefinition.from_cf should be able to handle this. Apart from this obviously there is a unit test missing in the cf reader for the 2D lat/lon grid case. Are there some helper functions somewhere already to create a Satpy dataset with a swath definition which could be reused?

BENR0 avatar May 25 '22 09:05 BENR0

Hello. I think it is correct of pyresample's from_cf() to not support lat/lon swath data (if this is what was failing now). I think AreaDefinition is only for grids based on Earth projections, while the geometry information of lat/lon swath would use SwathDefinition object.

I don't know satpy. Can you link me to the current implementation of this feature in satpy, so that I can see how it is working?

TomLav avatar May 25 '22 12:05 TomLav

You are correct that 2D lat/lon grids are handled by SwathDefinition. The formulation with AreaDefinition I used was misleading. What I was trying to say is that it would be good to have a from_cf method that can handle both (so returns either AreaDefinition or SwathDefinition objects depending on the case).

The logic for the creation of the area the satpy_cf_nc reader used before the changes proposed in this PR is located in the FileYAMLReader. I think https://github.com/pytroll/satpy/blob/cd057caee2b48f718eff2bc3b384e6286ced420b/satpy/readers/yaml_reader.py#L831 might be a starting point if you want to have a look at it. The try/except handles the fall back to creating a SwathDefinition if the get_area_def method is not implemented in the reader.

BENR0 avatar May 25 '22 13:05 BENR0

I see. I'll give it some thoughts. A first step would be to have a from_cf() method for pyresample's Geometry, that would default returning a SwathDefinition if no valid CF grid construct was detected.

TomLav avatar May 25 '22 15:05 TomLav

I'm hesitant to want to add SwathDefinition support directly to from_cf in pyresample. Does the CF standard define standard ways of defining arrays of lon/lat coordinates? What do we do if there are multiple? This seems like something will be different for every file scheme (thinking of all of the readers in Satpy that have different names for their lon/lat variables). ...actually CF doesn't standardize variable names so it would have to be done by standard_name at least.

djhoese avatar May 25 '22 15:05 djhoese

@BENR0 @TomLav Any ideas for my previous questions:

I'm hesitant to want to add SwathDefinition support directly to from_cf in pyresample. Does the CF standard define standard ways of defining arrays of lon/lat coordinates? What do we do if there are multiple? This seems like something will be different for every file scheme (thinking of all of the readers in Satpy that have different names for their lon/lat variables). ...actually CF doesn't standardize variable names so it would have to be done by standard_name at least.

This is still my opinion, that pyresample's from_cf should not support generating SwathDefinitions unless there is a clear definition of how lon/lats are to be defined and referenced from CF standard NetCDF variables.

Unless I'm reading the code incorrectly, the previous behavior always tried to generate a SwathDefinition from the "coordinates" defined in the YAML. This is the same as any polar-orbiter reader in Satpy. I think you should try/except in get_area_def and raise a NotImplementedError instead of the ValueError that is generated now (and remove the new ValueError check in the YAML reader). I know NotImplementedError may seem like the wrong thing to do here, but historically this is what we've done and catching ValueError could cause other issues in the future. For example, an actual ValueError trying to generate an AreaDefinition that should have been possible to generate.

djhoese avatar Aug 03 '22 14:08 djhoese

@djhoese I changed the try/except.

I am good with not implementing something regarding SwathDefinitions in from_cf. The way it is now netcdf files written by Satpy can be read either way which is all I need for now. I agree that the more general approach has to many open questions/is to difficult and can/should be something for another PR if it is desired in the future.

BENR0 avatar Aug 05 '22 09:08 BENR0

pre-commit.ci autofix

djhoese avatar Aug 05 '22 14:08 djhoese

This looks pretty good to me. I'm not sure how I feel about the warning as in some cases that may be the expected and intended path forward for the user. For example, if they configured coordinates in the YAML then they probably know that an AreaDefinition can't be made from the NetCDF file. @mraspaud what do you think?

djhoese avatar Aug 05 '22 14:08 djhoese

@djhoese @mraspaud what's the status on merging this?

BENR0 avatar Aug 16 '22 09:08 BENR0

@BENR0 could you check the SwathDef test?

mraspaud avatar Aug 23 '22 12:08 mraspaud

@mraspaud I added the test for the SwathDefinition. It seems to me that the this test module could use some refactoring because there is some repetitive code in setting up datasets for different tests. To keep it inline with the overall style I also added duplicate code for generating a dataset. I am not sure but maybe this could all be moved into the setup (setting up a Scene with a dataset for each of SwathDefinition, AreaDefinition and for prefix testing and then in the test just load the needed dataset?).

Also I think with #2176 the instrument attribute setting can be removed for the dataset writing.

BENR0 avatar Aug 24 '22 13:08 BENR0