satpy icon indicating copy to clipboard operation
satpy copied to clipboard

Changes to Eumetsat L2 BUFR reader

Open samain-eum opened this issue 1 year ago • 9 comments

This PL carries the following feature changes:

  • Adapt the EUM BUFR reader to make it generic and no longer SEVIRI speficic (hence the name change)
  • Add functionality and configuration yaml file for the FCI BUFR products
  • Include a fix / workaround for the case a variable within a BUFR message is all set to fill values, which results in an unexpected array size as returned by eccodes

samain-eum avatar Oct 19 '23 08:10 samain-eum

@mraspaud : We have a question on what do for the resolution of wind products For the various dedicated variables, we chose to set it to "none", as a spatial resolution doesn't make sense for a wind vector. It shows up as such in the variable attributes imported into satpy. However we have also "common" variables, i.e. latitude and longitude, where the resolution key is defined as an array, with one element per product in the file_type key. It doesn't seem possible to mix different types for the resolution (it's an array, not a list). The solutions I have found are either:

  • Put a placeholder value for the AMV resolution (something that won't break anything, i.e. 0 or 1 perhaps)
  • Create an other kind of latitude / longitude bloc entry for resolution-less data. Same name, just different file types allocated, and resolution: ["none"].

Do you have any preference of advice?

samain-eum avatar Apr 12 '24 14:04 samain-eum

@mraspaud : We have a question on what do for the resolution of wind products For the various dedicated variables, we chose to set it to "none", as a spatial resolution doesn't make sense for a wind vector. It shows up as such in the variable attributes imported into satpy. However we have also "common" variables, i.e. latitude and longitude, where the resolution key is defined as an array, with one element per product in the file_type key. It doesn't seem possible to mix different types for the resolution (it's an array, not a list). The solutions I have found are either:

  • Put a placeholder value for the AMV resolution (something that won't break anything, i.e. 0 or 1 perhaps)
  • Create an other kind of latitude / longitude bloc entry for resolution-less data. Same name, just different file types allocated, and resolution: ["none"].

Do you have any preference of advice?

@mraspaud : We have a question on what do for the resolution of wind products For the various dedicated variables, we chose to set it to "none", as a spatial resolution doesn't make sense for a wind vector. It shows up as such in the variable attributes imported into satpy. However we have also "common" variables, i.e. latitude and longitude, where the resolution key is defined as an array, with one element per product in the file_type key. It doesn't seem possible to mix different types for the resolution (it's an array, not a list). The solutions I have found are either:

  • Put a placeholder value for the AMV resolution (something that won't break anything, i.e. 0 or 1 perhaps)
  • Create an other kind of latitude / longitude bloc entry for resolution-less data. Same name, just different file types allocated, and resolution: ["none"].

Do you have any preference of advice?

To add to this: what is actually the idea of having a list of resolutions for a datasets resolution variable for a dataset in the reader yaml-file? How would satpy know what resolution is the correct one when loading e.g. latitude if there is a list of three values? I first thought there was a relationship between the file_type list and the resolution list, but since equal lengths of these lists/arrays is not needed, that cannot be the case.

If you know the logic behind this @mraspaud or could point to the relevant code that would help us understand better what the best approach is for our use case.

strandgren avatar May 27 '24 10:05 strandgren

@strandgren Actually my idea of separate different lat/lon blocs for resolution-less products doesn't work. It seems satpy overloads the dataset info with the last bloc found in the yaml file and the same name. So it breaks when trying to load lat/lon for ASR for example. So I think I will instead use a dummy value of 0 in the yaml file, and replace it by 'none' when creating the attributes.

As for your question above, the resolution is currently linked to the file pattern itself, so we can't manage datasets with different resolutions within one file.

samain-eum avatar May 27 '24 12:05 samain-eum

That's a tough question! The resolution is indeed quite central in satpy, and I don't believe None is an option.

But just for curiosity, why doesn't resolution make sense for wind data? Is it like a continuous field? or is it motion vector derived from pixels at a certain resolution? (in which case maybe the original pixel resolution makes a little sense?)

Otherwise, you can always define custom data identification keys for your reader, like in here: https://github.com/pytroll/satpy/blob/main/satpy/etc/readers/sgli_l1b.yaml#L13-L31 And from there, maybe you can have a resolution-less one? I'm quite sure it will break somewhere, but maybe worth a try?

mraspaud avatar May 27 '24 13:05 mraspaud

Hi @mraspaud Wind vectors are produced by finding cloud features that are moving during successive images and a product is a "list" of winds with their lat/lon information, not a gridded product. So yes, resolution don't really make sense. I'm aware that a None (i.e. the python reserved type) may cause issues, that's why I opted for a "none" string instead. In the scope of our unit tests, it seems to work fine now. It just need dedicated lat/lon coordinates variables with the same value for the resolution. Maybe it's worth noting that for those product, the 'with_area_definition' flag when creating a scene instance is ignored, so we make no attempt at building an area definition (which requires a resolution). We are not aware of anything else that could break satpy.

This is how the definitions look like now for AMV:

  latitude_amv:
    name: latitude_amv
    key: '#1#latitude'
    resolution: none
    file_type: fci_l2_bufr_amv
    standard_name: latitude
    units: degree_north
    fill_value: -1.e+100
 
  longitude_amv:
    name: longitude_amv
    key: '#1#longitude'
    resolution: none
    file_type: fci_l2_bufr_amv
    standard_name: longitude
    units: degree_east
    fill_value: -1.e+100
 
  pressure:
    name: pressure
    long_name: Pressure of AMV feature
    standard_name: air_pressure_at_wind_level
    resolution: none
    file_type: fci_l2_bufr_amv
    key: '#1#pressure'
    units: Pa
    fill_value: -1.0e+100
    coordinates:
      - longitude_amv
      - latitude_amv

samain-eum avatar May 28 '24 13:05 samain-eum

Codecov Report

Attention: Patch coverage is 96.15385% with 8 lines in your changes missing coverage. Please review.

Project coverage is 95.97%. Comparing base (ab55c4a) to head (3b5c482). Report is 258 commits behind head on main.

Files Patch % Lines
satpy/readers/eum_l2_bufr.py 87.30% 8 Missing :warning:
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2603      +/-   ##
==========================================
+ Coverage   95.91%   95.97%   +0.05%     
==========================================
  Files         366      366              
  Lines       53504    53804     +300     
==========================================
+ Hits        51321    51639     +318     
+ Misses       2183     2165      -18     
Flag Coverage Δ
behaviourtests 4.03% <0.00%> (-0.01%) :arrow_down:
unittests 96.07% <96.15%> (+0.05%) :arrow_up:

Flags with carried forward coverage won't be shown. Click here to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov[bot] avatar Jun 14 '24 08:06 codecov[bot]

Pull Request Test Coverage Report for Build 9512734622

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 197 of 210 (93.81%) changed or added relevant lines in 2 files are covered.
  • 42 unchanged lines in 1 file lost coverage.
  • Overall coverage decreased (-0.02%) to 96.023%

Changes Missing Coverage Covered Lines Changed/Added Lines %
satpy/tests/reader_tests/test_eum_l2_bufr.py 142 146 97.26%
satpy/readers/eum_l2_bufr.py 55 64 85.94%
<!-- Total: 197 210
Files with Coverage Reduction New Missed Lines %
satpy/resample.py 42 88.74%
<!-- Total: 42
Totals Coverage Status
Change from base Build 9418285216: -0.02%
Covered Lines: 51591
Relevant Lines: 53728

💛 - Coveralls

coveralls avatar Jun 14 '24 08:06 coveralls

This one is now ready for review. In the end we decided to not specify a resolution in the yaml-files for the AMV data. We also kept a single definition of the latitude and longitude datasets that we use also for the AMV datasets. For these latitude and longitude datasets we have also assigned as many resolutions as file_types in orders to have a clear mapping of what the resolution is for a given file_type. For the AMV file_type we have set the latitude and longitude resolution to -1 since the list of resolutions does not support different data types (i.e. NoneType). Beyond the unit tests we have also tested this PR with real data and it appears to work well. See example below for MSG AMV data, adding the channel_id dataset, also allows to separate the AMV by channel (left: all channels, right: IR10.8 channel):

image

strandgren avatar Jun 14 '24 12:06 strandgren

Pull Request Test Coverage Report for Build 9516424732

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 201 of 209 (96.17%) changed or added relevant lines in 2 files are covered.
  • 42 unchanged lines in 1 file lost coverage.
  • Overall coverage decreased (-0.008%) to 96.032%

Changes Missing Coverage Covered Lines Changed/Added Lines %
satpy/readers/eum_l2_bufr.py 55 63 87.3%
<!-- Total: 201 209
Files with Coverage Reduction New Missed Lines %
satpy/resample.py 42 88.74%
<!-- Total: 42
Totals Coverage Status
Change from base Build 9418285216: -0.008%
Covered Lines: 51595
Relevant Lines: 53727

💛 - Coveralls

coveralls avatar Jun 14 '24 13:06 coveralls

Pull Request Test Coverage Report for Build 10055342407

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 201 of 209 (96.17%) changed or added relevant lines in 2 files are covered.
  • 151 unchanged lines in 15 files lost coverage.
  • Overall coverage increased (+0.04%) to 96.075%

Changes Missing Coverage Covered Lines Changed/Added Lines %
satpy/readers/eum_l2_bufr.py 55 63 87.3%
<!-- Total: 201 209
Files with Coverage Reduction New Missed Lines %
satpy/tests/reader_tests/gms/test_gms5_vissr_navigation.py 1 99.3%
satpy/readers/generic_image.py 1 97.7%
satpy/enhancements/init.py 1 99.07%
satpy/readers/init.py 2 98.68%
satpy/readers/viirs_edr.py 2 98.92%
satpy/tests/enhancement_tests/test_enhancements.py 2 99.44%
satpy/readers/hrit_jma.py 2 98.64%
satpy/tests/reader_tests/test_ami_l1b.py 3 98.17%
satpy/readers/ami_l1b.py 4 97.32%
satpy/readers/hdf5_utils.py 5 92.77%
<!-- Total: 151
Totals Coverage Status
Change from base Build 9418285216: 0.04%
Covered Lines: 51868
Relevant Lines: 53987

💛 - Coveralls

coveralls avatar Jul 23 '24 08:07 coveralls