Update FCI L1c reader to work with remote file systems

Open pnuu opened this issue 1 year ago • 4 comments

This PR adapts the FCI L1c reader so that it is able to read data from remote locations using fsspec.

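from satpy import Scene

# fsspec-style URL: "simplecache::" caches each remote S3 file locally on first access
fnames = ["simplecache::s3://satellite-data-fci-test-data-2022/*DEV_20170920112*.nc"]
scn = Scene(reader='fci_l1c_nc', filenames=fnames)
scn.load(['ir_123'])
scn.save_datasets()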

See the documentation for more information on accessing remote data.

  • [x] Tests added
  • [x] Fully documented

pnuu avatar Aug 18 '22 07:08 pnuu

Codecov Report

Merging #2182 (6374504) into main (acd0745) will decrease coverage by 0.01%. The diff coverage is 90.80%.

:exclamation: Current head 6374504 differs from pull request most recent head 7be69e8. Consider uploading reports for the commit 7be69e8 to get more accurate results

@@            Coverage Diff             @@
##             main    #2182      +/-   ##
==========================================
- Coverage   94.03%   94.01%   -0.02%     
==========================================
  Files         289      289              
  Lines       44564    44612      +48     
==========================================
+ Hits        41904    41944      +40     
- Misses       2660     2668       +8     
Flag             Coverage Δ
behaviourtests   4.74% <0.00%> (-0.01%) :arrow_down:
unittests        94.68% <90.80%> (-0.02%) :arrow_down:

Impacted Files                                Coverage Δ
satpy/readers/netcdf_utils.py                 95.42% <75.86%> (-4.58%) :arrow_down:
satpy/tests/reader_tests/test_fci_l1c_nc.py   99.70% <97.14%> (-0.30%) :arrow_down:
satpy/readers/fci_l1c_nc.py                   98.24% <100.00%> (+0.08%) :arrow_up:

codecov[bot] avatar Aug 18 '22 07:08 codecov[bot]

Unless there is a public place with FCI data served over S3, I can't test the remote reading functionality. However, for local processing there is a severe performance regression. Generating three channels and one composite from compressed full-disc data on my workstation, without resampling, takes 38.7 seconds without this PR and 1 minute 57 seconds with it, roughly 3× longer. This PR should make sure there is no performance regression for local processing, which will remain the most common use case.

At the same time, RAM usage is down from 4.3 GB to 2.9 GB, which is great, but in this case I believe wall clock time will worry users more.

Satpy main:


        Command being timed: "python /home/gholl/checkouts/protocode/fci-true-color.py"
        User time (seconds): 88.84
        System time (seconds): 3.90
        Percent of CPU this job got: 239%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:38.67
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 4329920
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 616257
        Voluntary context switches: 16582
        Involuntary context switches: 25305
        Swaps: 0
        File system inputs: 245656
        File system outputs: 109240
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

With this PR:

        Command being timed: "python /home/gholl/checkouts/protocode/fci-true-color.py"
        User time (seconds): 169.68
        System time (seconds): 2.80
        Percent of CPU this job got: 147%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 1:57.20
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 2893340
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 366179
        Voluntary context switches: 21684
        Involuntary context switches: 19337
        Swaps: 0
        File system inputs: 246760
        File system outputs: 109240
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0
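
For reference, the ratios quoted above can be reproduced from the two reports with a few lines of Python. This is only a sketch: the numbers are copied by hand from the `time -v` output above, and `elapsed_to_seconds` is a hypothetical helper written for this comparison, not part of Satpy or GNU time.

```python
def elapsed_to_seconds(value):
    """Convert an 'h:mm:ss' or 'm:ss.ss' elapsed-time string to seconds."""
    seconds = 0.0
    for part in value.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


# Values copied from the two reports above (assumed to be transcribed correctly).
main = {"elapsed": "0:38.67", "user_s": 88.84, "max_rss_kb": 4329920}
with_pr = {"elapsed": "1:57.20", "user_s": 169.68, "max_rss_kb": 2893340}

slowdown = elapsed_to_seconds(with_pr["elapsed"]) / elapsed_to_seconds(main["elapsed"])
cpu_ratio = with_pr["user_s"] / main["user_s"]
rss_ratio = with_pr["max_rss_kb"] / main["max_rss_kb"]

print(f"wall clock: {slowdown:.2f}x longer")
print(f"user CPU time: {cpu_ratio:.2f}x")
print(f"peak RSS: {rss_ratio:.0%} of main")
```

The wall clock time grows faster than the user CPU time, which is consistent with the drop in "Percent of CPU this job got" from 239% to 147%: the run with this PR spends more CPU time and also uses fewer cores in parallel.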

gerritholl avatar Aug 18 '22 07:08 gerritholl

Coverage Status

Coverage decreased (-0.01%) to 94.626% when pulling 6374504a854177c16a22c592cc33212ed330da38 on pnuu:feature-fci-fsfile into acd074530f1eeaa860c1d18de22ecbfd75be8166 on pytroll:main.

coveralls avatar Aug 18 '22 07:08 coveralls

Thanks for testing! I noticed the same increase in processing time for local files, but haven't yet had time to look into it. My guess is that some da.array() calls or similar are causing unnecessary computes.
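
One way to check for unnecessary computes is to run the suspect code under a scheduler that counts how often dask is asked to compute. The following is a minimal sketch, assuming dask and NumPy are installed; `CountingScheduler` is a hypothetical helper written for this illustration, and the toy arrays stand in for the reader's real data.

```python
import dask
import dask.array as da
import numpy as np


class CountingScheduler:
    """Hypothetical helper: a dask scheduler that counts compute calls."""

    def __init__(self):
        self.total_computes = 0

    def __call__(self, dsk, keys, **kwargs):
        self.total_computes += 1
        # Delegate to dask's synchronous scheduler to do the actual work.
        return dask.get(dsk, keys, **kwargs)


counter = CountingScheduler()
with dask.config.set(scheduler=counter):
    lazy = da.ones((4, 4), chunks=2)
    still_lazy = (lazy + 1).sum()      # builds a graph, computes nothing
    computes_after_lazy_ops = counter.total_computes

    eager = np.asarray(lazy)           # converting to NumPy forces a compute
    computes_after_conversion = counter.total_computes

print(computes_after_lazy_ops, computes_after_conversion)
```

If the counter increases while a dataset is only being loaded (before anything is saved), that points at a place where a dask array is being converted to NumPy earlier than intended.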

pnuu avatar Aug 18 '22 07:08 pnuu

FYI just to put this somewhere, I found this: https://github.com/cedadev/S3-netcdf-python

djhoese avatar Oct 18 '22 16:10 djhoese

And... https://docs.unidata.ucar.edu/nug/current/nczarr_head.html

djhoese avatar Oct 18 '22 16:10 djhoese