Update FCI L1c reader to work with remote file systems
This PR adapts the FCI L1c reader so that it can read data from remote locations using fsspec.
```python
from satpy import Scene

fnames = ["simplecache::s3://satellite-data-fci-test-data-2022/*DEV_20170920112*.nc"]
scn = Scene(reader='fci_l1c_nc', filenames=fnames)
scn.load(['ir_123'])
scn.save_datasets()
```
See the documentation for more information on accessing remote data.
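The `simplecache::s3://` prefix in the example relies on fsspec protocol chaining. As a rough, network-free sketch of the same open-by-URL mechanics (the filename here is made up, and fsspec's in-memory filesystem stands in for S3):

```python
import fsspec

# Write a dummy file to fsspec's in-memory filesystem; with real data the
# URL would instead be something like "simplecache::s3://bucket/file.nc".
with fsspec.open("memory://example.nc", "wb") as f:
    f.write(b"netcdf-bytes")

# Reading back goes through the same kind of fsspec.open call that
# remote-capable readers build on.
with fsspec.open("memory://example.nc", "rb") as f:
    data = f.read()

assert data == b"netcdf-bytes"
```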
- [x] Tests added
- [x] Fully documented
Codecov Report

Merging #2182 (6374504) into main (acd0745) will decrease coverage by 0.01%. The diff coverage is 90.80%.

:exclamation: Current head 6374504 differs from pull request most recent head 7be69e8. Consider uploading reports for the commit 7be69e8 to get more accurate results.
```diff
@@            Coverage Diff             @@
##             main    #2182      +/-   ##
==========================================
- Coverage   94.03%   94.01%   -0.02%
==========================================
  Files         289      289
  Lines       44564    44612      +48
==========================================
+ Hits        41904    41944      +40
- Misses       2660     2668       +8
```
Flag | Coverage Δ |
---|---|
behaviourtests | 4.74% <0.00%> (-0.01%) :arrow_down: |
unittests | 94.68% <90.80%> (-0.02%) :arrow_down: |

Flags with carried forward coverage won't be shown.
Impacted Files | Coverage Δ |
---|---|
satpy/readers/netcdf_utils.py | 95.42% <75.86%> (-4.58%) :arrow_down: |
satpy/tests/reader_tests/test_fci_l1c_nc.py | 99.70% <97.14%> (-0.30%) :arrow_down: |
satpy/readers/fci_l1c_nc.py | 98.24% <100.00%> (+0.08%) :arrow_up: |
Unless there is a public place with FCI data served over S3, I can't test the remote reading functionality. However, for local processing there is a severe performance regression. Generating three channels and one composite from a compressed full disc on my workstation, without resampling: without this PR, 38.7 seconds; with this PR, 1 minute 57 seconds. That's roughly 3× longer. This PR should ensure there is no performance regression for local processing, which will remain the most common use case.

At the same time, RAM usage is down from 4.3 GB to 2.9 GB, which is great, but in this case I believe wall-clock time will worry users more.
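The numbers above and the logs below appear to come from GNU time's verbose output. For a quick in-process check of the same peak-RSS figure, Python's `resource` module can be queried directly (Unix-only; units are kilobytes on Linux, bytes on macOS):

```python
import resource

# ru_maxrss is the peak resident set size of this process, the same figure
# GNU time reports as "Maximum resident set size" (kilobytes on Linux).
peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
assert peak > 0
print(f"peak RSS: {peak}")
```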
Satpy main:

```
Command being timed: "python /home/gholl/checkouts/protocode/fci-true-color.py"
User time (seconds): 88.84
System time (seconds): 3.90
Percent of CPU this job got: 239%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:38.67
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 4329920
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 616257
Voluntary context switches: 16582
Involuntary context switches: 25305
Swaps: 0
File system inputs: 245656
File system outputs: 109240
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
```
With this PR:

```
Command being timed: "python /home/gholl/checkouts/protocode/fci-true-color.py"
User time (seconds): 169.68
System time (seconds): 2.80
Percent of CPU this job got: 147%
Elapsed (wall clock) time (h:mm:ss or m:ss): 1:57.20
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 2893340
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 366179
Voluntary context switches: 21684
Involuntary context switches: 19337
Swaps: 0
File system inputs: 246760
File system outputs: 109240
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
```
Coverage decreased (-0.01%) to 94.626% when pulling 6374504a854177c16a22c592cc33212ed330da38 on pnuu:feature-fci-fsfile into acd074530f1eeaa860c1d18de22ecbfd75be8166 on pytroll:main.
Thanks for testing! I noticed the same increase in processing time for local files, but didn't yet have time to look into it. My guess is that some da.array() calls or similar are causing unnecessary computes.
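To illustrate the suspicion above (this is a generic sketch, not Satpy's actual reader code): converting data with np.asarray materializes it immediately, while da.from_array builds a lazy task graph that is only evaluated on .compute().

```python
import numpy as np
import dask.array as da

# A small in-memory array stands in for a variable read from a file.
raw = np.arange(16, dtype="float64").reshape(4, 4)

eager = np.asarray(raw) * 0.01                   # computed right away
lazy = da.from_array(raw, chunks=(2, 2)) * 0.01  # deferred computation

assert isinstance(lazy, da.Array)          # still lazy at this point
assert np.allclose(lazy.compute(), eager)  # same values once computed
```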
FYI, just to put this somewhere, I found this: https://github.com/cedadev/S3-netcdf-python

And: https://docs.unidata.ucar.edu/nug/current/nczarr_head.html