pyresample icon indicating copy to clipboard operation
pyresample copied to clipboard

Use memoryviews and allow nogil in gradient search

Open mraspaud opened this issue 2 years ago • 3 comments

This PR uses moryviews and allows nogil in gradient search

  • [ ] Closes #445

mraspaud avatar Sep 21 '22 10:09 mraspaud

Codecov Report

Merging #455 (8ba117b) into main (8da4081) will not change coverage. The diff coverage is n/a.

@@           Coverage Diff           @@
##             main     #455   +/-   ##
=======================================
  Coverage   94.28%   94.28%           
=======================================
  Files          69       69           
  Lines       12388    12388           
=======================================
  Hits        11680    11680           
  Misses        708      708           
Flag Coverage Δ
unittests 94.28% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

codecov[bot] avatar Sep 21 '22 12:09 codecov[bot]

Coverage Status

Coverage remained the same at 94.13% when pulling 8ba117b73cb19e3b0dea6fcbe1fa13e71fd3e8ed on mraspaud:feature-nogil-gradient into 8da40816e6b38a12efc5c082298938f6636f846e on pytroll:main.

coveralls avatar Sep 21 '22 12:09 coveralls

Do you have a plot of the differences? Dask diagnostics plot if you can

djhoese avatar Sep 21 '22 17:09 djhoese

TBH, I can't see any difference. My guess is that somehow the gil was already released... before the change (so not releasing the gil?) withgil after the change (releasing the gil?) nogil

mraspaud avatar Sep 26 '22 13:09 mraspaud

It might be easier to see if you do multiple datasets at the same time? Like running a satpy Scene with multiple bands loaded and all being resampled at the same time. Actually yeah, if the nogil sections you have now are all run serially for the most part then dask may be making it look like high CPU usage. You could also try passing 0.2 to your ResourceProfiler to get more data points on the CPU/memory graph since it looks like it is only every half second and your total processing time isn't that long.

...or you forgot to recompile between tests.

djhoese avatar Sep 26 '22 14:09 djhoese

I did recompile! Checking now with other settings

mraspaud avatar Sep 26 '22 15:09 mraspaud

Tried with 3 composites, a bigger destination area, the setting was already to dt=0.25. The results are marginally better without gil. before this PR withgil after this PR nogil

Next, I'm going to try with bigger data (was testing with seviri 3km0

mraspaud avatar Sep 26 '22 15:09 mraspaud

Yeah that CPU line definitely looks better. The amount that it drops down still makes me think the chunk size could be larger (if the files are a good size for it).

djhoese avatar Sep 26 '22 15:09 djhoese

chunk size is 1024 in the previous examples

mraspaud avatar Sep 26 '22 15:09 mraspaud

I guess the flat 100 % in the beginning is the Satpy file handler creation we've already discussed in https://github.com/pytroll/satpy/issues/2186 ?

pnuu avatar Sep 26 '22 15:09 pnuu

I guess the flat 100 % in the beginning is the Satpy file handler creation we've already discussed in pytroll/satpy#2186 ?

yes, that looks like it. Far to long imo, we should look a that.

mraspaud avatar Sep 27 '22 06:09 mraspaud

@mraspaud can you give more details on what your code and target AreaDefinition looked like for your last comment's plots? I'm wondering if we can come up with a case that shows a little more improvement.

djhoese avatar Oct 28 '22 18:10 djhoese

@djhoese that was seviri data full disk resampled onto a full earth mollweide:

moll:
  description: moll
  projection:
    ellps: WGS84
    lon_0: 0.0
    proj: moll
    lat_0: 0.0
  shape:
    height: 4500
    width: 9000
  area_extent:
    lower_left_xy: [-18040095.696147293, -9020047.848073646]
    upper_right_xy: [18040095.696147293, 9020047.848073646]
    units: m

then I tried with fci but got my computer to crash..

mraspaud avatar Oct 31 '22 09:10 mraspaud

Trying it with ABI to a latlong projection (not the full data region). Before this PR:

image

After:

image

So I see better CPU usage and a faster run time by almost 30 seconds. This was generating a fully corrected true_color.

Note: This is using my "pass through" profiling where I just compute the chunks and throw them away (no saving to disk).

djhoese avatar Oct 31 '22 14:10 djhoese

And for the record, here is the same thing with nearest:

image

But I also see "PerformanceWarning"s in the kd_tree.py module. But it took 3x as much memory and took longer than the before this PR case of gradient_search.

djhoese avatar Oct 31 '22 15:10 djhoese

Pretty:

image

djhoese avatar Oct 31 '22 15:10 djhoese

Thanks for trying it out @djhoese ! Should we merge?

mraspaud avatar Nov 07 '22 14:11 mraspaud