Update rainfarm.py
Add smoothing operator
see #287
Codecov Report
Attention: Patch coverage is 95.80420% with 6 lines in your changes missing coverage. Please review.
Project coverage is 83.50%. Comparing base (8ece5be) to head (c38f4e1). Report is 45 commits behind head on master.
:exclamation: Current head c38f4e1 differs from the pull request's most recent head 8827515. Please upload reports for commit 8827515 to get more accurate results.
| Files | Patch % | Lines |
|---|---|---|
| pysteps/downscaling/rainfarm.py | 95.20% | 6 Missing :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## master #311 +/- ##
==========================================
+ Coverage 82.79% 83.50% +0.71%
==========================================
Files 161 159 -2
Lines 12339 12562 +223
==========================================
+ Hits 10216 10490 +274
+ Misses 2123 2072 -51
| Flag | Coverage Δ | |
|---|---|---|
| unit_tests | 83.50% <95.80%> (+0.71% :arrow_up:) | |
@simonbeylat as I said, it is great that you are adding this.
More importantly, I was wondering about the difference between the smoothing operator from @jhardenberg (which you nicely reimplemented in this PR) and the _balanced_spatial_average already used in the code. Aren't they doing the same thing? I wouldn't mind replacing the existing method with a more sophisticated one; we just need to make sure that we are not applying the same thing twice.
Indeed _balanced_spatial_average is a generic function which does convolution with a kernel (I also was going to suggest some pre-made convolution function from scipy or numpy to this end). So to implement Gaussian smoothing, it would be enough to define a Gaussian kernel instead of the tophat which is currently being used in the code and then call _balanced_spatial_average. What is different in your new _smoothconv function is also the treatment of missing values, which is indeed completely missing in _balanced_spatial_average (and which could in theory be added to this function).
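To make the idea concrete, here is a minimal sketch of a missing-value-aware convolution (a hypothetical helper, not the actual pysteps code): convolve both the NaN-filled field and the validity mask with the normalized kernel in Fourier space, then divide the two results.

```python
import numpy as np

def masked_convolve(field, kernel):
    """Hypothetical sketch: convolve in Fourier space (implicit 'wrap'
    boundaries) while renormalizing so NaNs do not bias the result."""
    mask = np.isfinite(field).astype(float)
    filled = np.where(np.isfinite(field), field, 0.0)
    # Normalize the kernel and anchor its center at the origin for FFT convolution.
    kernel_ft = np.fft.rfft2(np.fft.ifftshift(kernel / kernel.sum()))
    num = np.fft.irfft2(np.fft.rfft2(filled) * kernel_ft, s=field.shape)
    den = np.fft.irfft2(np.fft.rfft2(mask) * kernel_ft, s=field.shape)
    out = np.full(field.shape, np.nan)
    valid = den > 1e-8
    out[valid] = num[valid] / den[valid]
    return out

# A Gaussian kernel defined on the same grid as the field.
size = 16
y, x = np.mgrid[:size, :size] - size // 2
gaussian = np.exp(-(x**2 + y**2) / (2 * 2.0**2))
```

With a tophat kernel instead of the Gaussian, this would mimic the behaviour of _balanced_spatial_average while ignoring missing values.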
Both a tophat and a Gaussian are valid options and it would be fine if the user could select which one to use. In the original RainFARM we usually also leave the option of no smoothing at all, just making sure that the aggregation over boxes corresponding to the coarse resolution is conserved, but that is another issue.
Btw I notice that in your code, if the smooth option is selected, you call both the tophat smoothing and the Gaussian smoothing in sequence. That is double smoothing: if we use the Gaussian smoothing, we do not need to also smooth with the tophat first.
Another comment: in other languages we always implemented smoothing using the equivalent of the scipy.ndimage.convolve option mode='wrap' to treat the boundaries, because we implemented it directly in Fourier space. There is nothing wrong with the default 'reflect' used in _balanced_spatial_average, but obviously it will lead to different results. The new implementation of _smoothconv implicitly uses 'wrap' since it operates in Fourier space.
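As a quick illustration (assuming SciPy is available; field and kernel sizes are arbitrary), the two boundary modes agree in the interior but diverge along the edges:

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(42)
field = rng.random((32, 32))
kernel = np.ones((5, 5)) / 25.0  # simple normalized tophat kernel

# Same kernel, different boundary handling:
out_reflect = ndimage.convolve(field, kernel, mode="reflect")  # current default
out_wrap = ndimage.convolve(field, kernel, mode="wrap")        # periodic, as in Fourier space
```

Away from the edges the two results are identical; only the pixels within the kernel half-width of the boundary differ.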
To be clear, my suggestion would indeed be to expand _balanced_spatial_average to also treat missing values and then call it either with a tophat or with a Gaussian kernel as needed. This is much more compact code. No need to have two functions; one should be enough.
A final comment regards speed: as far as I understand it, scipy.ndimage.convolve, which is used in _balanced_spatial_average, does 'regular' direct convolution. The same operation done in Fourier space (as we do in other languages and as you are now doing in your new function) can be much more efficient. Actually, scipy.signal.convolve as of version 0.19.0 automatically chooses the fastest option for you, so it might be better to use that instead of scipy.ndimage.convolve in _balanced_spatial_average.
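A small sketch of that suggestion (illustrative only): scipy.signal.convolve with method='auto' returns the same numbers as a direct convolution, but is free to take the FFT route when that is faster.

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(0)
field = rng.random((128, 128))
kernel = np.ones((9, 9)) / 81.0  # normalized tophat kernel

# 'direct' forces the sliding-window sum; 'auto' may switch to FFT-based
# convolution, which is typically much faster for larger kernels.
out_direct = signal.convolve(field, kernel, mode="same", method="direct")
out_auto = signal.convolve(field, kernel, mode="same", method="auto")
```

Note that scipy.signal.convolve zero-pads at the boundaries, so boundary handling would still need attention when swapping it in.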
@simonbeylat have you tried comparing the speed of your new function to that of the current _balanced_spatial_average? And to that of a _balanced_spatial_average using scipy.signal.convolve? (they should be similar)
Hello
First of all, thank you for your feedback!
@dnerini, I also like the idea of having clear and aesthetic code that can be easily understood by others. I will make some changes by looking at the different documentation you sent me! I will also run the test code before I add the changes.
Both a tophat and a Gaussian are valid options and it would be fine if the user could select which one to use. In the original RainFARM we usually also leave the option of no smoothing at all, just making sure that the aggregation over boxes corresponding to the coarse resolution is conserved, but that is another issue.
I like @jhardenberg's idea of letting users choose which technique they want to use. I think I can try to "update" the _balanced_spatial_average method to add the treatment of missing values.
So I suggest changing the smooth argument and having 3 options:
- None: no smoothing operator.
- 'tophat' (default): use the _balanced_spatial_average method.
- 'Gaussian': use _smoothconv or _balanced_spatial_average with a Gaussian kernel.
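A rough sketch of how such a kernel selection could look (hypothetical helper and argument names, not the final API):

```python
import numpy as np

def make_kernel(kind, size, sigma=None):
    """Build a normalized smoothing kernel, or None to skip smoothing.

    Hypothetical helper: kind is None, 'tophat' or 'Gaussian'.
    """
    if kind is None:
        return None
    y, x = np.mgrid[:size, :size] - (size - 1) / 2.0
    if kind == "tophat":
        kernel = ((x**2 + y**2) <= (size / 2.0) ** 2).astype(float)
    elif kind == "Gaussian":
        sigma = sigma if sigma is not None else size / 4.0
        kernel = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    else:
        raise ValueError(f"unknown kernel: {kind}")
    return kernel / kernel.sum()  # normalize so smoothing conserves the mean
```

The resulting kernel (or None) could then be handed to a single convolution routine such as _balanced_spatial_average.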
About testing the code and its speed: I tested the code using several datasets (the ones I used with the R version of RainFARM). I didn't see a noticeable improvement, but that may be because I was using _balanced_spatial_average and _smoothconv in a row. In terms of speed, both methods were quite fast; I didn't focus on that point.
(Out of context, I wanted to introduce myself quickly, as I have never done it before: I have just graduated with a master's in data science/engineering, which ended with an internship on downscaling and bias correction of climate models, and I am actively looking for a PhD in these or similar subjects. This development is also a way for me to learn and improve my skills. I wanted to thank you for the help and the feedback you are giving me.)
Excellent, sounds all very good to me! I'll have another closer look once you push the next changes ;-)
And thanks for introducing yourself, always a pleasure to learn more about the person behind a simple name!
Thanks @simonbeylat for the new changes, well done!
I noticed that you still have both smoothing methods implemented: _smoothconv and _balanced_spatial_average. From what I gathered in our discussion above with @jhardenberg, we would rather join them into one single method which can accept any arbitrary kernel. In practice, we suggested that you start with the existing _balanced_spatial_average and expand it with the treatment of missing values (as in the new function). Then, based on an input argument kernel, you could build and pass the corresponding kernel, or just skip the smoothing part if kernel is set to None.
Does it make sense?
Hi @dnerini, @jhardenberg
I managed to add the 2nd pull request, which covers the spectral-fusion part added by D'Onofrio et al. (2014). It took me a while, but I found something that looks good.
I look forward to your reviews.
hey @simonbeylat just wanted to quickly mention that I haven't forgot about this, it's just a bit of a busy period. I hope to be able to look into your latest changes soon!
Hello @dnerini ,
Will you have the time to review the new commit?
Or should I add the last commit (with the computation of the weights)?
hey @simonbeylat I just created a PR on your branch: https://github.com/simonbeylat/pysteps/pull/1 let's continue discussing there for now, we'll come back here once those changes are discussed and merged.
I added the changes you made, plus some small changes of my own because it wasn't working.
I was implementing the _apply_spectral_fusion function, I tried to translate @jhardenberg 's julia version.
But I don't really understand how this version and the version written in D'onofrio's article are equivalent. Also, when I tried the code, it seems to be wrong and gives very large extremes.
The first version I wrote (you should be able to see it by looking at this commit) seemed to give good results, and I tried to stay close to the mathematics written in the article, but I might be wrong.
I think I misunderstood something and translated @jhardenberg's code wrong. I'll focus on this part.
Hello @simonbeylat thanks for the update, happy to see that we are making some progress on this!
Concerning the _apply_spectral_fusion function, it would be perfectly fine for me to follow the maths from the original paper if these give more sensible results. We might then ask @jhardenberg for help to reimplement his Julia/R versions in Python in a second step.
hey @simonbeylat, getting back on this PR I realized I might have misunderstood your last message -- are you still working on it or is it in principle complete?
Hello, sorry, I have several important projects and a big problem with the personal computer where I was working on this project... (as I was doing this in my spare time). I will work on this project as soon as I can!
Hey @simonbeylat , absolutely, no worries! Just wanted to make sure that you weren't waiting for a feedback from my side ;-)
Hi @dnerini, I'm very busy as I started my PhD and so I started many projects, I'm not sure if I can finish the development of rainfarm at the moment. I will try to do it when I have some free time, but I can't know when.
I'm sorry for that.
Hello @dnerini, @jhardenberg and @simonbeylat,
Firstly, thank you very much for providing this tool. We began working with the RainFARM implementation in PySTEPS, and upon discovering this pull request, we decided to try the spectral fusion update by Simon, as it represents an improvement in merging between coarse-scale precipitation fields and fine synthetic precipitation fields. The code we used is in this commit.
The appearance of the field seems to be more realistic compared to the current RainFARM implementation in PySTEPS. However, we have encountered some unrealistic precipitation values at certain grid boundaries. Attached is an example comparing the different RainFARM implementations. From left to right: spectral fusion in Python using Simon's code (note the unrealistic precipitation at the right boundary), the original RainFARM in PySTEPS, and the R implementation of RainFARM.
We checked Simon's code, but so far, we haven’t identified anything that could lead to this problem. During the development of the code, did you encounter similar results or problems? We consulted resources online and found suggestions that this issue may be related to spectral leakage. However, we are not experts in this field. One proposed solution is to apply windowing methodologies before calculating the Fast Fourier Transform, although we have found that this does not always resolve the problem. We are not sure if there are additional insights or solutions we may be overlooking.
Thank you once again.
Hello @ecasellas,
I'd like to start by apologising for the lack of progress. I started updating RainFARM in my free time during my master's degree, but when I started my PhD (on data assimilation), I didn't have time to get back to it.
Actually, the implementation was not completely finished; the initial plan was to end up with the same version of RainFARM as in the Terzago et al. (2018) paper. But I'm quite surprised by the results (even with the boundary issue). Unfortunately, I can't manage to finish this job (although I would have loved to).
I'll be very happy to follow developments if you decide to update the code I've added.
Best,
Simon
Dear @ecasellas many thanks for following up on this! This PR is currently stale, which is a bit of a pity since it's probably not very far from being completed. Since the original author is not able to work on it anymore, I suppose this is now open to anyone motivated to continue and finalize the work!
Concerning the actual issue reported above, the windowing approach that you mention makes sense to me. Another thing worth trying might be padding the field with zeros to mitigate such boundary effects, have you already tried something in this direction?
edit: just out of curiosity, could you add a plot of the original low res field that you downscaled in your example?
Dear @simonbeylat and @dnerini,
Thank you very much for your comments. We're attaching a new figure with the original low-resolution field. We've tried the downscaling procedure with different events, and the problem seems to persist, especially at the right and bottom boundaries, as you can see in the new example.
As you suggested, we tried padding the field with zeros (we applied it before the FFT calculation, we hope we did it in the correct place), but it didn’t improve. Perhaps we didn’t apply it correctly. We have already experimented with some windowing approaches, but sometimes they work (completely removing the boundary effects), and sometimes they yield very implausible results, appearing very granular.
We'll take another look into the code and return here as soon as we have made progress.
Just to introduce ourselves a little bit, I'm working on this together with Blai Domingo, who is doing an internship at the Meteorological Service of Catalonia, and we'd like to continue with this pull request, although we're not sure if we're going to succeed.
Thank you once again.
Dear @ecasellas,
Thank you for the updates, it would be great if you could continue!
Here is the issue linked to this PR: https://github.com/pySTEPS/pysteps/issues/287. If you want more context, I think the initial plan is described there. And I apologise again for not having completed the code correctly.
Best,
Simon
Dear @dnerini,
Together with @blaiDoAr, we made some progress and believe we now have a Python version equivalent to R's RainFARM implementation. The spurious bands previously reported seem to have disappeared, and the results appear to match those obtained directly with R, as you can see in the figure below (from left to right: original low-res field, PySTEPS implementation, R's RainFARM implementation with smoothing set to None, and Simon's implementation with changes and without smoothing).
Our starting point was the last commit of this PR, from December 6th, and we made some changes by translating parts of R's implementation to Python. The version we implemented is not completely adapted to the current version of PySTEPS since, if we didn't misunderstand R's implementation, the latter works with wavenumbers while PySTEPS works with frequencies. Perhaps we should make some changes to unify both versions and try to modify the current PySTEPS implementation as little as possible.
Should we commit the current changes to this PR as a starting point for the possible new version, or should we proceed differently?
Thank you
Hi @ecasellas and @blaiDoAr Exciting news and congratulations on the great achievement!
So you can either decide to commit your changes to this PR, which is a fork that belongs to @simonbeylat (not sure you'll have the right permissions though, unless you already got invited to the associated repo), or decide to "fork the fork" so to have full control on the new version. In this scenario, you'll have to open a new PR to pysteps and we'll close this one.
Does this help? I look forward to include your changes (including the excellent work from @simonbeylat ) into the main repo! Let me know if you have any other question.
Hi @ecasellas and @blaiDoAr
I've invited you to the fork and will give you full rights. I think keeping the PR can help keep all the information in one place (and therefore make it easier for others to follow), but it's up to you to decide how you want to continue the work. Very good job, thanks to you two!
Best,
Simon
Hi @dnerini and @simonbeylat,
Thank you very much for your comments. Since @simonbeylat granted us push permissions to the associated repository, we'll commit the changes to this PR. As Simon mentioned, we also think it's better to keep everything in the same place and to maintain the history of changes and all the great work done by Simon.
We'll return here once the changes are pushed.
Thank you once again.
Hello @dnerini and @simonbeylat,
We've made some updates to this PR, which include:
- Implementation of spectral fusion, translated from R's RainFARM version. Initially, we used a custom function for radially averaged spectra in Python, but later discovered that the rapsd function was already implemented in PySTEPS, as @simonbeylat had done.
- Adjustment of the consistency condition between the synthetic field and the original field. In the previous version, the original field was zoomed to the synthetic field's shape before applying the consistency condition. Here, we aggregate the synthetic field to the original shape, interpolate it back to the synthetic field's shape (using np.kron), and finally apply the consistency condition.
- Added an option to skip smoothing. When aggregated, the high-resolution field matches the original field if no smoothing kernels (already implemented by @simonbeylat) are applied; this is not the case if any of the available smoothing kernels are used.
- If spectral_fusion is set to True, the noise field is normalized using the shape of the field rather than its standard deviation. This is the same as in R; we tried using the standard deviation, but then the results differ.
- The alpha used in generating the synthetic field is divided by 2, as in the referenced articles. However, in the R and Julia implementations, it's calculated as (alpha + 1) / 4. Dividing alpha by 2 yields results consistent with R, although we're uncertain why this discrepancy exists.
- Tests for high-resolution precipitation aggregation and spectral fusion. The latter was integrated into the existing test_rainfarm_shape function, with additional parameters included.
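As an illustration of the aggregate-and-interpolate consistency step described above, here is a simplified sketch (hypothetical function, not the actual PR code; it assumes the fine grid is an integer multiple of the coarse one and strictly positive fields):

```python
import numpy as np

def apply_consistency(synthetic, original):
    # Scale factor between the fine and coarse grids (assumed integer).
    ks = synthetic.shape[0] // original.shape[0]
    # Block-average the synthetic field down to the coarse grid.
    agg = synthetic.reshape(
        original.shape[0], ks, original.shape[1], ks
    ).mean(axis=(1, 3))
    # Nearest-neighbour interpolation back to the fine grid with np.kron.
    agg_up = np.kron(agg, np.ones((ks, ks)))
    orig_up = np.kron(original, np.ones((ks, ks)))
    # Rescale so that each coarse box of the output averages to the original.
    return synthetic * orig_up / agg_up
```

Block-averaging the result back to the coarse grid then reproduces the original field exactly, which is the consistency condition being enforced.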
We appreciate the excellent work done by @simonbeylat and @dnerini, as it’s much easier to approach a problem when it’s already laid out.
We look forward to your reviews, thank you once again.
Hi folks, this is a great development, congrats to everybody!
Hello,
We tried this new version on several episodes and found that something might not be entirely correct in the changes we made. Upon reviewing the code, we realized that we had misunderstood the calculation of the synthetic field amplitude. In the papers, alpha is divided by 2, and in our previous implementation, we used:
noise_field_complex = white_noise_field_complex * np.sqrt(freq_array_highres**-alpha / 2)
However, the halving is already accounted for by np.sqrt, so adding the "/ 2" effectively applied the reduction twice. In the new commit, we removed the "/ 2". We then checked other precipitation events, compared the performance of Python RainFARM and R RainFARM, and the results were equivalent.
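A tiny numerical check of the fix (the wavenumbers and slope below are arbitrary illustrative values): np.sqrt already halves the spectral slope, so sqrt(k**-alpha) gives the k**(-alpha/2) amplitude prescribed in the papers without any extra division.

```python
import numpy as np

freq = np.linspace(0.1, 1.0, 8)  # arbitrary positive wavenumbers
alpha = 1.7                      # arbitrary spectral slope

amplitude = np.sqrt(freq**-alpha)  # what the fixed code computes
prescribed = freq ** (-alpha / 2)  # amplitude scaling from the papers
```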
P.S. Sorry @jhardenberg for not mentioning you in the previous message, but we also thank you very much for providing RainFARM in R and Julia.
hi @ecasellas, thanks for the important fix! From my perspective, this looks good to go. I wonder if @simonbeylat and @jhardenberg are planning to review this PR?
edit: @ecasellas the formatting test is failing, can you run black on your code and push the changes to this branch?
Hello @dnerini, I ran the black formatter on rainfarm.py; I hope it'll pass the test!
hello @ecasellas, apologies for the slow action on this PR... I had a closer look today and pushed mostly cosmetic changes to make the code (hopefully) more readable. I also left one question above, if you could look at it? Once that conversation is resolved, I think we'll be ready to merge.