pysteps
pysteps copied to clipboard
Refactor the blending/steps.py nowcasting code
Start with low-hanging fruit:
- clearer variable names (compare to nowcasts/steps.py refactoring)
- potentially make the monolithic function more modular.
Yes please!
@RubenImhoff I went trough the whole code to see what was unclear for me. I will already paste my findings here and maybe we can discuss later today. Ill first time some time to eat :p
Specific lines of needed name changes
Line 583: R_f should get a new name
Line 600: R_f_ empty_precip?
Line 616: outarr
Line 677: pp, generate_noise, noise_std_coeffs = _init_noise(
Line 716: randgen_prec, vps, generate_vel_noise = _init_random_generators(
Line 727: D, D_Yn, D_pb, R_f, R_m, mask_rim, struct, fft_objs = _prepare_forecast_loop(
Line 784: forecast_prev = precip_cascade noise_prev = noise_cascade t_prev = [0.0 for j in range(n_ens_members)] t_total = [0.0 for j in range(n_ens_members)]
Line 790: This timestep function is extremly complicated to understand, maybe changing some names would make this more clear
Line 858 rho_extr would change to rho_extrap (just a bit more clear what this is)
Line 865 R_f_ should be renamed: R_f_temp
Line 922: cov -> covariance_radar_model?
Line 955: EPS should be changed, quite unclear what it is
Line 997: EPS_ should be changed
Line 1028: R_f_ip
Line 1033: Yn_ip
Line 1051: t_diff_prev
Line 1065: V_stack
Line 1073: V_model_
Line 1094: R_f_im_recomp
Line 1111: R_f_ep_recomp_
Line 1118: R_f_ep_recomp
Line 1119: temp_mask
Line 1122: R_f_ep
Line 1144: Yn_ip_recomp
Line 1158: Yn_ep_recomp
Line 1159: Yn_ep
Line 1173: R_f_ep_out, Yn_ep_out, R_pm_ep
Line 1285: R_f_out
Line 1339: cascades_stacked_
Line 1356: R_f_blended
Line 1361: R_f_blended_mod_only
Line 1384: R_f_new
Line 1391: R_f_new_mod_only
Line 1412: weights_pm, weights_pm_normalized, weights_pm_mod_only,
Line 1438 Better commenting?
Line 1439: R_pm_blended
Line 1447: R_pm_blended_mod_only
Line 1490: Can be removed
line 1517: R_cmin
Line 1525: MASK_prec -> MASK_precip or MASK_precip_cell?
Line 1550: mu_0, MASK, mu_fct
Line 1578: R_f_stacked
General: All fc should be changed to forecast?
We should look at all these t_... variables, they are quite strange
Do we go for mu and sigma or means and sigmas? Should be uniform choice
R_f = precip_forecast? is this correct?
For all of these, I could not find the names in the nowcast steps but it is really hard to see the variables through the forest :) . I also tried to do this matching by using some generative AI tools but it does not preform well on this sort of tasks
To give a brief answer (already):
General: All fc should be changed to forecast? --> Yes!
We should look at all these t_... variables, they are quite strange --> Fair, if doable :)
Do we go for mu and sigma or means and sigmas? Should be uniform choice --> I would go for means and sigmas then :)
R_f = precip_forecast? is this correct? --> Correct
short comment: when there are so many variables given and returned by functions, it's usually a good indication that you need a class.
so why not refactoring the code into a class and so that the methods can simply access attributes instead of having to pass them around all the time? the main interface can stay the same, that is, forecast()
I agree this is the most elegant solution but maybe we should tackle that problem after name changes and changing the monolith into something more readable?
My work today so the refactoring can be done in the future; things like splitting up the code into more modular functions or making a class can be added after the current steps I list below:
STEP 1: All "R_f" should be changed to "precip_forecast". Use CTRL+F to find all relevant changes in both code and comments STEP 2: Specific names need to be changed; due to step 1, some of them have been changed from the original. Below I list all changed needed and some proposed names (can definitely be changed). The check with CTRL F is added because refactoring in one place does not always guarantee the name is changed in every function.
| Line number | Old name | Name after step 1 | Proposed name | Check with Ctrl F |
|---|---|---|---|---|
| 600 | R_f_ | precip_forecast_ | precip_forecast | No |
| 616 | outarr | precip_forecast_all_members_all_times | No | |
| 677 | pp | generate_perturb | Yes | |
| 716 | randgen_prec | randgen_precip | Yes | |
| 716 | vps | velocity_perturbations | Yes | |
| 727, 2321 | R_m | precip_forecast_non_perturbed | ||
| 727, 2303 | D | previous_displacement | No | |
| 727, 2304 | D_Yn | previous_displacement_noise_cascade | No | |
| 727, 2305 | D_pb | previous_displacement_prob_matching | No | |
| 761, 859 | rho_extr_prev | rho_extrap_cascade_prev | No | |
| 762, 882, 892 | rho_extr | rho_extrap_cascade | No | |
| 784 | forecast_prev | precip_forc_prev_subtimestep | No | |
| 785 | noise_prev | noise_prev_subtimestep | No | |
| 786 | t_prev | t_prev_timestep | No | |
| 787 | t_total | t_leadtime_since_start_forecast | No | |
| 922 | cov | covariance_nwp_models | No | |
| 955, 2387 | EPS | epsilon_decomposed | No | |
| 955, 964, 2387, 2391 | epsilon_decomposed | epsilon_ | No | |
| 977, 2403 | EPS_ | epsilon_temp | No | |
| 1026 | t_diff_prev_int | t_diff_prev_subtimestep_int | No | |
| 1028 | R_f_ip | precip_forecast_ip | precip_forecast_cascade_subtimestep | No |
| 1033 | Yn_ip | noise_cascade_subtimestep | No | |
| 1051 | t_diff_prev | t_diff_prev_subtimestep | No | |
| 1059 | velocity_pert | velocity_perturbations_extrapolation | No | |
| 1065 | V_stack | velocity_stack_all | No | |
| 1073 | V_model_ | velocity_models | No | |
| 1094 | R_f_ip_recomp | precip_forecast_ip_recomp | precip_forecast_recomp_subtimestep | No |
| 1111 | R_f_ep_recomp_ | precip_forecast_ep_recomp_ | precip_forecast_extrapolated_recomp_subtimestep_temp | No |
| 1118 | R_f_ep_recomp | precip_forecast_ep_recomp | precip_forecast_extrapolated_recomp_subtimestep | No |
| 1119 | temp_mask | ??? | ||
| 1122 | R_f_ep | precip_forecast_ep | precip_forecast_extrapolated_decomp | No |
| 1144 | Yn_ip_recomp | noise_cascade_recomp | No | |
| 1151 | Yn_ep_recomp_ | noise_extrapolated_recomp_temp | No | |
| 1158 | Yn_ep_recomp | noise_extrapolated_recomp | No | |
| 1159 | Yn_ep | noise_extrapolated_decomp | No | |
| 1173 | R_f_ep_out | precip_forecast_ep_out | precip_forecast_extrapolated_decomp_done | No |
| 1147 | Yn_ep_out | noise_extrapolated_decomp_done | No | |
| 1192 | R_ | precip_forecast | No | |
| 1194 | R_pm_ep | precip_forecast_extrapolated_probability_matching_temp | No | |
| 1201 | R_pm_ep_ | precip_forecast_extrapolated_probability_matching | No | |
| 1285 | R_f_out | precip_forecast_out | final_blended_forecast | No |
| 1294 | cascades_stacked | cascade_stack_all_components | No | |
| 1339 | cascades_stacked_ | cascade_stack_all_components_temp | No | |
| 1347 | covariance_nwp_models | covariance_all_components | No | |
| 1356 | R_f_blended | precip_forecast_blended | precip_forecast_blended | No |
| 1361 | R_f_blended_mod_only | precip_forecast_blended_mod_only | precip_forecast_blended_mod_only | No |
| 1384 | R_f_new | precip_forecast_new | precip_forecast_recomposed | No |
| 1391 | R_f_new_mod_only | precip_forecast_new_mod_only | precip_forecast_recomposed_mod_only | No |
| 1412 | weights_pm | weights_probability_matching | No | |
| 1413 | weights_pm_normalized | weights_probability_matching_normalized | No | |
| 1415 | weights_pm_mod_only | weights_probability_matching_mod_only | No | |
| 1438 | weights_pm_normalized_mod_only | weights_probability_matching_normalized_mod_only | No | |
| 1439 | R_pm_blended | precip_forecast_probability_matching_blended | No | |
| 1447 | R_pm_blended_mod_only | precip_forecast_probability_matching_blended_mod_only | No | |
| 1517 | R_cmin | precip_forecast_min_value | No | |
| 1525 | MASK_prec | precip_field_mask | No | |
| 1550 | Mu_0 | mean_probabiltity_matching_forecast | No | |
| 1551 | MASK | no_rain_mask | No | |
| 1552 | mu_fct | mean_precip_forecast | No | |
| 1578 | R_f_stacked | precip_forecast_stacked | precip_forecast_final | No |
| 1955 | R_min | precip_forecast_min | No | |
| 1998 | R_d | precip_forecast_decomp | No | |
| 2000 | R_ | precip_forecast | No | |
| 2014 | R_c | precip_forecast_cascades | No | |
| 2098 | R_c | precip_forecast_cascades | No | |
| 2165 | R_models_pm | precip_forecast_probability_matching | No | |
| 2291 | R_c | precip_forecast_cascades | No | |
| 2321 | R_m | ???? | ??? | |
| 2333 | R_c | precip_forecast_cascades | No |
STEP 3: Splitting up the code into separate functions (I might add some suggestions here tomorrow)
STEP 4: Changing to a class will make keeping track of all these variables less necessary.
Please provide me with some feedback.
I just went trough the code to try and refactor it. A version This is just a proof of concept is now pushed to try and show my suggestions (however with the old names as I did not push my version with name changes yesterday...) Below I will post the suggestions I applied to the code to have a clear visual what to do once the master branch can be refactored.
Refactoring Plan
| Part in Code | Suggested Name | Extra Info/Recommendations |
|---|---|---|
| Everything from start till just before #1 | preprocessing |
Check if all variables are still found in function: zerovalue |
| Everything from #2 to just before #2.3 | compute_cascades |
Check if all variables are still found for the function: mu_extrapolation, sigma_extrapolation |
After 2.3.1 in if zero_precip_radar and zero_model_fields: |
no_rain_radar_and_nwp |
|
| Bring #3 | initialisze_noise_methods |
|
| Bring #6 into one function except for last line | initiate_all_random_generators |
Check if all variables are still found: randgen_prec, vps, generate_vel_noise, mu_noise, sigma_noise, D, D_Yn, D_pb, R_f, R_m, mask_rim, struct, fft_objs |
| Everything in worker should be first extracted to a worker function | - | Not a nested definition but a separate function |
| Then worker should be split up further | - | |
| Bring 8.1.2 | determine_skill_nwp_at_leadtime |
|
| Bring 8.2 | calculate_weights_per_component |
|
| Bring 8.3.1 | calculcate_epsilon_decomposed |
|
| 8.3.2 | iterate_ar_extrapolation_per_cascade |
|
| 8.3.3 | iterate_ar_noise_per_cascade |
|
| For loop in 8.4 | extrapolate_per_timestep |
|
| Following for loop just before 8.5 | advect_forecast_one_timestep_no_subtimesteps |
|
In 8.5 if blend_nwp_members: else ... |
concatinate_cascade_mean_and_sigmas |
|
if weights_method == "spn" |
weights_method_spn |
|
| Before 8.6 | blend_all_cadcades |
|
| 8.6 | recompose_cascades_to_precip_field |
|
| 8.7.1 | calculate_probability_matching_member |
|
| 8.7.1 | calculate_forecast_probabilistic_member |
|
| 8.7.2 | apply_probability_matching |
@RubenImhoff @dnerini would now be a good time to refactor the code in steps blending and nowcasting? I have talked to @mats-knmi about how this would be integrated in the x-array version. This would be done after the changes are done in the current versions and once all tests are passed.
sounds very good!
@dnerini just to confirm if this would be okay. I was planning to refactor this code to a class due to all the variables. Does this conform to the pysteps design philosophy or should I just keep all the variables as local variables and remove the class based approach?
@sidekock Go ahead with the refactoring. Yes, true, at the beginning we explicitly decided to avoid oop, but i believe the limits and shortcomings of that decision are quite clear.. so let's give it a go!
@sidekock, sorry I was on vacation, so therefore my slow response. Great if you want to make a start with this! :)
@sidekock, since I am planning on starting with the blending conversion to xarray as well soon, I would appreciate it if you could create a draft pull request from the branch you are currently working on and add me as (one of) the reviewers. This way I can easily keep track of your changes and take this into account when working on the xarray stuff.
@mats-knmi Will do! I will probably only start somewhere later this week. The teaching season is once again upon me so first need to make sure I am ready on that front :)
@mats-knmi I will start this afternoon with the re-factoring. Will start with the nowcast steps to have a template and then look at the blending steps. Ill add you as a reviewer as soon as the nowcast steps is refactored @RubenImhoff can I add you too?
So I started today but I have a general question regarding the _check_inputs function. This is used to check some of the inputs put others are then checked in the forecast loop itself. Would it not be best to add all these checks and preparatory assignments (eg None -> dict()) in this _check_inputs?
@mats-knmi I will start this afternoon with the re-factoring. Will start with the nowcast steps to have a template and then look at the blending steps. Ill add you as a reviewer as soon as the nowcast steps is refactored @RubenImhoff can I add you too?
Of course! :)
So I started today but I have a general question regarding the _check_inputs function. This is used to check some of the inputs put others are then checked in the forecast loop itself. Would it not be best to add all these checks and preparatory assignments (eg None -> dict()) in this _check_inputs?
That may be a better way indeed.