Abnormally long runtimes for some Alaska HUCs
During a full BED for fim_4_4_15_0, the entire run time jumped dramatically. This is the first run to include Alaska.
Here are the longest runtimes for the 1,900-plus HUCs. Hmm... did we lose some HUCs? That sounds wrong; there should be at least 2,180. I will look into that separately.
The list below has three columns: HUC number, runtime as mm:ss, and the same runtime as decimal minutes. (Note: an earlier note labeled the time column hh:mm and the last column a percent, but the later follow-ups, e.g. 642:57 being about 10.7 hours, show these are minutes.)

HUC       Runtime (mm:ss)  Minutes
10110201  60:51            60.85
10200101  60:51            60.85
19020101  60:56            60.93
19020201  63:20            63.33
10300102  64:06            64.10
19020103  66:16            66.26
19020800  68:12            68.20
19020302  70:36            70.60
19020302  71:02            71.30
16060008  73:00            73.00
19020502  79:35            79.58
19020202  84:07            84.11
19020501  89:30            89.50
19020505  89:47            89.78
18100100  90:32            90.53
19020503  99:35            99.58
19020102  133:07           133.11
19020601  243:10           243.16
19020402  282:06           282.10
19020504  290:04           290.60
19020602  462:52           462.86
19020104  642:57           642.95
The HUCs prior to 19020102 seem reasonable, but the others are suspicious, especially the last one, which works out to about 10.7 hours.
These times come from runs on the AWS Step Functions system, where the Fargate machines are set to use 6 of their 8 cores.
We are going to run more tests on those HUCs on Prod, where we will use 42 of 48 cores (a 7x difference) to see what we get.
Rob, I would love to be able to bring this to leadership's attention. Could you maybe make a comparison of the CONUS HUC processing times to the Alaska HUCs? I'm thinking a boxplot like the one below would really convey our message. If you're too busy to play around with the plotting, could you provide me with the timing data?
Streamlines and stream density of HUC 19020104 (the 10-hour HUC) and surrounding HUCs.
If you look in EFS under outputs/fim_4_4_15_0/logs/unit, there is a summary file of runtimes for all units. That CSV has three columns: huc, runtime as a datetime, and runtime as a percent.
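For anyone pulling the timing data themselves, here is a minimal sketch of extracting the longest runtimes. It assumes rows shaped like the table above (HUC, mm:ss runtime, decimal value); the actual summary CSV's column layout and filename may differ, so adjust the parsing accordingly.

```python
import csv  # only needed if reading the real summary CSV


def parse_mmss(t: str) -> float:
    """Convert an 'mm:ss' runtime string to decimal minutes."""
    minutes, seconds = t.split(":")
    return int(minutes) + int(seconds) / 60


def longest_runtimes(rows, n=20):
    """Return the n (huc, minutes) pairs with the longest runtimes."""
    timed = [(huc, parse_mmss(t)) for huc, t, _decimal in rows]
    return sorted(timed, key=lambda pair: pair[1], reverse=True)[:n]


# A few rows from the table above, in place of reading the CSV:
rows = [
    ("10110201", "60:51", "60.85"),
    ("19020104", "642:57", "642.95"),
    ("19020602", "462:52", "462.86"),
]
print(longest_runtimes(rows, n=2))
```

Pointing `csv.reader` at the real summary file instead of the inline `rows` list would reproduce the ranked list above.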
I want to do some experiments on them using just our Prod machine. When we run them on our Fargates (big AWS runs), those machines are small: about the same size as our regular EC2, with only 8 cores and 64 GB of RAM. Maybe those HUCs just need more horsepower. I will see what I can test over the weekend and on Monday to see what we can learn.
Stranger yet: the Alaska HUC list I ran on Prod was, surprisingly, generally even slower, but the trends were the same.
And... the BED failed in post-processing, but my test Alaska set did not fail in post-processing??? See EFS / outputs.
Here's the boxplot for the Alaska HUCs vs CONUS+
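For reference, a boxplot like this can be produced with a few lines of matplotlib. This is a minimal sketch with placeholder numbers; the real values come from the logs/unit summary CSV, and the output filename is made up here.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt

# Placeholder runtime samples in minutes (NOT the real data):
conus_minutes = [18.2, 22.0, 25.7, 31.4, 64.1]
alaska_minutes = [60.9, 84.1, 133.1, 282.1, 642.9]

fig, ax = plt.subplots()
ax.boxplot([conus_minutes, alaska_minutes])
ax.set_xticks([1, 2])
ax.set_xticklabels(["CONUS", "Alaska"])
ax.set_ylabel("Unit runtime (minutes)")
ax.set_title("fim_4_4_15_0 unit runtimes: CONUS vs Alaska")
fig.savefig("runtime_boxplot.png")
```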
I looked into the ratio of number of branches to runtime. Pre-Alaska, the top 5 HUCs by branch count (HUC, runtime in minutes, number of branches):

HUC       Minutes  Branches
21010005  23.00    118
09030001  40.56    116
10300102  64.10    115
03130003  35.30    111
10140201  46.66    106

Average runtime is 22 min (not counting Alaska).
Alaska runtimes and number of branches (HUC, runtime in minutes, number of branches):

HUC       Minutes  Branches
19020203  33.61    10
19020302  45.46    54
19020301  48.10    39
19020101  60.93    85
19020201  63.33    54
19020103  66.26    94
19020800  68.20    4
19020502  79.58    70
19020202  84.11    66
19020501  89.50    98
19020505  89.78    91
19020503  99.58    29
19020102  133.11   106
19020601  243.16   174
19020402  282.10   44
19020504  290.60   130
19020602  462.86   63
19020104  642.95   135
19020401  (timing not yet known)  29
Based on your branch numbers and this graph, I'm pretty sure that we've narrowed it down to the streamline density / number of catchments as the issue that's slowing AK down.
After removing first-order streams, I compared the inundation with fim_4_5_2_11 and noticed a significant difference between inundated areas. However, those differences did not seem related to the removal of first-order streams, so I reran fim_pipeline for a few HUCs using the current dev version as the baseline instead; here are the results. In most areas, the differences make sense, as they occur exactly where the first-order streams were removed.
Run times:
HUC 19020402: with 1st-order streams removed, 4 hr 4 min; current dev, 6 hr 5 min
HUC 19020602: with 1st-order streams removed, 8 hr 25 min; current dev, 10 hr 27 min
HUC 19020402:
HUC 19020602:
I also compared the new inundation (from current dev) with our previous results, where we started the GMS at higher stream orders. Although the differences are smaller than when we removed first-order streams, the runtime is longer.
After rerunning fim_pipeline for all HUCs, it seems that runtime is shorter following the removal of 1st order streams. Here are the results:
The differences in inundated area between the current dev version and after removal of 1st order streams, as well as the comparison of inundated area between the current dev version and higher GMS, are shown below for each HUC.
HUC 19020104:
HUC 19020503:
HUC 19020402:
HUC 19020602:
I modified the src/add_crosswalk.py script, which uses a nested for loop to update the rating curve for small segments. For HUC 19020104, the nested loop alone took 6.3 hours to run. The new changes significantly cut down the runtime without affecting the inundated areas, so we don't need to remove any first-order streams.
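The diff itself isn't inline here, but the general pattern is replacing a per-segment nested Python loop with a single vectorized pandas pass. Below is a hedged sketch of that idea only; the column names (`feature_id`, `discharge_cms`) and the adjustment itself are hypothetical, not the actual add_crosswalk.py logic.

```python
import pandas as pd


def update_small_segment_curves(rating_curves: pd.DataFrame,
                                small_ids: set) -> pd.DataFrame:
    """Adjust discharge for small segments in one vectorized pass
    instead of looping over every segment and stage row in Python.
    Column names and the 0.5 scale factor are placeholders."""
    out = rating_curves.copy()
    mask = out["feature_id"].isin(small_ids)
    out.loc[mask, "discharge_cms"] *= 0.5  # placeholder adjustment
    return out


# Tiny illustrative rating-curve table (two segments, two stages each):
curves = pd.DataFrame({
    "feature_id": [1, 1, 2, 2],
    "stage_m": [0.0, 1.0, 0.0, 1.0],
    "discharge_cms": [0.0, 10.0, 0.0, 20.0],
})
updated = update_small_segment_curves(curves, {2})
print(updated["discharge_cms"].tolist())  # → [0.0, 10.0, 0.0, 10.0]
```

The win is that the boolean mask and in-place multiply run in C inside pandas, so the cost no longer scales with a Python-level segments-times-stages double loop.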
Here is an example for HUC 19020402
