Update the storms dataset

Open steveharoz opened this issue 3 years ago • 7 comments

(closes #6319)

  1. A bug in the code to reformat the data was causing some storms to be dropped. That's been fixed.
  2. Data for 2021 storms has been added.
  3. I've added data for earlier storms (1852-1974).

Point 3 might be worth discussing. Whoever originally added the dataset to dplyr dropped storms before 1975. I've been doing the same since I've been updating it, but I haven't seen a clear rationale. Considerations for adding the early data:

  • PRO: More data
  • CON: Bigger data file 42k -> 88k
  • CON: The early data may be less useful. E.g., many storms from the 1800s have only a single data point.
  • PRO: This is supposed to be an educational dataset. Filtering out the less useful data is a simple realistic exercise for learning dplyr.
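The filtering exercise mentioned in the last point could look something like the sketch below. The `toy` tibble and its values are made up for illustration (they are not rows from the real dataset); only the column names `name`, `year`, and `wind` follow `storms`.

```r
library(dplyr)

# Hypothetical stand-in for the extended storms data (values are invented)
toy <- tibble(
  name = c("AL011852", "AL021852", "AL021852", "AMY", "AMY", "AMY"),
  year = c(1852, 1852, 1852, 1975, 1975, 1975),
  wind = c(50, 60, 70, 25, 30, 35)
)

# Option 1: drop everything before 1975, as the current dataset does
recent <- toy %>%
  filter(year >= 1975)

# Option 2: keep only storms with more than one observation
well_observed <- toy %>%
  group_by(name, year) %>%
  filter(n() > 1) %>%
  ungroup()
```

Either filter is a one-liner, which is the argument for leaving the early rows in and letting learners remove them.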

steveharoz avatar Jul 05 '22 21:07 steveharoz

Hi Steve!

Quick suggestion: add the other status codes to the line below. That way, if someone does want to keep the filter commented out, they won't have to go back and clean up the status for those.

Thanks again!

status = factor(recode(status, "HU" = "hurricane", "TS" = "tropical storm", "TD" = "tropical depression"))

  • EX – extratropical
  • SD – subtropical depression
  • SS – subtropical storm
  • LO – low
  • WV – tropical wave
  • DB – disturbance
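A rough sketch of what recoding all nine codes might look like, assuming the labels listed here. The `raw` tibble and the `status_labels` lookup are illustrative names, not part of the actual data-raw script.

```r
library(dplyr)

# Hypothetical sample holding one row per HURDAT2 status code
raw <- tibble(status = c("HU", "TS", "TD", "EX", "SD", "SS", "LO", "WV", "DB"))

# Lookup from code to label (labels taken from the list in this comment)
status_labels <- c(
  HU = "hurricane",
  TS = "tropical storm",
  TD = "tropical depression",
  EX = "extratropical",
  SD = "subtropical depression",
  SS = "subtropical storm",
  LO = "low",
  WV = "tropical wave",
  DB = "disturbance"
)

# Splice the lookup into recode() so every code gets a readable label
cleaned <- raw %>%
  mutate(status = factor(recode(status, !!!status_labels)))
```

Keeping the mapping in one named vector means a new status code only needs one extra entry.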

mydatacz avatar Jul 10 '22 22:07 mydatacz

Trying to figure out what's going on with storm categorization. Bug in my parser? Or inconsistency in NOAA's categorization? These records should be classified as hurricanes (winds > 64 knots) but are subtropical storms, tropical storms, or other lows:

> storms %>% 
+     filter(category > 0, !(status %in% c("hurricane", "EX"))) %>% 
+     select(name, year, month, day, hour, lat, long, status, wind, pressure)
# A tibble: 6 × 10
  name      year month   day  hour   lat  long status          wind pressure
  <chr>    <dbl> <dbl> <int> <dbl> <dbl> <dbl> <fct>          <int>    <int>
1 AL091968  1968     9    20    12  35.5 -49.5 SS                75      976
2 AL091968  1968     9    21    12  39.6 -44.7 SS                65      982
3 AL181979  1979    10    24    18  40.5 -62   SS                65      985
4 EMILY     2005     7    20    18  25   -98.7 tropical storm    70      975
5 DORIAN    2019     9     7    18  42.8 -64.6 LO                80      954
6 DORIAN    2019     9     8     0  45.2 -62.9 LO                80      956

steveharoz avatar Jul 11 '22 11:07 steveharoz

I don't think it's either of those. I think the data is likely correct.

Subtropical, Extratropical, Lows and Disturbances can all have high wind intensity, but that doesn't mean they are hurricanes. A storm needs to be determined to be a tropical cyclone before it can rise to the level of a hurricane (based on wind speed/intensity). Definitions of types of storms here: https://www.nhc.noaa.gov/aboutgloss.shtml

https://www.nhc.noaa.gov/data/hurdat/hurdat2-format-atl-1851-2021.pdf

HU (Spaces 20-21, before 4th comma) – Status of system. Options are:

  • TD – Tropical cyclone of tropical depression intensity (< 34 knots)
  • TS – Tropical cyclone of tropical storm intensity (34-63 knots)
  • HU – Tropical cyclone of hurricane intensity (> 64 knots)
  • EX – Extratropical cyclone (of any intensity)
  • SD – Subtropical cyclone of subtropical depression intensity (< 34 knots)
  • SS – Subtropical cyclone of subtropical storm intensity (> 34 knots)
  • LO – A low that is neither a tropical cyclone, a subtropical cyclone, nor an extratropical cyclone (of any intensity)
  • WV – Tropical Wave (of any intensity)
  • DB – Disturbance (of any intensity)

So if a storm does not meet the requirements to be classified as a tropical cyclone, regardless of wind speed, it will never have a status of hurricane.

For example, you've probably experienced winds of 34-47 knots. That is simply a gale, unless the wind was associated with a storm that had already been determined to be a tropical cyclone (based on criteria other than wind).

Or you may have been in a winter Nor'easter (an extratropical storm that can have winds over 65 knots but isn't a hurricane). I don't know the measurements behind it, but from the definition and the link below, it appears that multiple characteristics/metrics are used to determine whether a storm is a tropical cyclone.

https://www.weather.gov/source/zhu/ZHU_Training_Page/tropical_stuff/sub_extra_tropical/subtropical.htm

Category > 0 had me stumped also at first, but then I realized the categories are based on a wind scale. Non-tropical cyclone storms also get assigned categories based on wind in the data. At first glance they appear to coincide with the Saffir-Simpson Wind scale, but I don't know that for sure. Having followed weather reports closely as a sailor, though, I've never heard NOAA refer to a category 1 gale or category 1 nor'easter, so I think NOAA only uses categories to describe hurricanes. https://www.nhc.noaa.gov/aboutsshws.php (see my second comment below - just realized the category data did not come from the original file).

luis.df <- storms %>% filter(name == "Luis") %>% select(year, name, category, status, wind)

row | year | name | category | status | wind
47 | 1995 | Luis | 2 | hurricane | 95
48 | 1995 | Luis | 2 | hurricane | 90

53 | 1995 | Luis | 2 | EX | 85
54 | 1995 | Luis | 2 | EX | 95
55 | 1995 | Luis | 3 | EX | 105
56 | 1995 | Luis | 2 | EX | 90
57 | 1995 | Luis | 1 | EX | 75
58 | 1995 | Luis | 0 | EX | 60

Hurricane Luis, for example, was an extratropical storm at some point, with wind categories 1, 2, and 3. The storm's other characteristics at that time no longer met the criteria for a tropical cyclone, even though the winds were often higher than when it was a tropical cyclone.

So I guess the take-away would be not to filter the data based on category thinking you are only going to get hurricanes.
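To make that concrete, here is a small sketch using only the Luis rows quoted above, rebuilt by hand into a tibble (the `luis` object is just for illustration). It compares filtering on `category` with filtering on `status`.

```r
library(dplyr)

# Rows copied from the Luis output quoted above
luis <- tibble(
  year = 1995,
  name = "Luis",
  category = c(2, 2, 2, 2, 3, 2, 1, 0),
  status = c("hurricane", "hurricane", "EX", "EX", "EX", "EX", "EX", "EX"),
  wind = c(95, 90, 85, 95, 105, 90, 75, 60)
)

# Filtering on category keeps the extratropical rows with high winds
by_category <- luis %>%
  filter(category > 0)

# Filtering on status keeps only rows NOAA classified as a hurricane
by_status <- luis %>%
  filter(status == "hurricane")

# Show which statuses slipped through the category filter
by_category %>%
  count(status)
```

Here `by_category` still contains `EX` rows, while `by_status` holds only the two hurricane observations.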

mydatacz avatar Jul 11 '22 19:07 mydatacz

Actually, looking at the original hurdat2 format file linked above, I just realized that category is not a column coming from NOAA. Since it is a column added/calculated as part of the dplyr file, maybe only add the category for tropical depressions, tropical storms, and hurricanes, and leave the other storm types as NA? I'm not sure what the purpose of the -1 and 0 categories is for tropical depressions and tropical storms, since the Saffir-Simpson wind scale doesn't start until 1 for hurricanes. If you decide to use category only for hurricanes, all the other storm statuses could be 0.

mydatacz avatar Jul 11 '22 23:07 mydatacz

Yes, category is calculated from wind speed. I've made that a bit clearer in the docs and set it to NA for everything that's not a hurricane.
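For anyone following along, a minimal sketch of that calculation, assuming the Saffir-Simpson thresholds in knots (64, 83, 96, 113, 137). This is an illustration, not necessarily the exact code in the data-raw script, and the `obs` tibble is a hand-built sample using wind values from the Luis rows earlier in the thread.

```r
library(dplyr)

# Hand-built sample; wind values taken from the Luis rows quoted earlier
obs <- tibble(
  status = c("hurricane", "hurricane", "EX", "EX", "EX"),
  wind = c(95, 90, 85, 105, 60)
)

# Assumed thresholds: Saffir-Simpson scale in knots; NA unless a hurricane
with_category <- obs %>%
  mutate(category = case_when(
    status != "hurricane" ~ NA_real_,
    wind >= 137 ~ 5,
    wind >= 113 ~ 4,
    wind >= 96 ~ 3,
    wind >= 83 ~ 2,
    wind >= 64 ~ 1,
    TRUE ~ NA_real_
  ))
```

The status check comes first so that a high-wind extratropical row never receives a category.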

steveharoz avatar Jul 12 '22 19:07 steveharoz

@steveharoz do you want to finish this off?

hadley avatar Aug 09 '22 19:08 hadley

@hadley Yeah. I'll finish it later this week.

steveharoz avatar Aug 09 '22 21:08 steveharoz

Thanks for the update! I think the last question to resolve is whether it's worthwhile to include the rows prior to 1975. I'm worried that this has a high likelihood of breaking existing graphics for little additional gain. I think it's probably safer to not include the historical data here.

hadley avatar Aug 17 '22 22:08 hadley

@hadley Yeah, I see the benefit of only having the clean and more complete data.

steveharoz avatar Aug 17 '22 23:08 steveharoz

Thanks! I did a couple more docs tweaks because I realised that this is the perfect place to use inline R code.

hadley avatar Aug 18 '22 12:08 hadley

Good call on the inline R!

steveharoz avatar Aug 18 '22 12:08 steveharoz