Populate wind and temp from weather
Many values in the play_by_play wind and temp columns are missing, including those for all 2021 games. As @guga31bb mentioned in https://github.com/nflverse/nflfastR-data/issues/32, this is probably just down to what the NFL provides.
However, in most of these cases the data needed to populate the columns is readily available in the weather column. Just wanted to put it out there that these could be backfilled.

I don't use R or I'd open a PR, but FWIW here's the algorithm/regex I'm using locally with good results.
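import re

import numpy as np


def parse_precipitation(play):
    # True if the weather description mentions rain or snow (any capitalization).
    weather = play['weather']
    if not isinstance(weather, str):
        return False
    weather = weather.lower()
    return 'rain' in weather or 'snow' in weather


def parse_temperature(play):
    # Keep an existing temp value; otherwise parse it from the weather string.
    if not np.isnan(play['temp']):
        return play['temp']
    if isinstance(play['weather'], str):
        # Optional minus sign so sub-zero temperatures are not skipped.
        match = re.search(r'Temp: (-?\d+)°', play['weather'])
        if match:
            return int(match.group(1))
    return None


def parse_wind(play):
    # Keep an existing wind value; otherwise parse it from the weather string.
    if not np.isnan(play['wind']):
        return play['wind']
    if isinstance(play['weather'], str):
        # Greedy match: picks the last space-delimited number after "Wind:".
        match = re.search(r'Wind:.* (\d+) ', play['weather'])
        if match:
            return int(match.group(1))
    return None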
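For anyone doing this in pandas, the same backfill can also be applied column-wise instead of play by play. This is only a sketch under a few assumptions: the play-by-play data is already loaded into a DataFrame with `weather`, `temp`, and `wind` columns, the `weather` column holds strings or missing values, and `backfill_weather` is a hypothetical helper name, not something provided by nflfastR.

```python
import pandas as pd


def backfill_weather(pbp: pd.DataFrame) -> pd.DataFrame:
    """Fill missing temp/wind values from the free-text weather column.

    Hypothetical helper; assumes `weather` contains strings or missing values.
    """
    out = pbp.copy()

    # Integer between "Temp: " and the degree sign (optional minus sign).
    temp = out["weather"].str.extract(r"Temp: (-?\d+)°", expand=False)
    # Last space-delimited integer after "Wind:", e.g. "Wind: SW 10 mph" -> 10.
    wind = out["weather"].str.extract(r"Wind:.* (\d+) ", expand=False)

    # fillna only touches rows where the existing column is missing.
    out["temp"] = out["temp"].fillna(pd.to_numeric(temp, errors="coerce"))
    out["wind"] = out["wind"].fillna(pd.to_numeric(wind, errors="coerce"))
    return out


# Tiny made-up frame, for illustration only.
pbp = pd.DataFrame({
    "weather": [
        "Sunny Temp: 87° F, Humidity: 52%, Wind: SW 10 mph",
        "Temp: 65° F, Wind: mph",
        None,
    ],
    "temp": [float("nan"), 60.0, float("nan")],
    "wind": [float("nan"), float("nan"), 3.0],
})
filled = backfill_weather(pbp)
```

Existing values win over parsed ones, so the second row keeps its temp of 60 even though the weather string says 65.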
Here is example code to extract the parts of the weather string:
df <- pbp |>
  dplyr::filter(!is.na(weather)) |>
  dplyr::distinct(season, game_id, weather) |>
  dplyr::mutate(
    # No temperature is extracted for indoor games
    temp_f = dplyr::case_when(
      stringr::str_detect(weather, "Indoors") ~ NA_character_,
      TRUE ~ stringr::str_extract(weather, "(?<=Temp: )-?[:digit:]{1,3}")
    ),
    temp_f = as.numeric(temp_f),
    temp_c = (temp_f - 32) * 5 / 9,
    hum = stringr::str_extract(weather, "(?<=Humidity: )[:digit:]{1,3}"),
    hum = as.numeric(hum) / 100,
    # Empty wind strings become NA (applied per column, not to the whole data frame)
    wind = stringr::str_extract(weather, "(?<=Wind: ).+(?= mph)") |>
      stringr::str_trim() |>
      dplyr::na_if("")
  ) |>
  dplyr::filter(!(is.na(temp_f) & is.na(hum) & is.na(wind)))
df
# A tibble: 2,723 x 7
   game_id         season weather                                                                 temp_f temp_c   hum wind
   <chr>            <int> <chr>                                                                    <dbl>  <dbl> <dbl> <chr>
 1 2001_01_ATL_SF    2001 partly cloudy Temp: 68° F, Humidity: 63%, Wind: Southwest 12 MPH mph        68   20    0.63 Southwest 12~
 2 2001_01_CAR_MIN   2001 Temp: 65° F, Wind: mph                                                      65   18.3    NA NA
 3 2001_01_CHI_BAL   2001 Mostly cloudy, highs in mid 80's Temp: 83° F, Humidity: 66%, Wind: Sou~     83   28.3  0.66 South 10
 4 2001_01_DET_GB    2001 Rain throughout game, heavy showers possible. Temp: 60° F, Humidity: 9~     60   15.6  0.93 NW 5
 5 2001_01_IND_NYJ   2001 Partly Sunny Temp: 81° F, Humidity: 81%, Wind: SW 6 mph mph                 81   27.2  0.81 SW 6 mph
 6 2001_01_MIA_TEN   2001 Partly Cloudy & Windy Temp: 81° F, Humidity: 69%, Wind: From the South~     81   27.2  0.69 From the Sou~
 7 2001_01_NE_CIN    2001 Partly clooudy, poossible showers/thunderstorms Temp: 79° F, Humidity:~     79   26.1  0.87 S 8
 8 2001_01_NO_BUF    2001 Sunny Temp: 87° F, Humidity: 52%, Wind: SW 10 mph                           87   30.6  0.52 SW 10
 9 2001_01_NYG_DEN   2001 Clear Temp: 75° F, Humidity: 18%, Wind: SE 9 mph                            75   23.9  0.18 SE 9
10 2001_01_OAK_KC    2001 Mostly Sunny Temp: 64° F, Humidity: 78%, Wind: Northwest 12 mph             64   17.8  0.78 Northwest 12
# ... with 2,713 more rows
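Since the original request came from a Python user, here is a rough standard-library port of the extraction logic. It is a sketch, not a drop-in replacement: `Weather` and `parse_weather` are hypothetical names, and it uses Python's `re` module rather than stringr, so edge cases may behave slightly differently.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Weather:
    temp_f: Optional[float]
    temp_c: Optional[float]
    humidity: Optional[float]  # fraction between 0 and 1
    wind: Optional[str]        # raw text such as "SW 10"


def parse_weather(weather: str) -> Weather:
    """Split a weather string into temperature, humidity, and wind text."""
    temp_f = None
    # No temperature is extracted for indoor games.
    if "Indoors" not in weather:
        m = re.search(r"(?<=Temp: )-?\d{1,3}", weather)
        temp_f = float(m.group()) if m else None
    temp_c = None if temp_f is None else (temp_f - 32) * 5 / 9

    m = re.search(r"(?<=Humidity: )\d{1,3}", weather)
    humidity = int(m.group()) / 100 if m else None

    # Greedy match up to the last " mph"; blank text becomes None.
    m = re.search(r"(?<=Wind: ).+(?= mph)", weather)
    wind = (m.group().strip() or None) if m else None

    return Weather(temp_f, temp_c, humidity, wind)


w = parse_weather("Sunny Temp: 87° F, Humidity: 52%, Wind: SW 10 mph")
```

Like the pattern it is ported from, the wind match is greedy, so a string ending in "mph mph" keeps the first "mph" in the wind text.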
The pbp data set is already huge, and these additional variables would inflate it unnecessarily. I have provided code above to extract the information from the weather string, so I am going to close this as not planned.