r4ds-exercise-solutions
Calculating ground time in Question 3, section 16.3.4
The question text:
Compare air_time with the duration between the departure and arrival. Explain your findings. (Hint: consider the location of the airport.)
I spent 10 days on this to figure out the following:
- A dep_time of 2400 is actually midnight at the start of the next day.
- No date is given for arrival times. We have to assume that an arrival time earlier than the departure time means the flight arrives the next day.
- The timezone difference between origin and destination needs to be taken into account. Some destinations have no data in the airports table. As it happens, these destinations are all in the same timezone, so we set them to that.
- 6 airports (HNL, PHX, BQN, PSE, SJU and STT) are in locations that do not observe DST. However, all destinations have been adjusted for DST. In my analysis I added the hour to the taxi_time variable, though perhaps adding it to the arrival time would be better.
- Three data points are left, just on the cusp of the end of daylight savings. I just bumped them up by 1hr.
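The first two points can be illustrated in isolation. What follows is a minimal base R sketch, not the code used for the analysis: `hhmm_to_datetime` and `roll_arrival` are hypothetical helpers, the dates and times are made up, and the comparison uses full datetimes rather than only the hour.

```r
# Hypothetical helper: turn a date string and an HHMM clock reading into a
# UTC datetime. Adding seconds to midnight lets 2400 roll into the next day.
hhmm_to_datetime <- function(date, hhmm) {
  midnight <- as.POSIXct(paste(date, "00:00"),
                         format = "%Y-%m-%d %H:%M", tz = "UTC")
  midnight + (hhmm %/% 100) * 3600 + (hhmm %% 100) * 60
}

# Hypothetical helper: if the arrival reads earlier than the departure,
# assume the flight landed on the following day (no 24-hour flights).
roll_arrival <- function(dep_dt, arr_dt) {
  arr_dt + ifelse(arr_dt < dep_dt, 24 * 3600, 0)
}

# Made-up example: a 2400 departure on 31 January is 00:00 on 1 February.
dep_midnight <- hhmm_to_datetime("2013-01-31", 2400)
format(dep_midnight, "%Y-%m-%d %H:%M")  # "2013-02-01 00:00"

# Made-up example: depart 23:30, arrive 01:15 the next morning.
dep_dt <- hhmm_to_datetime("2013-06-01", 2330)
arr_dt <- roll_arrival(dep_dt, hhmm_to_datetime("2013-06-01", 115))
duration <- as.numeric(difftime(arr_dt, dep_dt, units = "mins"))
duration  # 105
```

Working in seconds from midnight avoids special-casing 2400, at the cost of ignoring timezones entirely, which is why the offsets still have to be handled separately.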
The end result is this mutate script:
library(tidyverse)
library(lubridate)
library(nycflights13)

# Daylight savings.
dst_start <- ymd_hm("2013-03-10 03:00")
dst_end <- ymd_hm("2013-11-03 02:00")
# None of these airports observe daylight savings.
no_dst_airports <- c("HNL", "PHX", "BQN", "PSE", "SJU", "STT")
# These are not in the airports table.
na_airports <- c("BQN", "PSE", "SJU", "STT")

arr_dep <- flights %>%
  filter(!is.na(dep_time), !is.na(arr_time), !is.na(air_time)) %>%
  left_join(airports, by = c("dest" = "faa")) %>%
  mutate(
    # Get the departure datetime.
    # A dep_time of 2400 indicates that it leaves at midnight the following day.
    dep_time_zero = if_else(dep_time == 2400, 0L, dep_time),
    dep_day = if_else(dep_time == 2400, day + 1L, day),
    dep_hour = dep_time_zero %/% 100,
    dep_min = dep_time_zero %% 100,
    dep_dt = make_datetime(year, month, dep_day, dep_hour, dep_min),
    # Get the arrival datetime.
    arr_hour = arr_time %/% 100,
    arr_min = arr_time %% 100,
    # In flight during midnight. Assumes no 24hr flights.
    arr_day = if_else(arr_hour < dep_hour, dep_day + 1L, dep_day),
    arr_dt = make_datetime(year, month, arr_day, arr_hour, arr_min),
    # Find the flight duration in minutes (difference between local clock times).
    duration = as.numeric(difftime(arr_dt, dep_dt, units = "mins")),
    # Add missing tz.
    tz = if_else(dest %in% na_airports, -4, tz),
    # Adjust for timezone relative to New York (tz = -5).
    duration_tz = duration - ((tz + 5) * 60),
    # Destination does not observe daylight savings.
    no_dst = dest %in% no_dst_airports,
    # The flight arrives during daylight savings.
    arr_during_dst = arr_dt > dst_start & arr_dt < dst_end,
    # Adjustment where arrival time is wrongly recorded as daylight savings.
    dst_adjustment = if_else(no_dst & arr_during_dst, 60, 0),
    # The part of the duration not covered by air_time seems to be
    # the time spent travelling when not in the air.
    taxi_time_unadjusted = duration_tz - air_time,
    taxi_time = duration_tz - air_time + dst_adjustment,
    # There are three remaining flights where taxi_time is negative.
    # They're within a couple of hours of the end of DST, so bump them up an hour.
    taxi_time = if_else(taxi_time < 0, taxi_time + 60, taxi_time)
  ) %>%
  select(origin, dest,
         dep_dt, arr_dt,
         air_time, duration, duration_tz, taxi_time, taxi_time_unadjusted,
         no_dst, arr_during_dst,
         distance, lat, lon, tz)
Once all this is done, the difference between the adjusted duration (arr_time - dep_time) and air_time is always positive and on average about half an hour. A reasonable guess is that this difference is the time spent on the ground, which I've referred to as taxi_time.
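As a worked example of the arithmetic, here is a minimal base R sketch of the same adjustments applied to single flights. `ground_time` is a hypothetical helper, and the flight figures are invented for illustration rather than taken from the flights table. The fixed +5 assumes, as the script does, that every origin is on New York time.

```r
# Hypothetical helper mirroring the adjustments in the mutate script.
ground_time <- function(clock_mins, dest_tz, air_time,
                        no_dst = FALSE, arr_during_dst = FALSE) {
  # Remove the destination's timezone offset relative to New York (tz = -5).
  elapsed <- clock_mins - (dest_tz + 5) * 60
  # Add an hour when a non-DST destination is reached during DST.
  elapsed <- elapsed + ifelse(no_dst & arr_during_dst, 60, 0)
  # Whatever air_time does not account for is treated as time on the ground.
  elapsed - air_time
}

# Invented flight to a tz = -8 destination: 200 + 180 - 350
ground_time(200, -8, 350)  # 30

# Invented summer flight to a tz = -7 destination without DST:
# 150 + 120 + 60 - 300
ground_time(150, -7, 300, no_dst = TRUE, arr_during_dst = TRUE)  # 30
```

Without the DST adjustment the second flight would come out at -30 minutes, which is the kind of negative value that prompted the adjustment in the first place.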