demos-and-how-tos
demos-and-how-tos copied to clipboard
Add caseload size calculation
Using this script to produce example data (or submitting your own example data) please add a function that calculates the caseload size per team (or teams) by day (or range of days).
See the README for an explanation of the data.
We would love multiple solutions to this please- tidyverse, base, data.table, SQL, Python, you name it :slightly_smiling_face:
PS I banged out the synthetic data, if anybody has improvements or comments I'd be glad to hear them
This works for me:
`-- Caseload
declare @StartDate date = '2022-08-31' declare @EndDate date = getdate()
;with cteCaseload as ( select IDPatient ,CONVERT(date,ref.[DateReferral]) as ReferralDate ,ref.[Discharge_Date] ,TeamName ,1 as Caseload FROM MyReferralsTable ref where ref.[DateReferral]<@EndDate and (ref.Discharge_Date>=@StartDate OR ref.Discharge_Date is null) )
select c.IDPatient
,d.Date as EventDate
,c.TeamName
,1 as Caseload
from MyDateTable d
left join cteCaseload c on c.ReferralDate <= d.Date
and (c.[Discharge_Date] >= d.Date or c.[Discharge_Date] is null)
where d.Date >= @StartDate
and d.Date <= @EndDate
`
This one for a dplyr version...
It uses dates
from the synthetic data generation script, but that can be changed to whatever dates are required.
team_caseload <- test_frame %>%
full_join(data.frame(dates), by = character()) %>%
group_by(client_id, team_desc, dates) %>%
summarise(is_in = max(referral_date <= dates
& (discharge_date > dates | is.na(discharge_date)))) %>%
group_by(team_desc, dates) %>%
summarise(total_caseload_on_date = sum(is_in))
From @johnmackintosh https://gist.github.com/johnmackintosh/1d7b1763ef3cf77d5a2f6357638606b7
you could treat this as a continous function by pivotting the dates into a single column, then using referral_date
as +1, discharge_date
as -1. If you then arrange on the pivotted date column, sort by that date, we can then cumulative sum the count column to give us our caseload at any point in time. This would probably be a more interesting approach with a date time column.
There is one big caveat if you were using this, you would need to write a query that extractted date before your period of interest, e.g. .data$discharge_date > start_date | .data$referral_date < end_date | is.na(.data$discharge_date)
library(tidyverse)
add_end_row <- function(.data, end_date = NULL) {
if (is.null(end_date)) {
end_date <- max(.data$date)
}
bind_rows(.data, summarise(.data, date = end_date, across(count, last)))
}
test_frame |>
pivot_longer(ends_with("date"), names_to = "date_type", values_to = "date") |>
drop_na(date) |>
mutate(count = ifelse(date_type == "referral_date", 1, -1)) |>
arrange(date) |>
select(team_desc, date, count) |>
group_by(team_desc) |>
mutate(across(count, cumsum)) |>
add_end_row(Sys.Date()) |>
ggplot(aes(date, count, colour = team_desc)) +
geom_step()
The new package {ivs} https://github.com/DavisVaughan/ivs might also be useful for this YouTube 50:52