demos-and-how-tos icon indicating copy to clipboard operation
demos-and-how-tos copied to clipboard

Add caseload size calculation

Open ChrisBeeley opened this issue 2 years ago • 7 comments

Using this script to produce example data (or submitting your own example data) please add a function that calculates the caseload size per team (or teams) by day (or range of days).

See the README for an explanation of the data.

We would love multiple solutions to this please- tidyverse, base, data.table, SQL, Python, you name it :slightly_smiling_face:

ChrisBeeley avatar Sep 25 '22 20:09 ChrisBeeley

PS I banged out the synthetic data, if anybody has improvements or comments I'd be glad to hear them

ChrisBeeley avatar Sep 25 '22 21:09 ChrisBeeley

This is my attempt using SQL:

caseload_size_MC.txt

NHSmatthewcallus avatar Sep 26 '22 07:09 NHSmatthewcallus

This works for me:

`-- Caseload

declare @StartDate date = '2022-08-31' declare @EndDate date = getdate()

;with cteCaseload as ( select IDPatient ,CONVERT(date,ref.[DateReferral]) as ReferralDate ,ref.[Discharge_Date] ,TeamName ,1 as Caseload FROM MyReferralsTable ref where ref.[DateReferral]<@EndDate and (ref.Discharge_Date>=@StartDate OR ref.Discharge_Date is null) )

select c.IDPatient ,d.Date as EventDate ,c.TeamName ,1 as Caseload from MyDateTable d
left join cteCaseload c on c.ReferralDate <= d.Date
and (c.[Discharge_Date] >= d.Date or c.[Discharge_Date] is null) where d.Date >= @StartDate and d.Date <= @EndDate

`

LeslieHewitt avatar Sep 27 '22 13:09 LeslieHewitt

This one for a dplyr version...

It uses dates from the synthetic data generation script, but that can be changed to whatever dates are required.

team_caseload <- test_frame %>%
  full_join(data.frame(dates), by = character()) %>%
  group_by(client_id, team_desc, dates) %>%
  summarise(is_in = max(referral_date <= dates 
                        & (discharge_date > dates | is.na(discharge_date)))) %>%
  group_by(team_desc, dates) %>%
  summarise(total_caseload_on_date = sum(is_in))

jontix avatar Sep 28 '22 09:09 jontix

From @johnmackintosh https://gist.github.com/johnmackintosh/1d7b1763ef3cf77d5a2f6357638606b7

Lextuga007 avatar Sep 28 '22 09:09 Lextuga007

you could treat this as a continous function by pivotting the dates into a single column, then using referral_date as +1, discharge_date as -1. If you then arrange on the pivotted date column, sort by that date, we can then cumulative sum the count column to give us our caseload at any point in time. This would probably be a more interesting approach with a date time column.

There is one big caveat if you were using this, you would need to write a query that extractted date before your period of interest, e.g. .data$discharge_date > start_date | .data$referral_date < end_date | is.na(.data$discharge_date)

library(tidyverse)

add_end_row <- function(.data, end_date = NULL) {
  if (is.null(end_date)) {
    end_date <- max(.data$date)
  }
  bind_rows(.data, summarise(.data, date = end_date, across(count, last)))
}

test_frame |>
  pivot_longer(ends_with("date"), names_to = "date_type", values_to = "date") |>
  drop_na(date) |>
  mutate(count = ifelse(date_type == "referral_date", 1, -1)) |>
  arrange(date) |>
  select(team_desc, date, count) |>
  group_by(team_desc) |>
  mutate(across(count, cumsum)) |>
  add_end_row(Sys.Date()) |>
  ggplot(aes(date, count, colour = team_desc)) +
  geom_step()

tomjemmett avatar Oct 04 '22 08:10 tomjemmett

The new package {ivs} https://github.com/DavisVaughan/ivs might also be useful for this YouTube 50:52

Lextuga007 avatar Feb 22 '23 17:02 Lextuga007