patientcounter

Could the intervals be extended to month and/or month-year?

Open Lextuga007 opened this issue 3 years ago • 11 comments

I want to give patientcounter a try with smoking prevalence data by team or ward. I have information over many years, so the best way to 'count' the open people in a team or ward is by referrals by month-year. Patientcounter only goes to day - is that right?

Lextuga007 avatar Sep 17 '21 12:09 Lextuga007

Hi @Lextuga007 - I've only just seen this, not sure why I wasn't notified before.

As far as I know, if an interval works with cut, it should work here - this is the guidance for cut.POSIXct:

[image: screenshot of the cut.POSIXct guidance]
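
As a rough illustration of what that guidance means in practice, base R's cut() accepts interval names such as "month" and "year" as breaks for date-times. The timestamps below are made up for the example, and UTC is declared only to keep the result independent of the local timezone.

```r
# Made-up timestamps, declared as UTC to avoid local-timezone surprises
x <- as.POSIXct(
  c("2021-01-15 08:00:00", "2021-01-31 23:59:59",
    "2021-02-01 00:00:00", "2022-07-04 12:00:00"),
  tz = "UTC"
)

# Each value is assigned to the calendar month / year it falls in
by_month <- cut(x, breaks = "month")
by_year  <- cut(x, breaks = "year")

# Counts per interval; months with no values between the first and last
# timestamp still appear as (empty) levels
table(by_month)
table(by_year)
```

This only shows that month and year are valid interval names for cut(); it does not confirm how patientcounter handles them internally.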

I'd be happy to take a look if you have some trial data you could share (offline)?

johnmackintosh avatar Oct 01 '21 14:10 johnmackintosh

Hey @johnmackintosh did you guys end up finding out if this worked? I'm potentially going to be doing a count of folks added before but not removed from a register on a specific date over multiple years. It appears that specifying "year" would be fine - how would I go about setting the day & month to check at?

will-ball avatar Dec 14 '22 09:12 will-ball

@will-ball I never got round to looking into this in detail. In reference to @Lextuga007's comment, the package doesn't necessarily only go to day level, but it does expect date-times rather than dates. It was created because of the need for hourly or even finer-grained counts.

If you use the individual level, the function returns a row per individual per interval, including the original start and end datetimes, plus the interval's base date and hour - which you can use to filter results to a specific date and time.

Alternatively, maybe you could use data.table's rolling joins?

https://www.gormanalysis.com/blog/r-data-table-rolling-joins/

https://r-norberg.blogspot.com/2016/06/understanding-datatable-rolling-joins.html
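
A minimal sketch of how a rolling join could answer "who is registered on a given date", assuming made-up spell data and that spells for the same id never overlap. The table and column names (spells, snapshots, join_date) are hypothetical, and treating someone removed exactly on the snapshot date as no longer registered is also an assumption.

```r
library(data.table)

# Made-up spells; assumes spells for the same id do not overlap
spells <- data.table(
  id      = c(1L, 1L, 2L, 3L),
  added   = as.Date(c("2019-05-01", "2020-07-01", "2019-07-15", "2019-08-05")),
  removed = as.Date(c("2019-06-10", "2020-08-20", "2019-08-15", "2019-09-01"))
)

# One row per id per snapshot date
snapshot_dates <- as.Date(c("2019-07-31", "2020-07-31"))
ids <- unique(spells$id)
snapshots <- data.table(
  id       = rep(ids, each = length(snapshot_dates)),
  snapshot = rep(snapshot_dates, times = length(ids))
)

# Rolling joins roll on the last key column, so give both tables a shared one
spells[, join_date := added]
snapshots[, join_date := snapshot]
setkey(spells, id, join_date)
setkey(snapshots, id, join_date)

# For each id and snapshot, roll back to the latest spell starting on or before it
matched <- spells[snapshots, roll = TRUE]

# Keep only spells that are still open on the snapshot date
registered <- matched[!is.na(added) & removed > snapshot]
tally <- registered[, .N, by = snapshot]
tally
```

Because only the most recent spell start is matched, overlapping spells for one id would need a different approach, such as a non-equi join.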

If you have some fake data to play around with, would be happy to take a look at all the options

johnmackintosh avatar Dec 14 '22 11:12 johnmackintosh

Thanks for getting back to me @johnmackintosh

I've not encountered rolling joins before so will take a look, thanks for flagging. I've got a toy dataset to illustrate:

# Simple Example
library(tidyverse)
library(lubridate)
library(truncnorm)

n_people <- 1000

start_date <- as_date("2012-01-01")
end_date <- as_date("2021-12-31")

set.seed(20221214)

data <- as_tibble(
  list(
    id = sample(1:n_people, replace = TRUE),
    added = start_date + sample.int(end_date - start_date, n_people))) %>% 
  mutate(
    removed = added + rtruncnorm(n_people, mean = 30, sd = 15, a = 1, b = 1000),
    days = added %--% removed %/% days(1))

From data which essentially looks like this, I'd like to count how many people are 'registered' on the 31st July each year. I don't think it should complicate anything but the same person can be added/removed multiple times. I will have a play myself but if you get bored and want to take a look let me know.

will-ball avatar Dec 14 '22 12:12 will-ball

see if this gives you what you need @will-ball ?

library(tidyverse)
library(lubridate)
library(truncnorm)

library(patientcounter)

n_people <- 1000

start_date <- as_date("2012-01-01")
end_date <- as_date("2021-12-31")

set.seed(20221214)

data <- as_tibble(
  list(
    id = sample(1:n_people, replace = TRUE),
    added = start_date + sample.int(end_date - start_date, n_people))) %>% 
  mutate(
    removed = added + rtruncnorm(n_people, mean = 30, sd = 15, a = 1, b = 1000),
    days = added %--% removed %/% days(1))


data2 <- data %>% 
  mutate(added  = as.POSIXct(added), 
         removed = as.POSIXct(removed))

results <- interval_census(data2, 
                           identifier = 'id', 
                           admit = "added", 
                           discharge = "removed", 
                           time_unit = '1 day', 
                           results = 'patient')

results[lubridate::month(base_date)== 7 & lubridate::day(base_date) == 31] %>% 
  arrange(.,id, added)

johnmackintosh avatar Dec 14 '22 12:12 johnmackintosh

results[lubridate::month(base_date)== 7 & lubridate::day(base_date) == 31,.N, .(base_date)]

will give you tallies for each cutoff date

johnmackintosh avatar Dec 14 '22 13:12 johnmackintosh

That works perfectly thanks 😄

will-ball avatar Dec 14 '22 13:12 will-ball

Nice one @will-ball. Not sure I've been any use to @Lextuga007 yet, so will leave this open for now.

johnmackintosh avatar Dec 14 '22 14:12 johnmackintosh

Yes, it does look like "year" is supported, as the time_unit parameter feeds into {lubridate} functions. However, when I run a smaller example for years, something strange happens when an end date is already "floored":

library(dplyr)
library(patientcounter)

df <- tibble::tribble(
  ~id,  ~start_date,    ~end_date, ~smoking_status,
   5L, "2024-08-01", NA, "smoker",
   1L, "2019-01-01", "2020-01-01",        "smoker",
   2L, "2019-01-02", "2020-01-02",    "non-smoker",
   3L, "2019-01-03", "2022-01-01",        "smoker",
   4L, "2019-01-04", NA,    "non-smoker"
  ) |> 
  mutate(start_date = as.POSIXct(start_date),
         end_date = as.POSIXct(end_date))
  
results <- interval_census(df, 
                           identifier = 'id', 
                           admit = "start_date", 
                           discharge = "end_date", 
                           time_unit = 'year', 
                           results = 'patient') |> 
  arrange(id)

id 1 should get 2019 and 2020, but because its end date is on the 1st of January 2020, 2020 doesn't show. I'm guessing, but is this related to the date-times, with the time tipping it back to 2019-12-31? The same happens with id 3, which should get 2019, 2020, 2021 and 2022, but 2022 is dropped.

Lextuga007 avatar Aug 25 '23 07:08 Lextuga007

Hmm, I wonder if that is timezone related. I haven't tried your code yet, but I've encountered issues with the changeover between BST and GMT when UTC has not been explicitly declared.
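
To illustrate the general hazard (not a diagnosis of the result above), here is a small base R sketch showing how a local midnight during BST falls on the previous day, and previous month, when viewed in UTC. The dates are made up. Note that on 1 January UK local time equals UTC, so this on its own would not explain a shift at a year boundary.

```r
# Midnight on 1 July in UK local time (BST, one hour ahead of UTC)
local_midnight <- as.POSIXct("2020-07-01 00:00:00", tz = "Europe/London")

# The same instant viewed in UTC lands on the previous day and month
utc_view <- format(local_midnight, "%Y-%m-%d %H:%M", tz = "UTC")
utc_view

# Declaring UTC up front keeps the value on the intended side of the boundary
utc_midnight <- as.POSIXct("2020-07-01 00:00:00", tz = "UTC")
format(utc_midnight, "%Y-%m-%d %H:%M", tz = "UTC")
```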

I don't have much bandwidth to look into this at present.

Another possible factor is my use of "within" as the overlap type in foverlaps. I was thinking about making that a parameter in the main function so that folk can use whatever type suits them best.

Will try and get that sorted soon.
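
For anyone unfamiliar with the overlap types in data.table's foverlaps, the sketch below shows how "any" and "within" differ when an interval ends exactly on a bin boundary. The bins and the stay are made-up integers, and this is not how patientcounter arranges its own call, which is not shown in this thread.

```r
library(data.table)

# Two adjacent bins that share the boundary value 10 (intervals are closed)
bins <- data.table(
  bin_start = c(0L, 10L),
  bin_end   = c(10L, 20L),
  bin       = c("first", "second")
)
setkey(bins, bin_start, bin_end)

# A stay that ends exactly on the shared boundary
stay <- data.table(id = 1L, start = 5L, end = 10L)

# "any": the stay touches both bins, so both are returned
any_hits <- foverlaps(stay, bins, by.x = c("start", "end"),
                      type = "any", nomatch = NULL)

# "within": the stay must lie entirely inside a bin, so only the first is returned
within_hits <- foverlaps(stay, bins, by.x = c("start", "end"),
                         type = "within", nomatch = NULL)

any_hits$bin
within_hits$bin
```

If the census bins and stays are compared in a similar way, an end date sitting exactly on the start of a year could plausibly be treated differently depending on the type chosen, but that would need checking against the package source.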

johnmackintosh avatar Aug 26 '23 17:08 johnmackintosh

Tom Jemmett (https://github.com/tomjemmett) wrote this code, which I've adapted for the data I used. It's made me realise that what I need to count is not really a census: for something like prevalence, I don't want to subtract people who leave.

df |> 
  tidyr::pivot_longer(-c(id, smoking_status), 
                      values_to = "date") |>
  dplyr::mutate(n = ifelse(name == "start_date", 1, -1)) |>
  tidyr::replace_na(list(date = lubridate::today())) |> 
  dplyr::mutate(date = lubridate::floor_date(date, "year")) |> 
  dplyr::arrange(date, smoking_status) |>
  dplyr::mutate(c = cumsum(n),
                .by = smoking_status) |> 
  dplyr::select(-name, -id, -n) |>  
  dplyr::slice_tail(n = 1, by = c(date, smoking_status)) |> 
  tidyr::complete(date = seq(min(date), max(date), by = "year")) |> 
  tidyr::fill(c(c, smoking_status)) |>
  tidyr::replace_na(list(c = 0))

I think for prevalence I'd need to stop generating the -1 for an exit.
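
One possible reading of that idea, sketched in base R: count only start dates and accumulate them by year and smoking status, with no subtraction for exits. The ids, start dates and statuses are taken from the small example earlier in the thread; the object names (starts, ever_started) are hypothetical, and whether a cumulative count of starts is the right definition of prevalence is an assumption.

```r
# Start dates and statuses from the small example; end dates are ignored on purpose
df <- data.frame(
  id = c(5L, 1L, 2L, 3L, 4L),
  start_date = as.Date(c("2024-08-01", "2019-01-01", "2019-01-02",
                         "2019-01-03", "2019-01-04")),
  smoking_status = c("smoker", "smoker", "non-smoker", "smoker", "non-smoker")
)

# Calendar year of each start, with every year in the range kept as a level
df$year <- as.integer(format(df$start_date, "%Y"))
years <- seq(min(df$year), max(df$year))

# Number of starts per status per year (zero for years with no starts)
starts <- table(df$smoking_status, factor(df$year, levels = years))

# Running total of starts per status: nobody is ever subtracted
ever_started <- t(apply(starts, 1, cumsum))
ever_started
```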

Lextuga007 avatar Sep 16 '23 09:09 Lextuga007