CohortDiagnostics
Inclusion rules - how to interpret and use them
From @chrisknoll
There are three inclusion rule tables:
cohort_inclusion = "#cohort_inclusion"
Holds the rule names and descriptions for each cohort. Useful for knowing which rule we are looking at.
cohort_inclusion_stats = "#cohort_inc_stats"
Has person_count, gain_count, person_total
select *
from cohort_inclusion_stats
where cohort_definition_id = 2511
would give something like this:
Mode 0 = all events, mode 1 = best event. The best event is the single event per person that matched the most inclusion criteria. Because the person total for mode_id = 0 is 37.6k but for mode_id = 1 it is 12.8k, we can tell that this cohort has multiple events per person.
cohort_inclusion_result = "#cohort_inc_result"
select *
from results_optum_extended_dod_v1707.cohort_inclusion_result
where cohort_definition_id = 2511
In this table, inclusion_rule_mask is a bit string recording which inclusion rules (0-based index) were matched, and person_count is the count for that combination. So the first row says 26 entry events met mask = 7, which is 111 in binary, i.e. inclusion rules 1, 2 and 3. 29072 people met no criteria (mask = 0), and 57 people had mask = 5, which is 101 in binary, i.e. inclusion rules 1 and 3 (the bits at index 0 and 2 are set).
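To make the mask-to-rules mapping concrete, here is a minimal R sketch that decodes a mask into the 1-based rule numbers it represents. The helper name maskToRules is hypothetical and not part of CohortDiagnostics; it only illustrates the bit arithmetic described above.

```r
# Hypothetical helper: decode an inclusion_rule_mask into 1-based rule numbers.
maskToRules <- function(mask) {
  rules <- integer(0)
  position <- 0L                      # 0-based bit index
  while (mask > 0) {
    if (mask %% 2 == 1) {             # lowest remaining bit is set
      rules <- c(rules, position + 1L)
    }
    mask <- mask %/% 2                # shift right by one bit
    position <- position + 1L
  }
  rules
}

maskToRules(7)  # 1 2 3
maskToRules(5)  # 1 3
maskToRules(0)  # integer(0), no rules met
```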
You can use bit operators to find people with a given inclusion rule. For inclusion rule 3, for example (which would be 2^2 = 4), it’s something like:
WHERE inclusion_rule_mask & 4 = 4
This returns all the rows where that bit is set. You can then take SUM(person_count) over those rows to get the number of people who met that inclusion rule.
If you wanted rule 3 and rule 1, that would be 4 + 1 = 5, so WHERE inclusion_rule_mask & 5 = 5.
If you wanted people who met any of those rules, it would just be inclusion_rule_mask & 5 > 0, because if any of the bits in ‘5’ are set, you get a > 0 result.
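The same "all bits" versus "any bit" checks can be sketched in base R with bitwAnd. The first three rows reuse the counts quoted above (masks 7, 0 and 5); the last two rows (masks 1 and 4) are made up purely so that the three queries give different answers.

```r
# Toy result table. Rows for masks 1 and 4 are hypothetical.
inclusionResult <- data.frame(
  inclusion_rule_mask = c(7L, 0L, 5L, 1L, 4L),
  person_count        = c(26L, 29072L, 57L, 10L, 5L)
)
mask <- inclusionResult$inclusion_rule_mask

# Rule 3 only: bit value 4 must be set.
hasRule3 <- bitwAnd(mask, 4L) == 4L
# Rule 1 AND rule 3: both bits of 5 (binary 101) must be set.
hasBoth <- bitwAnd(mask, 5L) == 5L
# Rule 1 OR rule 3: at least one bit of 5 is set.
hasAny <- bitwAnd(mask, 5L) > 0L

countRule3 <- sum(inclusionResult$person_count[hasRule3])  # 26 + 57 + 5 = 88
countBoth  <- sum(inclusionResult$person_count[hasBoth])   # 26 + 57 = 83
countAny   <- sum(inclusionResult$person_count[hasAny])    # 26 + 57 + 10 + 5 = 98
```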
cohort_summary_stats = "#cohort_summary_stats"
This is a fourth table, a derived table that is only present in Cohort Diagnostics.
All four tables are part of the version 3 results data model in Cohort Diagnostics.
as.integer(intToBits(5))
[1] 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
as.integer(intToBits(7))
[1] 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
This function lets you build a cohort attrition view:
getCohortAttritionViewResults <- function(inclusionResultTable,
                                          maxRuleId) {
  # Convert non-negative whole numbers to binary strings, e.g. 5 -> "101".
  # The arithmetic runs on doubles, so masks beyond the integer range still work.
  numberToBitString <- function(numbers) {
    vapply(numbers, function(number) {
      if (number == 0) {
        return("0")
      }
      bitString <- character()
      while (number > 0) {
        bitString <- c(as.character(number %% 2), bitString)
        number <- number %/% 2
      }
      paste(bitString, collapse = "")
    }, character(1))
  }

  # Convert a vector of bits (least significant first) to its numeric mask.
  bitsToMask <- function(bits) {
    positions <- seq_along(bits) - 1
    sum(bits * 2^positions)
  }

  # Mask with the first `ruleId` bits set, i.e. rules 1..ruleId all met.
  ruleToMask <- function(ruleId) {
    bitsToMask(rep(1, ruleId))
  }

  inclusionResultTable <- inclusionResultTable |>
    dplyr::mutate(inclusionRuleMaskBitString = numberToBitString(inclusionRuleMask))

  # Attrition view: for each rule i, count the people who met rules 1..i,
  # i.e. whose bit string ends with i ones.
  output <- list()
  for (i in seq_len(maxRuleId)) {
    suffixString <- numberToBitString(ruleToMask(i))
    output[[i]] <- inclusionResultTable |>
      dplyr::filter(endsWith(x = inclusionRuleMaskBitString,
                             suffix = suffixString)) |>
      dplyr::group_by(cohortDefinitionId, modeId) |>
      dplyr::summarise(personCount = sum(personCount), .groups = "drop") |>
      dplyr::mutate(id = i)
  }
  dplyr::bind_rows(output)
}
@chrisknoll and I worked on this problem for many hours today. The key learning is how to handle large numbers. We used the same strategy that WebAPI currently uses to process inclusionRuleMask in the inclusionResultTable, i.e. convert the mask to a string and match on the string instead of matching on bits.
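The string-matching idea can be tried on a small table in base R. This is only a sketch: toBitString is a hypothetical standalone helper, the toy values are made up, and it assumes the masks are whole numbers small enough to be exact as doubles.

```r
# Toy inclusion result; the values are made up for illustration.
toy <- data.frame(
  inclusionRuleMask = c(15, 11, 7, 1),
  personCount       = c(20, 20, 20, 20)
)

# Hypothetical helper: binary string of a non-negative whole number,
# computed with double arithmetic rather than 32-bit integer bit operators.
toBitString <- function(number) {
  if (number == 0) {
    return("0")
  }
  bits <- character()
  while (number > 0) {
    bits <- c(as.character(number %% 2), bits)
    number <- number %/% 2
  }
  paste(bits, collapse = "")
}

toy$bitString <- vapply(toy$inclusionRuleMask, toBitString, character(1))

# People who met rules 1..3 have a bit string ending in "111".
suffix <- strrep("1", 3)
metFirstThree <- sum(toy$personCount[endsWith(toy$bitString, suffix)])  # 40

# A mask for 40 rules is far outside the integer range but still converts.
nchar(toBitString(2^40 - 1))  # 40
```

Masks 15 (1111) and 7 (111) end in "111", while 11 (1011) and 1 do not, so the total is 40.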
A simpler way to solve it would be the code below, but it fails in base R when the value goes beyond the integer range, because the functions used only support the integer range. This is relevant when we have a lot of inclusion rules, e.g. more than 32.
# Mask with the first `ruleId` bits set, e.g. ruleId = 3 gives 7 (binary 111).
ruleToMask <- function(ruleId) {
  bitsToMask <- function(bits) {
    positions <- seq_along(bits) - 1
    sum(bits * 2^positions)
  }
  bitsToMask(rep(1, ruleId))
}

a <- dplyr::tibble(inclusionRuleMask = c(15, 11, 7, 1),
                   personCount = c(20, 20, 20, 20))
ruleId <- 3
maskId <- ruleToMask(ruleId = ruleId)

# Keep rows where every bit of maskId is set, then total the people.
a |>
  dplyr::filter(bitwAnd(inclusionRuleMask, maskId) == maskId) |>
  dplyr::summarise(personCount = sum(personCount))
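The integer-range limit mentioned above can be seen directly in base R. The following is a small sketch, not from the original discussion; it relies only on the fact that R integers are 32-bit signed, so the largest integer is 2^31 - 1.

```r
intMax <- .Machine$integer.max        # 2147483647, i.e. 2^31 - 1

mask31 <- 2^31 - 1                    # mask with 31 bits set
mask32 <- 2^32 - 1                    # mask with 32 bits set

fits31 <- mask31 <= intMax            # TRUE: still inside the integer range
fits32 <- mask32 <= intMax            # FALSE: outside the integer range

# Coercing the larger mask to integer yields NA (with a warning).
coerced <- suppressWarnings(as.integer(mask32))
is.na(coerced)                        # TRUE

# As a double, the same value keeps all of its bits for arithmetic.
lowBit <- (2^40 - 1) %% 2             # 1
```

This is why the string-based approach works on the numeric value with %% and %/% rather than coercing the mask to an integer.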