CohortDiagnostics

Inclusion rules - how to interpret and use them

Open gowthamrao opened this issue 3 years ago • 3 comments

From @chrisknoll

There are three inclusion rule tables:

cohort_inclusion = "#cohort_inclusion"

Has rule names and descriptions by cohort. Useful for knowing which rule we are looking at.

cohort_inclusion_stats = "#cohort_inc_stats"

Has person_count, gain_count, person_total

select *
from cohort_inclusion_stats
where cohort_definition_id = 2511

would give something like this:

image

Mode 0 = all events, mode 1 = best event. The best event is the single event per person that matched the most inclusion criteria. Because the person total is 37.6k for mode_id = 0 but 12.8k for mode_id = 1, we can tell that this cohort has multiple events per person.

cohort_inclusion_result = "#cohort_inc_result"

select *
from results_optum_extended_dod_v1707.cohort_inclusion_result
where cohort_definition_id = 2511

image

In this table, inclusion_rule_mask is a bit string in which each bit (0-based index) marks one inclusion rule, and each row gives the count for that exact combination of matched rules. So the first row says 26 entry events met mask = 7, which is 111 in binary, i.e. inclusion rules 1, 2 and 3. 29072 people met no criteria (mask = 0), and 57 people had mask = 5, which is 101 in binary, i.e. inclusion rules 1 and 3 (the bits at index 0 and 2 are set).
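The decoding described above can be sketched in R. This is a minimal illustration, not CohortDiagnostics code: `maskToRules` is a hypothetical helper name, and because it relies on base R's `intToBits` it only handles masks within R's 32-bit integer range.

```r
# Hypothetical helper: list the 1-based inclusion rule numbers set in a mask.
# intToBits() returns bits least significant first, so position 1 is rule 1.
maskToRules <- function(mask) {
  which(as.integer(intToBits(mask)) == 1L)
}

maskToRules(7) # rules 1, 2 and 3
maskToRules(5) # rules 1 and 3
maskToRules(0) # no rules (empty result)
```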

You can use bit operators to find, for example, people who met inclusion rule 3 (bit index 2, so 2^2 = 4). It’s something like:

WHERE inclusion_rule_mask & 4 = 4

This returns all the rows where that bit is set, and you can then take SUM(person_count) over those rows to get the number of people who met that inclusion rule.

If you wanted rule 3 and rule 1, that would be 4 + 1 = 5, so WHERE inclusion_rule_mask & 5 = 5.

If you wanted to check for people who met any of those rules, then it would just be inclusion_rule_mask & 5 > 0, because if any of the bits in ‘5’ are set, you get a > 0 result.
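The three filters above can be tried out in R with `bitwAnd`. This is only a sketch on a toy table: the counts for masks 7, 0 and 5 are the figures quoted earlier, while the rows for masks 1 and 4 are hypothetical and added so that the three filters give different totals.

```r
# Counts for masks 7, 0 and 5 are quoted from the text above;
# the rows for masks 1 and 4 are hypothetical, added for illustration.
inclusionResult <- data.frame(
  inclusion_rule_mask = c(7L, 0L, 5L, 1L, 4L),
  person_count = c(26, 29072, 57, 100, 10)
)
mask <- inclusionResult$inclusion_rule_mask
count <- inclusionResult$person_count

# Rule 3 met (bit value 4)
rule3 <- sum(count[bitwAnd(mask, 4L) == 4L])
# Rules 1 and 3 both met (4 + 1 = 5)
rule1And3 <- sum(count[bitwAnd(mask, 5L) == 5L])
# Rule 1 or rule 3 met (any bit of 5 set)
rule1Or3 <- sum(count[bitwAnd(mask, 5L) > 0L])

print(c(rule3 = rule3, rule1And3 = rule1And3, rule1Or3 = rule1Or3))
# rule3 = 93, rule1And3 = 83, rule1Or3 = 193
```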

cohort_summary_stats = "#cohort_summary_stats"

This is a fourth, derived table that is only present in Cohort Diagnostics.

All four tables are in the Cohort Diagnostics version 3 results data model here

gowthamrao, Oct 12 '21 23:10

as.integer(intToBits(5))

[1] 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

as.integer(intToBits(7))

[1] 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

gowthamrao, Oct 13 '21 00:10

This function would let you build a cohort attrition view, i.e. for each rule i, the number of people who met rules 1 through i:

getCohortAttritionViewResults <- function(inclusionResultTable,
                                          maxRuleId) {
  numberToBitString <- function(numbers) {
    vapply(numbers, function(number) {
      if (number == 0) {
        return("0")
      }
      
      bitString <- character()
      while (number > 0) {
        bitString <- c(as.character(number %% 2), bitString)
        number <- number %/% 2
      }
      
      paste(bitString, collapse = "")
    }, character(1))
  }
  
  # Convert a vector of bits (least significant bit first) to its numeric mask
  bitsToMask <- function(bits) {
    positions <- seq_along(bits) - 1
    number <- sum(bits * 2 ^ positions)
    return(number)
  }
  
  ruleToMask <- function(ruleId) {
    bits <- rep(1, ruleId)
    mask <- bitsToMask(bits)
    return(mask)
  }
  
  inclusionResultTable <- inclusionResultTable |>
    dplyr::mutate(inclusionRuleMaskBitString = numberToBitString(inclusionRuleMask))
  
  output <- list()
  
  # seq_len() avoids iterating over c(1, 0) when maxRuleId is 0
  for (i in seq_len(maxRuleId)) {
    suffixString <- numberToBitString(ruleToMask(i))
    output[[i]] <- inclusionResultTable |>
      dplyr::filter(endsWith(x = inclusionRuleMaskBitString,
                             suffix = suffixString)) |>
      dplyr::group_by(cohortDefinitionId,
                      modeId) |>
      dplyr::summarise(personCount = sum(personCount), .groups = "drop") |>
      dplyr::mutate(id = i)
  }
  
  output <- dplyr::bind_rows(output)
  
  return(output)
}

gowthamrao, Apr 04 '24 17:04

@chrisknoll and I worked on this problem for many hours today. The key learning is how to handle large numbers. We used the same strategy that WebAPI currently uses to process inclusionRuleMask in the inclusionResultTable, i.e. convert the mask to a bit string and use string matching instead of bitwise matching.
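To illustrate why string matching copes with large masks, here is a small base R sketch. The helper name `toBitString` and the example values are assumptions for illustration. It builds a mask with 40 rules set, which is beyond R's integer range, and checks it with a suffix match.

```r
# Convert a non-negative whole number (stored as a double) to a binary string.
toBitString <- function(number) {
  if (number == 0) {
    return("0")
  }
  bits <- character()
  while (number > 0) {
    bits <- c(as.character(number %% 2), bits)
    number <- number %/% 2
  }
  paste(bits, collapse = "")
}

bigMask <- 2^40 - 1       # rules 1 to 40 all met; exceeds .Machine$integer.max
suffix <- strrep("1", 35) # pattern for "met rules 1 through 35"

bitString <- toBitString(bigMask)
nchar(bitString)            # 40
endsWith(bitString, suffix) # TRUE

# The same mask cannot be stored as an R integer, which is what bitwAnd() needs
suppressWarnings(is.na(as.integer(bigMask))) # TRUE
```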

A simple way to solve it would be the code below, but it fails in base R once the mask value goes beyond the integer range, because bitwAnd() only supports 32-bit integers. This matters when a cohort has a lot of inclusion rules, e.g. 32 or more.

ruleToMask <- function(ruleId) {
  bits <- rep(1, ruleId)
  
  bitsToMask <- function(bits) {
    positions <- seq_along(bits) - 1
    number <- sum(bits * 2 ^ positions)
    return(number)
  }
  
  mask <- bitsToMask(bits)
  return(mask)
}

a <- dplyr::tibble(inclusionRuleMask = c(15, 11, 7, 1),
                   personCount = c(20, 20, 20, 20))

ruleId <- 3
maskId <- ruleToMask(ruleId = ruleId)
a |>
  dplyr::filter(bitwAnd(inclusionRuleMask, maskId) == maskId) |>
  dplyr::summarise(personCount = sum(personCount))
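As a possible alternative to both bitwAnd() and string matching, the same "met rules 1 through ruleId" check can be written with modular arithmetic on doubles. This is a sketch, not the approach used in WebAPI or CohortDiagnostics: `meetsFirstRules` is a hypothetical helper, the fifth row of the table is made up to show a mask beyond integer range, and the method assumes masks stay below 2^53, where doubles represent whole numbers exactly.

```r
# Hypothetical helper: TRUE where `mask` has all of rules 1..ruleId set.
# A mask has its lowest `ruleId` bits set exactly when
# mask mod 2^ruleId equals 2^ruleId - 1.
meetsFirstRules <- function(mask, ruleId) {
  modulus <- 2^ruleId
  mask %% modulus == modulus - 1
}

# First four rows match the example above; the last row is hypothetical.
a <- data.frame(inclusionRuleMask = c(15, 11, 7, 1, 2^40 - 1),
                personCount = c(20, 20, 20, 20, 5))

total <- sum(a$personCount[meetsFirstRules(a$inclusionRuleMask, ruleId = 3)])
total # 45: masks 15, 7 and 2^40 - 1 have rules 1 to 3 set
```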

gowthamrao, Apr 04 '24 17:04