Question on whether it is possible/valid to construct a TOC plot using RATEs estimated from cross-fitting/sequential-fitting

Open Lee-xli opened this issue 10 months ago • 2 comments

Hi grf,

When working with small-to-medium-sized datasets (400-1500 subjects with 50-70% event rates), one wants to use as much of the data as possible for estimation. This is the situation for many RCTs in the medical literature, where large RCTs such as ACCORD, SPRINT, and IST (referenced in the grf resource materials) are rare.

As such, I am wondering whether it is possible to construct TOC plots using RATE estimated in a cross-/sequential-fitting framework. If confidence intervals for the RATE are not critical (given that a formal t-test can be done separately), is it valid to simply pool the RATE estimates from all test folds (as an average or weighted average), assuming the distribution of the policy/criteria is not too different across folds? If the distributions are not comparable, I guess one could set q in each fold so that it reflects the trial-wise distribution.

Ideally, cross-fitting would be preferred, allowing all data to be used. However, would sequential fitting be preferred given the potential correlation of RATE estimates across folds?

Many thanks in advance,

Lee

Lee-xli avatar Jan 04 '25 09:01 Lee-xli

Hi @Lee-xli, the following vignette shows how to aggregate RATE metrics. The RATE is the area under the TOC curve, so if you just want a t-value for a point on the TOC, you could aggregate \hat{TOC}(q) / se(\hat{TOC}(q)) for a given q in the same manner. Alternatively, if the aggregated RATE metric indicates the presence of HTE, then you've already established that the prioritization rule is picking up HTE, and so it makes sense to use that formulation to estimate other policy values of interest.

erikcs avatar Jan 13 '25 09:01 erikcs
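
The aggregation of per-fold t-values mentioned above can be sketched as follows. This is a minimal illustration, not the vignette's code: it assumes the per-fold statistics \hat{TOC}(q) / se(\hat{TOC}(q)) are approximately independent standard normal under the null of no heterogeneity, in which case their sum divided by the square root of the number of folds is again approximately standard normal. The function name `aggregate_t_values` and its arguments are hypothetical.

```python
import math


def aggregate_t_values(estimates, std_errors):
    """Combine per-fold t-statistics into one z-like statistic.

    Assumption (not taken from the thread): the per-fold t-values are
    approximately independent N(0, 1) under the null, so their sum
    scaled by 1 / sqrt(number of folds) is approximately N(0, 1).
    """
    if not estimates or len(estimates) != len(std_errors):
        raise ValueError("need one estimate and one standard error per fold")
    t_values = [est / se for est, se in zip(estimates, std_errors)]
    t_agg = sum(t_values) / math.sqrt(len(t_values))
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|t_agg|)).
    p_value = math.erfc(abs(t_agg) / math.sqrt(2))
    return t_agg, p_value
```

Whether the independence assumption is reasonable depends on how the folds are constructed, which is the cross-fitting versus sequential-fitting question raised in the original post.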

Thank you very much @erikcs, and my apologies for the late reply. Please excuse my ignorance - is it possible to average the estimated \hat{TOC}(q) from each fold, provided that q covers the same underlying prioritisation scores (e.g. using the ranking distribution from the entire population to define q in each fold, to ensure meaningful pooling), and then plot the averaged \hat{TOC}(q)?

Lee-xli avatar Feb 06 '25 15:02 Lee-xli
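
For concreteness, the pooling described in the last question could be sketched as below. This only illustrates the mechanics and is not a claim that the pooled curve is statistically valid, which is the open question here. It assumes each test fold supplies prioritisation scores together with per-unit treatment effect scores (for example doubly robust scores), that the cut-off for each q comes from the pooled ranking, and that folds are weighted by size. The names `toc_at` and `pooled_toc` are hypothetical.

```python
import math


def toc_at(priorities, scores, threshold):
    """Mean score among units with priority >= threshold, minus the fold mean."""
    top = [s for p, s in zip(priorities, scores) if p >= threshold]
    if not top:
        return float("nan")  # no unit in this fold reaches the cut-off
    return sum(top) / len(top) - sum(scores) / len(scores)


def pooled_toc(folds, q_grid):
    """Average per-fold TOC(q) using cut-offs from the pooled priority ranking.

    folds: list of (priorities, scores) pairs, one pair per test fold.
    Weighting by fold size is an assumption made for this sketch.
    """
    ranked = sorted((p for pri, _ in folds for p in pri), reverse=True)
    n_total = len(ranked)
    curve = []
    for q in q_grid:
        # Priority of the ceil(q * n)-th highest-ranked unit in the pooled sample.
        k = min(n_total, max(1, math.ceil(round(q * n_total, 9))))
        threshold = ranked[k - 1]
        num = den = 0.0
        for pri, sc in folds:
            value = toc_at(pri, sc, threshold)
            if not math.isnan(value):
                num += len(sc) * value
                den += len(sc)
        curve.append(num / den if den else float("nan"))
    return curve
```

By construction the pooled curve is 0 at q = 1, since every unit in every fold is then above the cut-off.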