
Black Box Predictive Control Model Causal Inference Library


Data Collection and Analysis

We collect data on food and drug intake in addition to symptom severity ratings.

Adaptive Intervention and Predictive Control Models

This data is fed into a predictive control model system, a concept borrowed from behavioral medicine and control systems engineering. This system uses the data to continually refine its suggestions, helping you optimize your health and well-being.

Adaptive intervention is a strategy used in behavioral medicine to create individually tailored strategies for the prevention and treatment of chronic disorders. It involves intensive measurement and frequent decision-making over time, allowing the intervention to adapt to the individual's needs.

Predictive control models are control systems that use data to predict future outcomes and adjust actions accordingly. In the context of Longevitron, this means using the data it collects to predict your future health outcomes and adjusting its suggestions to optimize your health.


A control systems engineering approach for adaptive behavioral interventions: illustration with a fibromyalgia intervention - https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4167895/

Real-Life Application and Benefits

Consider a hypothetical scenario where you're dealing with a chronic condition like fibromyalgia. We collect data on your symptoms, medication intake, stress levels, sleep quality, and other relevant factors. The system then feeds this data into its predictive control models, which predict your future symptoms and adjust your treatment plan accordingly.

This could involve suggesting changes to your medication dosage, recommending lifestyle changes, or even alerting your healthcare provider if it detects a potential issue. The goal is to optimize your health and well-being based on your needs and circumstances.

[Image: decision-support notifications]

☝️ The image above is what we're trying to achieve here.

To determine the effects of various factors on health outcomes, we currently apply pharmacokinetic modeling over a range of onset delay and duration of action hyper-parameters, and combine that with additional parameters for each of Hill's criteria for causality.

The distributions in this type of data are generally non-normal, and the unknown onset delays and durations of action mean that plain Pearson correlations don't work well, so we mainly focus on change from baseline. There's a lot of room for improvement, such as controlling for confounding with instrumental variables or modeling temporal dynamics with convolutional recursive neural networks.
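As an illustration, here is a minimal sketch of that hyper-parameter sweep in Python. It assumes a pandas DataFrame `df` indexed by timestamp; the column names, the grid values, and the expanding-mean baseline are illustrative assumptions, not the production implementation:

```python
import pandas as pd

def score_hyperparams(df, factor, outcome, onset_delay_s, duration_s):
    """Correlate the factor, aggregated over the hypothesized window of action,
    with the outcome's change from a naive running baseline."""
    out = df[outcome].dropna()
    baseline = out.expanding().mean().shift(1)   # baseline = mean of prior history
    change_from_baseline = out - baseline
    exposures = []
    for t in out.index:
        # Window of action: ends onset_delay before the outcome measurement,
        # and spans the hypothesized duration of action before that.
        window_end = t - pd.Timedelta(seconds=onset_delay_s)
        window_start = window_end - pd.Timedelta(seconds=duration_s)
        exposures.append(df[factor].loc[window_start:window_end].sum())
    exposure = pd.Series(exposures, index=out.index)
    valid = change_from_baseline.notna() & exposure.notna()
    return change_from_baseline[valid].corr(exposure[valid])

# Grid-search plausible onset delays (0-48 h) and durations of action (1-7 days).
# "CauseVariable" is a hypothetical factor column name.
grid = [(d * 3600, w * 86400) for d in (0, 6, 12, 24, 48) for w in (1, 3, 7)]
scores = {p: score_hyperparams(df, "CauseVariable", "Arthritis Severity Rating", *p)
          for p in grid}
best_params = max(scores, key=lambda p: abs(scores[p]))
```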

Hybrid Predictive Control Black Box Models seem most appropriate.


Test and Training Data

It's a matrix of years of self-reported Arthritis Severity Rating measurements and hundreds of potential factors over time.

https://github.com/curedao/curedao-black-box-optimization-engine/raw/main/data/arthritis-factor-measurements-matrix-zeros-unixtime.csv

Format

The first row contains the variable names. The first column is the Unix timestamp (seconds since 1970-01-01 00:00:00 UTC).
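For example, the matrix can be loaded and given a datetime index with pandas (a sketch that assumes only the format described above):

```python
import pandas as pd

URL = ("https://github.com/curedao/curedao-black-box-optimization-engine/"
       "raw/main/data/arthritis-factor-measurements-matrix-zeros-unixtime.csv")

# First row holds the variable names; first column is the Unix timestamp.
df = pd.read_csv(URL, index_col=0)
df.index = pd.to_datetime(df.index, unit="s", utc=True)  # seconds since epoch
print(df.shape)        # (number of measurements, number of variables)
print(df.columns[:5])  # a few of the factor/outcome variable names
```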

Pre-Processing

To make the data easier to analyze, some preprocessing has been done. This includes zero-filling where appropriate. Also, the factor measurement values are aggregated over the period preceding each Arthritis measurement, based on the onset delay and duration of action.

Hyper-Parameters

The aggregation method and other hyper-parameters for a given variable can be found by entering the Variable Name in either

  1. the API Explorer, or
  2. the URL https://studies.fdai.earth/VARIABLE_NAME_HERE (substituting the actual variable name).

Determining Treatment Effects from Sparse and Irregular Time Series Data

Introduction

Analyzing the effects of a treatment based on observational time series data is a common need in many domains like medicine, psychology, and economics. However, this analysis often faces several key challenges:

  • The data is sparse - there is a limited number of observations.
  • The data is irregular - observations are not at regular time intervals.
  • There is missing data - many timepoints have no observation.
  • The onset delay of the treatment effect is unknown. It may take time to appear.
  • The duration of the treatment effect is unknown. It may persist after cessation.
  • Both acute (short-term) and cumulative (long-term) effects need to be analyzed.
  • Causality and statistical significance need to be established rigorously.
  • The optimal dosage needs to be determined to maximize benefits.

This article provides a comprehensive methodology to overcome these challenges and determine whether a treatment makes an outcome metric better, worse, or has no effect based on sparse, irregular time series data with missingness.

Data Preprocessing

Before statistical analysis can begin, the data must be preprocessed:

  • Resample the time series to a regular interval if needed (e.g., one measurement per day) while preserving the original timestamps. This makes missing periods explicit and manageable.
  • Do not interpolate or forward-fill to estimate missing values; this fabricates data. Simply exclude those time periods from the analysis.
  • Filter out irrelevant variation such as daily or weekly cycles, for example by detrending the data.

Proper preprocessing sets up the data for robust analysis.
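A minimal preprocessing sketch following these rules, continuing from the loading snippet above ("Arthritis Severity Rating" is the outcome column; the 28-day trend window is an assumption):

```python
daily = df.resample("1D").mean()                           # regular 1/day interval
daily = daily[daily["Arthritis Severity Rating"].notna()]  # exclude gaps; never impute

# Detrend the outcome and remove its weekly cycle rather than leaving them in.
y = daily["Arthritis Severity Rating"]
trend = y.rolling(window=28, min_periods=7, center=True).mean()
y_detrended = y - trend
weekly_cycle = y_detrended.groupby(y_detrended.index.dayofweek).transform("mean")
daily["outcome_adjusted"] = y_detrended - weekly_cycle
```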

Statistical Analysis Methodology

With cleaned data, a rigorous methodology can determine treatment effects:

Segment Data

First, split the data into three segments:

  • Pre-treatment - Period before treatment began
  • During treatment - Period during which treatment was actively administered
  • Post-treatment - Period after treatment ended

This enables separate analysis of the acute and cumulative effects.
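For instance (the treatment dates here are placeholders; in practice they come from the intake records):

```python
import pandas as pd

treatment_start = pd.Timestamp("2021-03-01", tz="UTC")  # hypothetical dates
treatment_end = pd.Timestamp("2021-09-01", tz="UTC")

pre = daily[daily.index < treatment_start]
during = daily[(daily.index >= treatment_start) & (daily.index < treatment_end)]
post = daily[daily.index >= treatment_end]
```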

Acute Effects Analysis

To analyze acute effects, compare the 'during treatment' segment vs the 'pre-treatment' segment:

  • Use interrupted time series analysis models to determine causality.
  • Apply statistical tests like t-tests to determine significance.
  • Systematically test different onset delays by incrementally shifting the start of the 'during treatment' segment, to account for the unknown onset.
  • Systematically test excluding various amounts of time after treatment cessation to account for effect duration.
  • Look for acute improvements or decrements right after treatment begins based on the models.
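A sketch of this acute analysis, using a standard segmented (interrupted time series) regression in statsmodels on the `pre`/`during` segments from above; the onset-delay grid and column names are assumptions:

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

def acute_effect(pre, during, onset_delay_days):
    # Drop the first days of 'during' to allow for a hypothesized onset delay.
    y_pre = pre["outcome_adjusted"].dropna().to_numpy()
    y_dur = during["outcome_adjusted"].iloc[onset_delay_days:].dropna().to_numpy()
    y = np.concatenate([y_pre, y_dur])
    t = np.arange(len(y))
    treated = np.concatenate([np.zeros(len(y_pre)), np.ones(len(y_dur))])
    # Segmented regression: baseline trend, level change at treatment start,
    # and slope change after it.
    X = sm.add_constant(np.column_stack([t, treated, treated * (t - len(y_pre))]))
    its = sm.OLS(y, X).fit()
    _, p_ttest = stats.ttest_ind(y_pre, y_dur)
    return its.params[2], its.pvalues[2], p_ttest  # level change, its p, t-test p

for delay_days in (0, 1, 3, 7):  # sweep candidate onset delays
    print(delay_days, acute_effect(pre, during, delay_days))
```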

Cumulative Effects Analysis

To analyze cumulative effects, build regression models between the outcome variable and the cumulative treatment dosage over time:

  • Use linear regression, enforcing temporal precedence (only dosage up to a given time can influence the outcome at that time).
  • Apply statistical tests like F-tests for significance.
  • Systematically test excluding various amounts of time after treatment cessation to account for effect duration.
  • Look for long-term improvements or decrements over time based on the regression models.
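A corresponding sketch, where `dose` is an illustrative treatment-intake column and days without recorded intake are treated as true zeros (consistent with the zero-filling described earlier):

```python
import statsmodels.api as sm

cum_dose = daily["dose"].fillna(0).cumsum()  # cumulative dosage to date
X = sm.add_constant(cum_dose)
fit = sm.OLS(daily["outcome_adjusted"], X, missing="drop").fit()
print(fit.params["dose"], fit.f_pvalue)  # slope sign = direction; F-test p-value
```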

Overall Effect Determination

Combine the acute and cumulative insights to determine the overall effect direction and statistical significance.

For example, acute worsening but long-term cumulative improvement would imply an initial side effect but long-term benefits. Lack of statistical significance would imply no effect.

Optimization

To determine the optimal dosage, incrementally adjust the daily dosage amount in the models above. Determine the dosage that minimizes the outcome variable in both the acute and cumulative sense.
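Because the cumulative model above is linear (its minimum always sits at a boundary), an empirical sweep over observed dosage levels is the more practical sketch; the minimum-sample threshold here is an assumption:

```python
by_dose = (daily.groupby(daily["dose"].round())["outcome_adjusted"]
                .agg(["mean", "count"]))
by_dose = by_dose[by_dose["count"] >= 5]  # require a minimum sample per dose level
optimal_dose = by_dose["mean"].idxmin()   # dose with lowest mean symptom severity
```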

Analysis Pipeline

Given these constraints and requirements, here is a refined methodology:

  1. Data Preprocessing:

    • Handling Missingness: Exclude rows or time periods with missing data. This ensures the analysis is grounded in actual observations.
    • Standardization: For treatments with larger scales, standardize values to have a mean of 0 and a standard deviation of 1. This will make regression coefficients more interpretable, representing changes in symptom severity per standard deviation change in treatment.
  2. Lagged Regression Analysis:

    • Evaluate if treatment on previous days affects today's outcome, given the discrete nature of treatment.
    • Examine up to a certain number of lags (e.g., 30 days) to determine potential onset delay and duration.
    • Coefficients represent the change in symptom severity due to a one unit or one standard deviation change in treatment, depending on whether standardization was applied. P-values indicate significance.
  3. Reverse Causality Check:

    • Assess if symptom severity on previous days predicts treatment intake. This helps in understanding potential feedback mechanisms.
  4. Cross-Correlation Analysis:

    • Analyze the correlation between treatment and symptom severity across various lags.
    • This aids in understanding potential onset delays and durations of effect.
  5. Granger Causality Tests:

    • Test if past values of treatment provide information about future values of symptom severity and vice-versa.
    • This test can help in determining the direction of influence.
  6. Moving Window Analysis (for cumulative effects):

    • Create aggregated variables representing the sum or average treatment intake over windows (e.g., 7 days, 14 days) leading up to each observation.
    • Use these in regression models to assess if cumulative intake over time affects symptom severity.
  7. Optimal Dosage Analysis:

    • Group data by discrete dosage levels.
    • Calculate the mean symptom severity for each group.
    • The dosage associated with the lowest mean symptom severity suggests the optimal intake level.
  8. Control for Confounders (if data is available):

    • If data on potential confounding variables is available, incorporate them in the regression models. This helps in isolating the unique effect of the treatment.
  9. Model Diagnostics:

    • After regression, check residuals for normality, autocorrelation, and other potential issues to validate the model.
  10. Interpretation:

    • Consistency in findings across multiple analyses strengthens the case for a relationship.
    • While no single test confirms causality, evidence from multiple methods can offer a compelling case.

By adhering to this methodology, you will be working with actual observations, minimizing the introduction of potential errors from imputation. The combination of lagged regression, Granger causality tests, and moving window analysis will provide insights into both acute and cumulative effects, onset delays, and optimal treatment dosages.
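A condensed sketch of steps 2 through 6 (lagged regression, reverse-causality and Granger checks, cross-correlation, and moving-window regressors), reusing the illustrative `dose` and `outcome_adjusted` columns from the sketches above:

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.tsa.stattools import grangercausalitytests

data = daily[["dose", "outcome_adjusted"]].dropna().copy()

# Step 2: lagged regression - does dose on days t-1..t-30 predict today's outcome?
lags = pd.concat({f"dose_lag_{k}": data["dose"].shift(k) for k in range(1, 31)}, axis=1)
lagged_fit = sm.OLS(data["outcome_adjusted"], sm.add_constant(lags), missing="drop").fit()
print(lagged_fit.params.sort_values().head())  # most negative lags = improvement

# Step 4: cross-correlation between dose and outcome across lags.
xcorr = {k: data["outcome_adjusted"].corr(data["dose"].shift(k)) for k in range(31)}

# Steps 3 and 5: Granger tests in both directions; the swapped column order
# checks reverse causality (symptoms driving intake).
grangercausalitytests(data[["outcome_adjusted", "dose"]], maxlag=7)
grangercausalitytests(data[["dose", "outcome_adjusted"]], maxlag=7)

# Step 6: moving-window cumulative exposure (7- and 14-day windows).
for w in (7, 14):
    data[f"dose_sum_{w}d"] = data["dose"].rolling(w).sum()
```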

Data Schema for Storing User Variable Relationship Analyses

| Property | Type | Nullable | Description |
|---|---|---|---|
| id | int auto_increment | No | Unique identifier for each correlation entry. |
| user_id | bigint unsigned | No | ID of the user to whom this correlation data belongs. |
| cause_variable_id | int unsigned | No | ID of the variable considered as the cause in the correlation. |
| effect_variable_id | int unsigned | No | ID of the variable considered as the effect in the correlation. |
| qm_score | double | Yes | Quantitative metric scoring the importance of the correlation based on strength, usefulness, and causal plausibility. |
| forward_pearson_correlation_coefficient | float(10, 4) | Yes | Statistical measure indicating the linear relationship strength between cause and effect. |
| value_predicting_high_outcome | double | Yes | Specific cause variable value that predicts a higher-than-average effect. |
| value_predicting_low_outcome | double | Yes | Specific cause variable value that predicts a lower-than-average effect. |
| predicts_high_effect_change | int(5) | Yes | Percentage change in the effect when the predictor is near the value predicting a high outcome. |
| predicts_low_effect_change | int(5) | No | Percentage change in the effect when the predictor is near the value predicting a low outcome. |
| average_effect | double | No | Average value of the effect variable across all measurements. |
| average_effect_following_high_cause | double | No | Average value of the effect variable following high cause variable measurements. |
| average_effect_following_low_cause | double | No | Average value of the effect variable following low cause variable measurements. |
| average_daily_low_cause | double | No | Daily average of cause variable values that are below average. |
| average_daily_high_cause | double | No | Daily average of cause variable values that are above average. |
| average_forward_pearson_correlation_over_onset_delays | float | Yes | Average of Pearson correlation coefficients calculated over different onset delays. |
| average_reverse_pearson_correlation_over_onset_delays | float | Yes | Average of reverse Pearson correlation coefficients over different onset delays. |
| cause_changes | int | No | Count of changes in cause variable values across the dataset. |
| cause_filling_value | double | Yes | Default value used to fill gaps in cause variable data. |
| cause_number_of_processed_daily_measurements | int | No | Count of daily processed measurements for the cause variable. |
| cause_number_of_raw_measurements | int | No | Count of raw data measurements for the cause variable. |
| cause_unit_id | smallint unsigned | Yes | ID representing the unit of measurement for the cause variable. |
| confidence_interval | double | No | Statistical range indicating the reliability of the correlation effect size. |
| critical_t_value | double | No | Threshold value for statistical significance in correlation analysis. |
| created_at | timestamp | No | Timestamp of when the correlation record was created. |
| data_source_name | varchar(255) | Yes | Name of the data source for the correlation data. |
| deleted_at | timestamp | Yes | Timestamp of when the correlation record was marked as deleted. |
| duration_of_action | int | No | Duration in seconds for which the cause is expected to have an effect. |
| effect_changes | int | No | Count of changes in effect variable values across the dataset. |
| effect_filling_value | double | Yes | Default value used to fill gaps in effect variable data. |
| effect_number_of_processed_daily_measurements | int | No | Count of daily processed measurements for the effect variable. |
| effect_number_of_raw_measurements | int | No | Count of raw data measurements for the effect variable. |
| forward_spearman_correlation_coefficient | float | No | Spearman correlation assessing monotonic relationships between lagged cause and effect data. |
| number_of_days | int | No | Number of days over which the correlation data was collected. |
| number_of_pairs | int | No | Total number of cause-effect pairs used for calculating the correlation. |
| onset_delay | int | No | Estimated time in seconds between cause occurrence and effect observation. |
| onset_delay_with_strongest_pearson_correlation | int(10) | Yes | Onset delay duration yielding the strongest Pearson correlation. |
| optimal_pearson_product | double | Yes | Theoretical optimal value for the Pearson product in the correlation analysis. |
| p_value | double | Yes | Statistical significance indicator for the correlation; values below 0.05 are conventionally considered statistically significant. |
| pearson_correlation_with_no_onset_delay | float | Yes | Pearson correlation coefficient calculated without considering onset delay. |
| predictive_pearson_correlation_coefficient | double | Yes | Pearson coefficient quantifying the predictive strength of the cause variable on the effect. |
| reverse_pearson_correlation_coefficient | double | Yes | Correlation coefficient when cause and effect variables are reversed, used to assess causality. |
| statistical_significance | float(10, 4) | Yes | Value combining effect size and sample size in determining correlation significance. |
| strongest_pearson_correlation_coefficient | float | Yes | The highest Pearson correlation coefficient observed in the analysis. |
| t_value | double | Yes | Statistical value derived from correlation and sample size, used in assessing significance. |
| updated_at | timestamp | No | Timestamp of the most recent update made to the correlation record. |
| grouped_cause_value_closest_to_value_predicting_low_outcome | double | No | Realistic daily cause variable value associated with lower-than-average outcomes. |
| grouped_cause_value_closest_to_value_predicting_high_outcome | double | No | Realistic daily cause variable value associated with higher-than-average outcomes. |

Conclusion

This rigorous methodology uses interrupted time series analysis, regression modeling, statistical testing, onset/duration modeling, and optimization to determine treatment effects from sparse, irregular observational data with missingness. It establishes causality and significance in both an acute and cumulative sense. By finding the optimal dosage, it provides actionable insights for maximizing the benefits of the treatment.

Resources

Links

  1. SunilDeshpande_S2014_ETD.pdf (asu.edu)
  2. LocalControl: An R Package for Comparative Safety and Effectiveness Research | Journal of Statistical Software (jstatsoft.org)
  3. bbotk: A brief introduction (r-project.org)
  4. artemis-toumazi/dfpk (github.com)
  5. miroslavgasparek/MPC_Cancer: Model Predictive Control for the optimisation of the tumour treatment through the combination of the chemotherapy and immunotherapy. (github.com)
  6. Doubly Robust Learning — econml 0.12.0 documentation
  7. A control systems engineering approach for adaptive behavioral interventions: illustration with a fibromyalgia intervention (nih.gov)
  8. The promise of machine learning in predicting treatment outcomes in psychiatry - Chekroud - 2021 - World Psychiatry - Wiley Online Library
  9. CURATE.AI: Optimizing Personalized Medicine with Artificial Intelligence - Agata Blasiak, Jeffrey Khong, Theodore Kee, 2020 (sagepub.com)
  10. Using nonlinear model predictive control to find optimal therapeutic strategies to modulate inflammation (aimspress.com)
  11. Forecasting Treatment Responses Over Time Using Recurrent Marginal Structural Networks (nips.cc)
  12. Estimating counterfactual treatment outcomes over time through adversarially balanced representations | OpenReview
  13. https://dash.harvard.edu/bitstream/handle/1/37366470/AGUILAR-SENIORTHESIS-2019.pdf?isAllowed=y&sequence=1

mikepsinn · Mar 11 '24 22:03