Clinical Safety Audit Report

Open Aillian opened this issue 1 month ago • 2 comments

Clinical Safety Audit Report: oref0

Date: 2025-12-04 Target: oref0 (determine-basal, iob, meal, profile)

Executive Summary

This audit analyzed the oref0 codebase for clinical hazards and medical logic errors. The system generally exhibits robust safety mechanisms, particularly in noise filtering and maximum dosing caps. However, a Critical Vulnerability was identified in the parameter validation logic for Carb Ratios, where a coding error renders the safety check ineffective, potentially allowing fatal overdose configurations.

Additionally, the "Low Glucose Suspend" logic relies on a calculated threshold that may be too low for users whose targets are already low, and hardcoded constraints on insulin action curves introduce a "Curve Mismatch" risk.

Key Findings Severity Matrix

| Finding | Severity | Category | Description |
|---|---|---|---|
| Broken Carb Ratio Validation | CRITICAL | Parameter Corruption | Validation checks the object, not the value, allowing impossible ratios (e.g., 0.1g/U). |
| Suspension Threshold Latency | HIGH | Hypoglycemia Prevention | Suspension trigger lowers as target lowers, potentially delaying action until hypoglycemia is active. |
| Recovery Aggression | MEDIUM | Hypoglycemia Prevention | Potential for rebound lows if system resumes basal too early based on noisy "rising" data. |
| Curve Mismatch | MEDIUM | IOB Accuracy | Hardcoded DIA/Peak clamps may not match all physiologies, leading to IOB estimation errors. |

1. Hypoglycemia Prevention Logic

Suspension Latency (The "Low" Safety Net)

Status: Vulnerable

The logic in determine-basal.js calculates a suspension threshold as: threshold = min_bg - 0.5 * (min_bg - 40)

  • Clinical Risk: This formula places the suspension threshold halfway between min_bg and 40 mg/dL, so the safety net always sits below the user's target.
    • If min_bg is 100 mg/dL, threshold is 70 mg/dL.
    • If min_bg is 80 mg/dL, threshold is 60 mg/dL.
  • Hazard: For a user targeting 80 mg/dL, the system may not suspend insulin until the predicted BG hits 60 mg/dL. Given sensor lag (15-20 mins), the user could be physically hypoglycemic (e.g., 50 mg/dL) before the sensor reads 60 mg/dL and triggers the suspension.
  • Recommendation: The threshold should have a hard floor (e.g., 70 mg/dL) or not drop significantly below the bottom of the target range.
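
The arithmetic above can be checked with a short sketch. The formula is the one quoted in this report; the function names and the floor variant are hypothetical illustrations of the recommendation, not code from determine-basal.js.

```javascript
// Formula as quoted in the report: threshold = min_bg - 0.5 * (min_bg - 40).
// Function names are hypothetical and exist only for this illustration.
function suspendThreshold(minBg) {
  return minBg - 0.5 * (minBg - 40);
}

// Illustrative variant of the recommendation: never let the threshold
// drop below a hard floor (70 mg/dL is the example value from the report).
function suspendThresholdWithFloor(minBg, floor = 70) {
  return Math.max(floor, suspendThreshold(minBg));
}

console.log(suspendThreshold(100));         // 70
console.log(suspendThreshold(80));          // 60
console.log(suspendThresholdWithFloor(80)); // 70
```

With the floor applied, a user targeting 80 mg/dL would see the same 70 mg/dL threshold as a user targeting 100 mg/dL.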

Recovery Aggression

Status: Caution

The system allows resuming basal (canceling zero temp) if minDelta > expectedDelta (BG is rising faster than expected), even if eventualBG is still below target (lines 930-950 of determine-basal.js).

  • Clinical Risk: If a "compression low" releases or sensor noise creates a false rise, the system might interpret this as a recovery and resume insulin too early.
  • Mitigation: The code does check glucose_status.noise, but a clean "false rise" could still trigger this.

2. Insulin Stacking & "IOB" Accuracy

The Stacking Hazard

Status: Safe

lib/iob/total.js correctly iterates through all treatments (boluses and temp basals) and sums their IOB contributions. There is no evidence of "forgotten" insulin in the summation logic.

Curve Mismatch

Status: Medium Risk

lib/iob/calculate.js enforces specific shapes for insulin curves:

  • Bilinear: Hardcoded 3-hour duration assumption (scaled).

  • Exponential: Clamps peak times (e.g., Rapid-acting clamped to 50-120 mins).

  • Clinical Risk:

    • Gastroparesis: Users with slow absorption might have insulin active for 6+ hours. The system forces a max DIA of 5 hours in some checks or scales curves based on a 3-hour model.
    • Fiasp/Lyumjev: Users with ultra-fast absorption might peak at 30 mins. The code clamps ultra-rapid peak to a minimum of 35 mins.
    • Consequence: Mismatch leads to incorrect IOB calculation. If IOB is underestimated (tail cut off), the system may stack insulin, leading to late-onset hypoglycemia.
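
To illustrate how a cut-off tail hides active insulin, here is a minimal sketch that uses a deliberately simplified linear decay. This is not the bilinear or exponential model in lib/iob/calculate.js; the function name and the numbers are assumptions chosen only to show the direction of the error.

```javascript
// Simplified linear decay: insulin remaining falls from 100% to 0% over
// the DIA. This is NOT oref0's curve; it only illustrates the mismatch.
function linearIob(units, minutesSinceDose, diaMinutes) {
  const remaining = 1 - minutesSinceDose / diaMinutes;
  return units * Math.max(0, remaining);
}

// A 1 U bolus, four hours after delivery.
const modelledIob = linearIob(1, 240, 180); // model assumes a 3-hour action
const actualIob = linearIob(1, 240, 360);   // user actually absorbs over 6 hours

console.log(modelledIob);          // 0
console.log(actualIob.toFixed(2)); // "0.33"
```

In this toy case the model reports no insulin on board while roughly a third of a unit is still active, which is the stacking scenario described above.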

3. Hyperglycemia & "Unannounced Meal" (UAM) Risks

False Positives

Status: Robust

The system includes strong protections against noise:

  • determine-basal.js disables SMB if glucose_status.noise >= 3.
  • It also disables SMB if maxDelta > 0.2 * bg (20% jump), which effectively filters out sensor jumps that look like meals.
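
The two guards can be sketched as a single predicate. Only the two conditions come from this report; the function name and argument shape are hypothetical, and the real logic in determine-basal.js may combine them with other checks.

```javascript
// Hypothetical helper: returns true when either guard described in the
// report would disable SMB. Not the actual determine-basal.js code.
function smbBlockedByNoise(glucoseStatus, bg, maxDelta) {
  const tooNoisy = glucoseStatus.noise >= 3;    // noise level guard
  const implausibleJump = maxDelta > 0.2 * bg;  // jump of more than 20% of BG
  return tooNoisy || implausibleJump;
}

console.log(smbBlockedByNoise({ noise: 1 }, 120, 10)); // false
console.log(smbBlockedByNoise({ noise: 3 }, 120, 10)); // true
console.log(smbBlockedByNoise({ noise: 1 }, 120, 30)); // true (30 exceeds 20% of 120)
```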

Max Dosing Cap

Status: Safe

The UAM logic respects max_iob and maxSMBBasalMinutes.

  • It calculates microBolus as 50% of the requirement.
  • It caps the bolus size to 30 mins of basal (configurable).
  • This conservative approach minimizes the risk of a massive overdose from a false positive UAM detection.
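
A minimal sketch of the capping arithmetic described above, assuming the cap is simply the basal rate scaled to maxSMBBasalMinutes. The function name is hypothetical, and the max_iob check and any rounding to pump increments are omitted here.

```javascript
// Hypothetical helper illustrating the two limits named in the report:
// dose at most half of the requirement, and at most N minutes of basal.
function cappedMicroBolus(insulinReq, currentBasal, maxSMBBasalMinutes = 30) {
  const maxBolus = currentBasal * maxSMBBasalMinutes / 60; // units
  return Math.min(insulinReq / 2, maxBolus);
}

// Basal of 1.0 U/h gives a cap of 0.5 U at the 30-minute setting.
console.log(cappedMicroBolus(0.4, 1.0)); // 0.2 (half of the requirement)
console.log(cappedMicroBolus(3.0, 1.0)); // 0.5 (limited by the basal cap)
```

Even if a false UAM detection inflated the requirement to 3 U, this sketch would deliver no more than 0.5 U in one microbolus.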

4. Physiological Feasibility Checks

Impossible Data

Status: Safe

Checks for bg <= 10 and bg === 38 (sensor error) are present and correctly inhibit dosing.

Sensitivity Scaling (Autosens)

Status: Safe

autosens.js strictly clamps the sensitivity ratio between autosens_min (default 0.7) and autosens_max (default 1.2). This prevents the system from assuming the user is wildly sensitive or resistant, which limits the magnitude of dosing errors.
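
The clamp amounts to a bounded min/max, sketched below. The defaults are the values stated in this report; the function name is hypothetical and this is not the code from autosens.js.

```javascript
// Hypothetical clamp using the defaults quoted in the report.
function clampSensitivityRatio(ratio, autosensMin = 0.7, autosensMax = 1.2) {
  return Math.min(autosensMax, Math.max(autosensMin, ratio));
}

console.log(clampSensitivityRatio(5.0)); // 1.2 (a "5x resistant" estimate is capped)
console.log(clampSensitivityRatio(0.2)); // 0.7
console.log(clampSensitivityRatio(1.0)); // 1
```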

5. Parameter Corruption (The "Fat Finger" Effect)

Critical Vulnerability: Carb Ratio Validation

Status: CRITICAL

In lib/profile/carbs.js, the validation logic is flawed:

// lib/profile/carbs.js lines 18-21
carbRatio = carbratio_data.schedule[i]; // carbRatio is an OBJECT
if (carbRatio < 3 || carbRatio > 150) { // Comparing OBJECT to Number
    console_error(..., "Error: carbRatio of " + carbRatio + " out of bounds.");
    return;
}

  • The Bug: carbRatio is an object (e.g., { start: "00:00", ratio: 10 }). In JavaScript, comparing a plain object to a number coerces the object to NaN, and every relational comparison with NaN evaluates to false.
  • The Consequence: Neither carbRatio < 3 nor carbRatio > 150 can ever be true, so the check FAILS TO TRIGGER even if the user enters a ratio of 0.1.
  • Clinical Scenario: A user accidentally enters a Carb Ratio of 1 (1g/unit) instead of 10 (10g/unit).
    • The validation passes (due to the bug).
    • The system calculates insulin for a 50g meal: 50 / 1 = 50 units.
    • Result: A 10x overdose, which is likely fatal.

Recommendation: Change the line to check the value property: if (carbRatio.ratio < 3 || carbRatio.ratio > 150)
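
The difference between the two comparisons can be reproduced in isolation. This sketch assumes a schedule entry shaped like the example object above; the helper names are hypothetical and the bounds (3 and 150) are the ones quoted from the check.

```javascript
// A schedule entry shaped like the example in the report.
const entry = { start: "00:00", ratio: 0.1 };

// Check as quoted: the object itself is compared, so it coerces to NaN
// and both comparisons are false.
function objectCheckFlags(carbRatio) {
  return carbRatio < 3 || carbRatio > 150;
}

// Recommended form: compare the numeric ratio property.
function valueCheckFlags(carbRatio) {
  return carbRatio.ratio < 3 || carbRatio.ratio > 150;
}

console.log(objectCheckFlags(entry)); // false (0.1 g/U slips through)
console.log(valueCheckFlags(entry));  // true  (0.1 g/U is rejected)
```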

6. Algorithmic Integrity Analysis (New)

Floating Point Precision

Status: Low Risk

The codebase uses Math.round() with scaling factors (e.g., in lib/round-basal.js) to handle pump resolution (0.05U). While JavaScript floating point math (IEEE 754) has inherent precision issues (e.g., 0.1 + 0.2 !== 0.3), the explicit rounding strategies employed here are sufficient for clinical insulin delivery, where precision beyond 0.025U is rarely actionable.
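
As a rough illustration of rounding with a scaling factor, the sketch below snaps a rate to 0.05 U steps. It is an assumption about the general technique, not the contents of lib/round-basal.js, and the function name is hypothetical.

```javascript
// Hypothetical helper: round to a fixed number of steps per unit
// (20 steps per unit corresponds to a 0.05 U pump resolution).
function roundToPumpStep(rate, stepsPerUnit = 20) {
  return Math.round(rate * stepsPerUnit) / stepsPerUnit;
}

console.log(0.1 + 0.2 === 0.3);                  // false (raw floating point)
console.log(roundToPumpStep(0.1 + 0.2) === 0.3); // true  (after rounding)
console.log(roundToPumpStep(0.87));              // 0.85
```

Scaling before rounding keeps the intermediate value near an integer, so the small binary representation error is absorbed by Math.round().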

Timezone & DST Handling

Status: Safe

lib/iob/history.js and other modules extensively use moment-timezone to normalize timestamps. This mitigates the risk of "time travel" bugs during Daylight Saving Time transitions or when the rig's time differs from the pump's time.

Unit Conversion Consistency (ISF Mismatch)

Status: CRITICAL

In lib/profile/index.js, the system checks if profile.sens (ISF) is < 5.

if (profile.sens < 5) { ... return -1; }
  • The Flaw: This check assumes that any value < 5 is an error (or mmol/L that should be rejected). However, it ACCEPTS any value >= 5.
  • Clinical Scenario: A user enters an ISF of 10 mmol/L/U (which is 180 mg/dL/U, a typical value for a sensitive child).
    • 10 >= 5, so the check passes.
    • The system treats this as 10 mg/dL/U (Extremely Resistant).
    • Consequence: The system calculates that the user needs 18x more insulin than they actually do to correct a high.
    • Result: Massive overdose.
  • Recommendation: Explicitly require a units field for ISF, or raise the minimum mg/dL floor to ~20. An ISF of 20 mg/dL/U (about 1.1 mmol/L/U) is already extremely resistant, so few legitimate mg/dL values fall below that floor, whereas the current floor of 5 leaves plausible mmol/L values in the danger zone.
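
The 18x figure follows from the unit conversion, as this sketch shows. The correction formula (glucose above target divided by ISF) is the standard simplification and the 90 mg/dL excursion is an arbitrary example; the names are hypothetical and none of this is code from lib/profile/index.js.

```javascript
const MGDL_PER_MMOLL = 18; // approximate conversion factor used in the report

// Simplified correction dose: glucose above target divided by ISF.
function correctionUnits(mgdlAboveTarget, isfMgdlPerUnit) {
  return mgdlAboveTarget / isfMgdlPerUnit;
}

const enteredIsf = 10;   // user means 10 mmol/L/U
const aboveTarget = 90;  // mg/dL above target (example value)

const intended = correctionUnits(aboveTarget, enteredIsf * MGDL_PER_MMOLL);
const misread = correctionUnits(aboveTarget, enteredIsf); // treated as mg/dL/U

console.log(intended);           // 0.5
console.log(misread);            // 9
console.log(misread / intended); // 18
console.log(enteredIsf < 5);     // false (current floor accepts the value)
console.log(enteredIsf < 20);    // true  (a floor of 20 would reject it)
```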

Conclusion

The oref0 system is generally well-designed with safety in mind, but the Carb Ratio validation bug and ISF Unit Conversion flaw are critical oversights that require immediate remediation. The Hypoglycemia suspension threshold also warrants review to ensure it provides an adequate safety margin for users with lower targets.

Aillian, Dec 03 '25 22:12

Can you share which LLM model, prompt, and context (repositories, etc.) were used to generate this report? It's missing some context that likely mitigates these concerns in practice, and shows some definite misunderstandings in other areas.

scottleibrand, Dec 04 '25 02:12

I used Gemini 3 Pro from Antigravity to generate the report and gave it access to the whole codebase. The following is the prompt:

You are a specialized Clinical Software Safety Auditor and Diabetes Physiologist. Your expertise lies in the intersection of software logic and human metabolism. You do not just look for "bugs" in the code; you look for medical errors that could lead to physiological harm.

Target: The oref0 (OpenAPS Reference Design) repository: @oref0 , specifically the determine-basal and iob (Insulin On Board) calculation libraries.

Mission: Analyze the codebase to identify potential clinical hazards and medical logic errors. I need you to simulate how the code interacts with a human body and determine if the algorithm respects physiological constraints.

Areas of Clinical Analysis:

1. Hypoglycemia Prevention Logic (The "Low" Safety Net)

Suspension Latency: Analyze the logic that triggers a "Zero Temp" basal. Does the system react immediately when a predicted low is detected, or is there a dangerous delay caused by data smoothing or averaging?

Recovery Aggression: When the user is recovering from a low (blood glucose rising from 60 to 80), does the system resume insulin too quickly? Check for logic that might cause a "rebound low."

2. Insulin Stacking & "IOB" Accuracy

The Stacking Hazard: Investigate how the system calculates "Insulin On Board" (IOB). Does it accurately account for all insulin (boluses + temp basals)?

Curve mismatch: If the user's actual insulin duration is 5 hours, but the code defaults to a 3-hour curve, what is the medical risk? Look for code paths where the "tail" of the insulin action might be ignored, leading to an overdose (stacking).

3. Hyperglycemia & "Unannounced Meal" (UAM) Risks

False Positives: The "UAM" feature detects a rise in glucose and assumes it is a meal. Analyze the risk of the system mistaking Sensor Noise (a false jump in CGM data) or Compression Low recovery for a meal. If it doses insulin for "noise," it could be fatal.

Max Dosing Cap: Does the algorithm respect the maximum clinical bolus limit in all logical branches, or can a "UAM" calculation theoretically bypass standard safety caps?

4. Physiological Feasibility Checks

Impossible Data: Does the code accept physiologically impossible changes? (e.g., Blood glucose dropping by 100 mg/dL in 5 minutes). If the sensor sends this data, does the algorithm act on it, or does it recognize it as a sensor error?

Sensitivity Scaling (Autosens): Review the code that adjusts sensitivity (ISF). Is there a "floor" or "ceiling" to how much it can change? (e.g., Can it accidentally decide the user is 5x more resistant than normal and dump 5x the insulin?)

5. Parameter Corruption

The "Fat Finger" Effect: If a user accidentally enters a Carb Ratio of "1" instead of "10", does the math result in a 10x overdose, or are there sanity checks that flag this as medically unlikely?

Output Instructions: process your findings and generate a comprehensive, detailed medical safety report.

You have full autonomy on how to structure the report, but it must be logical, easy to read for a clinician, and rigorous.

You MUST include a distinct Executive Summary at the beginning or end of the report.

Include the following in the report:
Logical Flaws: (Reasoning errors).
Math/Precision Risks: (Calculation errors).
Edge Case Vulnerabilities: (Rare events that break the code).
Code Quality/Brittleness: (Poorly written code that is hard to maintain).

Aillian, Dec 10 '25 18:12