PatientLevelPrediction
Inconsistent model covariates in covariate summary and saved model
I trained and saved a model using only age and sex. plpResult$covariateSummary suggests that there are eight covariates, with the following values:
[1] -1.863384 -1.907642 0.000000 1.857917 1.387609 2.442050 0.000000 3.366165
However, plpResult$model$model$coefficients shows only seven covariates and the intercept:
(Intercept) 12003 13003 14003 15003 16003 8507001 8532001
-2.838907 0.000000 1.387609 1.857917 2.442050 3.366165 -1.907642 -1.863384
Looks like one covariate with value 0.0 is omitted here? This was tested with the latest database_results branch of PatientLevelPrediction.
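As a quick way to pin down which covariate is missing, the two sets of IDs can be compared directly. This is only a sketch using the IDs posted in this thread, typed in by hand; on a real result object the same vectors could presumably be built with as.character(plpResult$covariateSummary$covariateId) and names(plpResult$model$model$coefficients), which is an assumption about the object structure based on the printouts above.

```r
# Covariate IDs from the covariateId column of plpResult$covariateSummary,
# typed in by hand from this thread.
summaryIds <- c("8532001", "8507001", "12003", "14003",
                "13003", "15003", "11003", "16003")

# Names of the coefficient vector as printed from the saved model.
coefNames <- c("(Intercept)", "12003", "13003", "14003",
               "15003", "16003", "8507001", "8532001")

# Drop the intercept so that only covariate IDs are compared.
modelIds <- setdiff(coefNames, "(Intercept)")

# IDs present in the covariate summary but absent from the model.
missingFromModel <- setdiff(summaryIds, modelIds)
print(missingFromModel)
```

With the values posted here, the only ID left over is 11003.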
Did the missing covariate have mean and standard deviation values in the covariate summary object? What I suspect is happening is that a redundant age covariate is being dropped during the tidy covariates step, so the model never sees it. However, the covariate summary uses the pre-tidy data, so the variable is still in the covariate summary.
The thing that is bothering me a little about the results is why both male and female seem to be in the model (these are often the only options in many databases, so one should be redundant), unless there are other genders in the database being used?
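One rough way to check the redundancy idea is to add up the covariate means within each analysis. If the indicators in a group are mutually exclusive and cover everyone, their means sum to about 1 and one of them is collinear with the intercept. The sketch below uses the rounded CovariateMean values from the covariate summary posted in this thread, typed in by hand; the data frame is a hand-built stand-in, not the real plpResult object.

```r
# Rounded CovariateMean values from the posted covariate summary.
# analysisId 1 = gender, analysisId 3 = five-year age groups.
covSummary <- data.frame(
  covariateId   = c(8532001, 8507001, 12003, 14003, 13003, 15003, 11003, 16003),
  analysisId    = c(1, 1, 3, 3, 3, 3, 3, 3),
  CovariateMean = c(0.569, 0.430, 0.200, 0.189, 0.166, 0.155, 0.212, 0.0783)
)

# Sum of the indicator means within each analysis.
meanSums <- tapply(covSummary$CovariateMean, covSummary$analysisId, sum)
print(round(meanSums, 3))

# A group whose means sum to roughly 1 is a candidate for having one
# redundant level (the tolerance only allows for rounding in the printout).
candidateRedundant <- names(meanSums)[abs(meanSums - 1) < 0.01]
print(candidateRedundant)
```

Both the gender group and the age group sum to roughly 1 here, which is consistent with one level in each being redundant. It does not prove it, since the means alone do not show that the indicators are mutually exclusive per person.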
Full covariateSummary below. I'm not sure about the tidy covariates explanation, because there is another covariate with value 0.0 that is not dropped. Maybe we allocate eight slots based on the number of covariates, but incorrectly fill one of these slots with the intercept and omit one covariate value?
covariateId covariateName analysisId conceptId CovariateCount CovariateMean CovariateStDev
<dbl> <chr> <dbl> <dbl> <int> <dbl> <dbl>
1 8532001 gender = FEMALE 1 8532 100991 0.569 0.495
2 8507001 gender = MALE 1 8507 76384 0.430 0.495
3 12003 age group: 60 - 64 3 0 35536 0.200 0.400
4 14003 age group: 70 - 74 3 0 33487 0.189 0.391
5 13003 age group: 65 - 69 3 0 29508 0.166 0.372
6 15003 age group: 75 - 79 3 0 27456 0.155 0.362
7 11003 age group: 55 - 59 3 0 37571 0.212 0.409
8 16003 age group: 80 - 84 3 0 13903 0.0783 0.269
WithNoOutcome_CovariateCount WithNoOutcome_CovariateMean WithNoOutcome_CovariateStDev WithOutcome_CovariateCount
<dbl> <dbl> <dbl> <dbl>
1 95876 0.569 0.495 5115
2 72644 0.431 0.495 3740
3 35224 0.209 0.407 312
4 31669 0.188 0.391 1818
5 28436 0.169 0.374 1072
6 24856 0.147 0.355 2600
7 37358 0.222 0.415 213
8 11055 0.0656 0.248 2848
WithOutcome_CovariateMean WithOutcome_CovariateStDev StandardizedMeanDiff TestWithNoOutcome_CovariateCount
<dbl> <dbl> <dbl> <dbl>
1 0.577 0.494 0.0171 68522
2 0.422 0.494 -0.0180 51846
3 0.0352 0.184 -0.550 25303
4 0.205 0.404 0.0435 22694
5 0.121 0.326 -0.136 20289
6 0.293 0.455 0.358 17700
7 0.0240 0.153 -0.631 26604
8 0.321 0.467 0.684 7837
TestWithNoOutcome_CovariateMean TestWithNoOutcome_CovariateStDev TrainWithNoOutcome_CovariateCount
<dbl> <dbl> <dbl>
1 0.569 0.495 27354
2 0.431 0.495 20798
3 0.210 0.407 9921
4 0.188 0.391 8975
5 0.168 0.374 8147
6 0.147 0.354 7156
7 0.221 0.415 10754
8 0.0651 0.247 3218
TrainWithNoOutcome_CovariateMean TrainWithNoOutcome_CovariateStDev TestWithOutcome_CovariateCount
<dbl> <dbl> <dbl>
1 0.568 0.495 3646
2 0.432 0.495 2676
3 0.206 0.404 220
4 0.186 0.389 1301
5 0.169 0.375 775
6 0.149 0.356 1864
7 0.223 0.416 157
8 0.0668 0.250 2013
TestWithOutcome_CovariateMean TestWithOutcome_CovariateStDev TrainWithOutcome_CovariateCount
<dbl> <dbl> <dbl>
1 0.576 0.494 1469
2 0.423 0.494 1064
3 0.0348 0.183 92
4 0.206 0.404 517
5 0.122 0.328 297
6 0.294 0.456 736
7 0.0248 0.156 56
8 0.318 0.466 835
TrainWithOutcome_CovariateMean TrainWithOutcome_CovariateStDev covariateValue
<dbl> <dbl> <dbl>
1 0.580 0.494 -1.86
2 0.420 0.494 -1.91
3 0.0363 0.187 0
4 0.204 0.403 1.86
5 0.117 0.322 1.39
6 0.291 0.454 2.44
7 0.0221 0.147 0
8 0.330 0.470 3.37
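The slot hypothesis above can be checked by lining up the two sets of values by covariate ID rather than by position. This is a sketch with the numbers from this thread typed in by hand; the vector names (summaryValues, modelCoefs) are made up for illustration and are not part of the package.

```r
# covariateValue per covariateId, in the row order of the covariate summary.
summaryValues <- c(
  "8532001" = -1.863384, "8507001" = -1.907642, "12003" = 0.000000,
  "14003"   =  1.857917, "13003"   =  1.387609, "15003" = 2.442050,
  "11003"   =  0.000000, "16003"   =  3.366165
)

# Coefficients as printed from the saved model.
modelCoefs <- c(
  "(Intercept)" = -2.838907, "12003" = 0.000000, "13003" = 1.387609,
  "14003" = 1.857917, "15003" = 2.442050, "16003" = 3.366165,
  "8507001" = -1.907642, "8532001" = -1.863384
)

# Look up each summary covariate in the model by ID; IDs that the model
# does not contain come back as NA.
aligned <- data.frame(
  covariateId  = names(summaryValues),
  summaryValue = unname(summaryValues),
  modelCoef    = unname(modelCoefs[names(summaryValues)]),
  stringsAsFactors = FALSE
)
print(aligned)

# Do the values agree wherever the covariate exists in both places?
inModel <- !is.na(aligned$modelCoef)
valuesMatch <- isTRUE(all.equal(aligned$summaryValue[inModel],
                                aligned$modelCoef[inModel]))
print(valuesMatch)
```

With the posted numbers, every covariate that appears in the model has the same value in the summary, the intercept (-2.838907) does not show up among the summary values, and 11003 is the only ID with no coefficient. That seems to argue against a shifted slot; the 0 shown for 11003 in the summary may simply be a default for a covariate with no coefficient, though that is a guess.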