Inconsistent model covariates in covariate summary and saved model

Open lhjohn opened this issue 2 years ago • 3 comments

I trained and saved a model using only age and sex. The plpResult$covariateSummary lists eight covariates, with the following values:

[1] -1.863384 -1.907642  0.000000  1.857917  1.387609  2.442050  0.000000  3.366165

However, the plpResult$model$model$coefficients shows only seven covariates and the intercept.

(Intercept)       12003       13003       14003       15003       16003     8507001     8532001 
  -2.838907    0.000000    1.387609    1.857917    2.442050    3.366165   -1.907642   -1.863384

Looks like one covariate with value 0.0 is omitted here? This was tested with the latest database_results branch of PatientLevelPrediction.
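
One way to pinpoint the omitted covariate is to compare the covariate IDs in the two outputs. The sketch below is base R only and uses values copied from this issue, pairing each summary value with its covariateId as listed in the full covariateSummary posted later in the thread. The names summaryValues and modelCoefficients are illustrative; in a live session the data would come from plpResult$covariateSummary and plpResult$model$model$coefficients.

```r
# Values copied from the output in this issue (illustrative variable names).
summaryValues <- c(
  "8532001" = -1.863384, "8507001" = -1.907642, "12003" = 0,
  "14003" = 1.857917, "13003" = 1.387609, "15003" = 2.442050,
  "11003" = 0, "16003" = 3.366165
)
modelCoefficients <- c(
  "(Intercept)" = -2.838907, "12003" = 0, "13003" = 1.387609,
  "14003" = 1.857917, "15003" = 2.442050, "16003" = 3.366165,
  "8507001" = -1.907642, "8532001" = -1.863384
)

# Covariate IDs present in the summary but absent from the saved model
missingFromModel <- setdiff(names(summaryValues), names(modelCoefficients))
print(missingFromModel)

# For IDs present in both, check that the values agree
sharedIds <- intersect(names(summaryValues), names(modelCoefficients))
valuesAgree <- isTRUE(all.equal(summaryValues[sharedIds],
                                modelCoefficients[sharedIds]))
print(valuesAgree)
```

With the numbers in this issue, the only ID missing from the model is 11003 (age group 55 - 59), and the seven shared covariates carry identical values in both places.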

lhjohn, Jul 22 '22 16:07

Did the missing covariate have mean and standard deviation values in the covariate summary object? I suspect that a redundant age covariate is being dropped during the tidy covariates step, so the model never sees it. However, the covariate summary uses the pre-tidy data, so the variable still appears there.

jreps, Jul 22 '22 17:07

What bothers me a little about the results is that both male and female seem to be in the model (these are often the only options in many databases, so one should be redundant), unless there are other genders in the database being used?
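
A rough way to check for redundancy in the indicator covariates is to see whether the prevalences within each group sum to about 1, which would mean the group is exhaustive and one indicator is implied by the others plus the intercept. The sketch below is base R and uses the CovariateMean values from the covariateSummary posted further down in this thread. The variable names and the tolerance are assumptions chosen for illustration, and the printed means are rounded, so the sums are only approximate.

```r
# CovariateMean values copied from the covariateSummary in this thread
# (rounded as printed, so sums are only approximate).
ageGroupMeans <- c(
  "11003" = 0.212, "12003" = 0.200, "13003" = 0.166,
  "14003" = 0.189, "15003" = 0.155, "16003" = 0.0783
)
genderMeans <- c("8532001" = 0.569, "8507001" = 0.430)

ageTotal <- sum(ageGroupMeans)
genderTotal <- sum(genderMeans)
print(c(age = ageTotal, gender = genderTotal))

# Assumed tolerance to absorb rounding in the printed means
tolerance <- 0.005
ageGroupsExhaustive <- abs(ageTotal - 1) < tolerance
gendersExhaustive <- abs(genderTotal - 1) < tolerance
print(c(age = ageGroupsExhaustive, gender = gendersExhaustive))
```

On these numbers both groups sum to roughly 1, which is consistent with one age group being redundant and leaves the gender question open, since female and male together cover almost everyone. This does not show what the tidy covariates step actually removes; it only indicates where redundancy could arise.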

jreps, Jul 22 '22 17:07

Full covariateSummary below. I am not sure about the tidy covariates explanation, because there is another covariate with value 0.0 that is not dropped. Maybe we allocate eight slots based on the number of covariates, but incorrectly fill one of them with the intercept and omit one covariate value?

  covariateId covariateName        analysisId conceptId CovariateCount CovariateMean CovariateStDev
        <dbl> <chr>                     <dbl>     <dbl>          <int>         <dbl>          <dbl>
1     8532001 gender = FEMALE               1      8532         100991        0.569           0.495
2     8507001 gender = MALE                 1      8507          76384        0.430           0.495
3       12003 age group:  60 -  64          3         0          35536        0.200           0.400
4       14003 age group:  70 -  74          3         0          33487        0.189           0.391
5       13003 age group:  65 -  69          3         0          29508        0.166           0.372
6       15003 age group:  75 -  79          3         0          27456        0.155           0.362
7       11003 age group:  55 -  59          3         0          37571        0.212           0.409
8       16003 age group:  80 -  84          3         0          13903        0.0783          0.269
  WithNoOutcome_CovariateCount WithNoOutcome_CovariateMean WithNoOutcome_CovariateStDev WithOutcome_CovariateCount
                         <dbl>                       <dbl>                        <dbl>                      <dbl>
1                        95876                      0.569                         0.495                       5115
2                        72644                      0.431                         0.495                       3740
3                        35224                      0.209                         0.407                        312
4                        31669                      0.188                         0.391                       1818
5                        28436                      0.169                         0.374                       1072
6                        24856                      0.147                         0.355                       2600
7                        37358                      0.222                         0.415                        213
8                        11055                      0.0656                        0.248                       2848
  WithOutcome_CovariateMean WithOutcome_CovariateStDev StandardizedMeanDiff TestWithNoOutcome_CovariateCount
                      <dbl>                      <dbl>                <dbl>                            <dbl>
1                    0.577                       0.494               0.0171                            68522
2                    0.422                       0.494              -0.0180                            51846
3                    0.0352                      0.184              -0.550                             25303
4                    0.205                       0.404               0.0435                            22694
5                    0.121                       0.326              -0.136                             20289
6                    0.293                       0.455               0.358                             17700
7                    0.0240                      0.153              -0.631                             26604
8                    0.321                       0.467               0.684                              7837
  TestWithNoOutcome_CovariateMean TestWithNoOutcome_CovariateStDev TrainWithNoOutcome_CovariateCount
                            <dbl>                            <dbl>                             <dbl>
1                          0.569                             0.495                             27354
2                          0.431                             0.495                             20798
3                          0.210                             0.407                              9921
4                          0.188                             0.391                              8975
5                          0.168                             0.374                              8147
6                          0.147                             0.354                              7156
7                          0.221                             0.415                             10754
8                          0.0651                            0.247                              3218
  TrainWithNoOutcome_CovariateMean TrainWithNoOutcome_CovariateStDev TestWithOutcome_CovariateCount
                             <dbl>                             <dbl>                          <dbl>
1                           0.568                              0.495                           3646
2                           0.432                              0.495                           2676
3                           0.206                              0.404                            220
4                           0.186                              0.389                           1301
5                           0.169                              0.375                            775
6                           0.149                              0.356                           1864
7                           0.223                              0.416                            157
8                           0.0668                             0.250                           2013
  TestWithOutcome_CovariateMean TestWithOutcome_CovariateStDev TrainWithOutcome_CovariateCount
                          <dbl>                          <dbl>                           <dbl>
1                        0.576                           0.494                            1469
2                        0.423                           0.494                            1064
3                        0.0348                          0.183                              92
4                        0.206                           0.404                             517
5                        0.122                           0.328                             297
6                        0.294                           0.456                             736
7                        0.0248                          0.156                              56
8                        0.318                           0.466                             835
  TrainWithOutcome_CovariateMean TrainWithOutcome_CovariateStDev covariateValue
                           <dbl>                           <dbl>          <dbl>
1                         0.580                            0.494          -1.86
2                         0.420                            0.494          -1.91
3                         0.0363                           0.187           0   
4                         0.204                            0.403           1.86
5                         0.117                            0.322           1.39
6                         0.291                            0.454           2.44
7                         0.0221                           0.147           0   
8                         0.330                            0.470           3.37
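
The slot hypothesis above can be tested directly against the printed numbers: if the intercept had taken one of the eight places, its value should show up in the covariateValue column. The sketch below is base R, uses only values copied from this issue, and all variable names are illustrative.

```r
# covariateValue column from the summary and coefficients from the saved model,
# both copied from this issue (illustrative variable names).
summaryValues <- c(-1.863384, -1.907642, 0, 1.857917,
                   1.387609, 2.442050, 0, 3.366165)
modelCoefficients <- c(
  "(Intercept)" = -2.838907, "12003" = 0, "13003" = 1.387609,
  "14003" = 1.857917, "15003" = 2.442050, "16003" = 3.366165,
  "8507001" = -1.907642, "8532001" = -1.863384
)

# Does the intercept value appear anywhere among the summary values?
intercept <- modelCoefficients[["(Intercept)"]]
interceptInSummary <- any(abs(summaryValues - intercept) < 1e-6)
print(interceptInSummary)

# Are the summary values the non-intercept coefficients plus one extra zero?
nonIntercept <- unname(modelCoefficients[names(modelCoefficients) != "(Intercept)"])
matchesWithExtraZero <- isTRUE(all.equal(sort(summaryValues),
                                         sort(c(nonIntercept, 0))))
print(matchesWithExtraZero)
```

On these numbers the intercept does not appear in the summary, and the summary values equal the seven model coefficients plus one additional zero. That pattern looks more like the covariate absent from the model (11003) being reported with a value of 0 in the summary than like the intercept occupying a slot, although the output alone cannot confirm how the package fills that column.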

lhjohn, Jul 22 '22 18:07