Computing out-of-sample stdev

Open suchow opened this issue 4 years ago • 3 comments

Was #6 closed because it was implemented?

I see that predictions on the test set include both the predicted value pred and the standard deviation stdev.

julia> result["predictions"]
500000×5 DataFrames.DataFrame
│ Row    │ E1   │ E2   │ values │ pred    │ stdev     │
├────────┼──────┼──────┼────────┼─────────┼───────────┤
│ 1      │ 5121 │ 1923 │ 4.0    │ 4.17571 │ 0.106283  │
│ 2      │ 481  │ 3528 │ 5.0    │ 3.80128 │ 0.309201  │
│ 3      │ 1279 │ 3175 │ 4.0    │ 2.9776  │ 0.237935  │
│ 4      │ 5364 │ 1172 │ 5.0    │ 4.29892 │ 0.143759  │
│ 5      │ 424  │ 1356 │ 4.0    │ 3.91691 │ 0.103391  │
│ 6      │ 258  │ 457  │ 5.0    │ 4.4669  │ 0.0985462 │
│ 7      │ 1978 │ 2555 │ 1.0    │ 1.36788 │ 0.290181  │
│ 8      │ 1150 │ 193  │ 1.0    │ 1.55493 │ 0.160465  │
│ 9      │ 2279 │ 1097 │ 5.0    │ 4.00714 │ 0.184425  │
⋮

However, the full predictions include only the predicted values themselves, and not the stdev.

julia> result["predictions_full"]
6040x3952 Array{Float64,2}:
 4.39071  3.91885  3.64339  3.41556  3.72048  3.98654  …

I'm looking to assess prediction uncertainty for out-of-sample entity pairs.

My current understanding is that implementing this means modifying macau.jl. During sampling it currently stores the sum over sampled predictions, as on line 146 (https://github.com/jaak-s/BayesianDataFusion.jl/blob/master/src/macau.jl#L146). It would additionally need to store what is required to compute the variance: essentially the sum of squares, though some adjustments are needed for numerical precision (https://dl.acm.org/doi/10.1145/3221269.3223036).
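To make the idea concrete, here is a minimal sketch of a numerically stable running mean and variance using Welford's online update, one standard way to avoid the cancellation problems of a naive sum of squares. It is not code from macau.jl: the `RunningMoments` type and the `update!` and `sample_stdev` functions are hypothetical names, and the small matrices stand in for the per-sample full prediction matrices.

```julia
# Hypothetical accumulator; not part of BayesianDataFusion.jl.
mutable struct RunningMoments
    n::Int                  # number of samples seen so far
    mean::Matrix{Float64}   # running mean of the sampled predictions
    m2::Matrix{Float64}     # running sum of squared deviations from the mean
end

# Start with zero samples and zero-filled accumulators.
RunningMoments(rows::Int, cols::Int) =
    RunningMoments(0, zeros(rows, cols), zeros(rows, cols))

# Welford's update: fold one sampled prediction matrix into the accumulator.
function update!(acc::RunningMoments, pred::AbstractMatrix{<:Real})
    acc.n += 1
    delta = pred .- acc.mean                  # deviation from the old mean
    acc.mean .+= delta ./ acc.n               # updated mean
    acc.m2 .+= delta .* (pred .- acc.mean)    # uses old and new mean
    return acc
end

# Per-entry sample standard deviation (n - 1 denominator); needs n >= 2.
sample_stdev(acc::RunningMoments) = sqrt.(acc.m2 ./ (acc.n - 1))

# Toy stand-ins for three sampled prediction matrices.
samples = [[1.0 2.0; 3.0 4.0], [2.0 2.0; 5.0 0.0], [3.0 2.0; 7.0 2.0]]
acc = RunningMoments(2, 2)
for s in samples
    update!(acc, s)
end
```

After the loop, `acc.mean` plays the role of the averaged full predictions and `sample_stdev(acc)` gives a matrix of the same shape with the per-entry standard deviation. The memory cost is one extra matrix the size of the full prediction matrix.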

Is this something you would be interested in including, if it isn't already available?

suchow, Jun 09 '20 21:06

Yes, we would be interested in including that functionality. For example, we could add another field called result["predictions_full_stdev"] to store the matrix (tensor in the general case) of standard deviations.

Would you be interested in making a pull request?

jaak-s, Jun 10 '20 08:06

I'll give it a try! I'm new to Julia, so this may take a while.

suchow, Jun 29 '20 15:06

Cool! Let me know if you need any help.

jaak-s, Jun 30 '20 10:06