chainladder-python
BUG in ChainLadder grain OYDQ
import chainladder as cl
import pandas as pd
df = pd.DataFrame({
    "claim_year": 2000 + pd.Series([0] * 8 + [1] * 4),
    "claim_month": [1, 4, 7, 10] * 3,
    "dev_year": 2000 + pd.Series([0] * 4 + [1] * 8),
    "dev_month": [1, 4, 7, 10] * 3,
    "payment": [1] * 12,
})
tr = cl.Triangle(
    df,
    origin=["claim_year", "claim_month"],
    development=["dev_year", "dev_month"],
    columns="payment",
    cumulative=False,
).grain("OYDQ")
cl_est = cl.Chainladder().fit(cl.Development(average="volume").fit_transform(tr))
cl_est.ultimate_
cl_est.full_triangle_
results in
|  | 3 | 6 | 9 | 12 | 15 | 18 | 21 | 24 | 27 | 30 | 33 | 36 | 9999 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |  |  |  |  |  |
| 2001 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 7.0000 | 7.0000 | 7.0000 | 7.0000 |  |  |  |  |  |
Observe the predicted 7 in development periods 15, 18, 21 and 24 for OY 2001. This is pretty odd!
The triangle without estimation reads
|  | 3 | 6 | 9 | 12 | 15 | 18 | 21 | 24 | 27 | 30 | 33 | 36 | 9999 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |  |  |  |  |  |
| 2001 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |  |  |  |  |  |  |  |  |  |
When the grain is changed so that origin and development use the same period, it seems to work fine, but not when the origin and development periods differ.
BTW, there is an additional warning
RuntimeWarning: Mean of empty slice
xp.nansum(w * x * y, axis) - xp.nansum(x * w, axis) * xp.nanmean(y, axis)
Thanks for finding this, it is definitely an issue.
@jbogaardt Do you have a rough idea where this bug originates? If it helps, I could contribute the above snippet as a test.
I've been using this issue to get a better understanding of how the chainladder package works.
From what I can see, the issue seems to be that the call latest_diagonal.val_to_dev() does not return the same dimensions as the original triangle, but starts from period 12.
cl.Chainladder().fit(cl.Development(average="volume").fit_transform(tr)).latest_diagonal.val_to_dev()
|  | 12 | 15 | 18 | 21 | 24 |
|---|---|---|---|---|---|
| 2000-01-01 | nan | nan | nan | nan | 8 |
| 2001-01-01 | 4 | nan | nan | nan | nan |
When _align_cdf is then called, it uses this shape and picks the fifth CDF element for 2000, which is 1.6, and the first for 2001, which is 8.
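To make the numbers above concrete, here is a small dependency-free sketch that recomputes the volume-weighted factors by hand from the cumulated OYDQ triangle. It does not call chainladder; the helper names (volume_ldfs, to_cdfs) and the reading of "fifth" and "first" as positions 4 and 0 in the full CDF vector are assumptions made for illustration.

```python
# Cumulative payments by origin year at development ages 3, 6, ..., 24 months.
# None marks cells that are not yet observed.
ages = [3, 6, 9, 12, 15, 18, 21, 24]
cumulative = {
    2000: [1, 2, 3, 4, 5, 6, 7, 8],
    2001: [1, 2, 3, 4, None, None, None, None],
}


def volume_ldfs(rows):
    """Volume-weighted link ratios between consecutive development ages."""
    n = len(next(iter(rows.values())))
    ldfs = []
    for j in range(n - 1):
        pairs = [(r[j], r[j + 1]) for r in rows.values()
                 if r[j] is not None and r[j + 1] is not None]
        ldfs.append(sum(b for _, b in pairs) / sum(a for a, _ in pairs))
    return ldfs


def to_cdfs(ldfs):
    """Cumulative development factors to ultimate, assuming a tail factor of 1.0."""
    cdfs = [1.0]
    for factor in reversed(ldfs):
        cdfs.append(cdfs[-1] * factor)
    return cdfs[::-1]


cdfs = to_cdfs(volume_ldfs(cumulative))

# Aligned lookup: use the factor at each origin's latest observed age.
latest_idx = {origin: max(i for i, v in enumerate(row) if v is not None)
              for origin, row in cumulative.items()}
aligned = {origin: cdfs[i] for origin, i in latest_idx.items()}

# Misaligned lookup (assumed reading of the thread): positions taken from a
# grid that starts at age 12 are applied to the full CDF vector.
misaligned = {2000: cdfs[4], 2001: cdfs[0]}

for origin, row in cumulative.items():
    latest = row[latest_idx[origin]]
    print(origin, latest * aligned[origin], latest * misaligned[origin])
```

Under these assumptions the aligned factors are 1.0 (age 24) and 2.0 (age 12), while the misaligned positions give about 1.6 and 8. For 2001 the misaligned ultimate is then 4 × 8 = 32, which would be consistent with the four predicted 7s if those cells are incremental; that link is an inference, not something stated in the thread.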
The easy fix seems to be to use incr_to_cum() instead of latest_diagonal when the triangle is incremental. That is, change _get_ultimate to:
def _get_ultimate(self, X, sample_weight=None):
    """ Private method that uses CDFs to obtain an ultimate vector """
    if X.is_cumulative == False:
        ld = X.incr_to_cum().latest_diagonal  # ld = X.sum('development')
        ultimate = X.incr_to_cum().copy()  # ultimate = ld.val_to_dev()
    else:
        ld = X.latest_diagonal
        ultimate = X.copy()
    cdf = self._align_cdf(ultimate, sample_weight)
    ultimate = ld * cdf
    return self._set_ult_attr(ultimate)
This gives the same output as creating the cumulative triangle first, and the tests pass. But I'm not sure about the side effects of this change. And if the issue actually lies in val_to_dev, then this may just be hiding something that should be fixed there.
I've tried to take a deeper look at val_to_dev, but I'm not skilled enough yet. @jbogaardt - Any idea how to fix this in an efficient and simple way? 😄
FYI - same error occurs when using Benktander:
dev = cl.Development(average="volume").fit_transform(tr)
cl.Benktander(apriori=1, n_iters=10000).fit(dev, sample_weight=dev.latest_diagonal).full_triangle_
It seems that changing _align_cdf to use something with the original shape also passes all tests, for instance
cdf = X.cdf_.iloc[..., : self.X_.shape[-1]]
But again, I'm not sure of the consequences.
Just another thought: might it be better to convert everything to cumulative at initialization, instead of checking in each model? Then one would probably only have to convert back in IO methods like to_frame/to_json/to_pickle.
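As a rough illustration of that idea, the toy container below cumulates once at construction and only converts back at the output boundary. MiniTriangle and its methods are hypothetical names for this sketch and are not part of the chainladder API; a real change would also have to deal with missing cells, multiple columns and the library's array backends.

```python
from itertools import accumulate


class MiniTriangle:
    """Hypothetical toy container, not the chainladder Triangle."""

    def __init__(self, rows, cumulative):
        # Normalize once here, so model code never has to branch on the form.
        self._was_cumulative = cumulative
        self._rows = {
            origin: list(values) if cumulative else list(accumulate(values))
            for origin, values in rows.items()
        }

    @property
    def rows(self):
        """Internal view; always cumulative."""
        return self._rows

    def to_dict(self):
        """IO boundary: hand the data back in the form the user supplied."""
        if self._was_cumulative:
            return {origin: list(v) for origin, v in self._rows.items()}
        return {
            origin: v[:1] + [b - a for a, b in zip(v, v[1:])]
            for origin, v in self._rows.items()
        }


incremental = {2000: [1] * 8, 2001: [1] * 4}
tri = MiniTriangle(incremental, cumulative=False)
print(tri.rows[2001])    # cumulative view seen by models
print(tri.to_dict())     # incremental form restored on output
```

The trade-off in this sketch is that the container must remember the original form, and every output path has to honour it.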
Not reopening due to age. Implemented the fix from @johalnes in Chainladder and Benktander, and also added tests.