chainladder-python
Expected Loss Method bug
Using the example provided in https://chainladder-python.readthedocs.io/en/latest/tutorials/deterministic-tutorial.html#expected-loss-method, EL_model.full_triangle_ shows negatives in the next diagonal. The bottom triangle should be filled in by allocating the selected IBNR using conditional distributions from the selected dev pattern, rather than doing implied LDF calcs.
Might also be an issue with Benktander. Will investigate and report back.
Confirmed this also affects Benktander. Using the example from the tutorial, set n_iters=0; the next diagonal goes weird.
Quick clarification: I'm doing .full_triangle_.cum_to_incr(). The next incremental diagonal is acting weird.
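For concreteness, here is roughly what I'm running, mirroring the tutorial's genins setup (just a sketch of the reproduction):

import chainladder as cl

# Same setup as the deterministic tutorial
genins = cl.load_sample("genins")
genins_dev = cl.Pipeline(
    [("dev", cl.Development()), ("tail", cl.TailCurve())]
).fit_transform(genins)
genins_model = cl.Chainladder().fit(genins_dev)
expected_loss_apriori = genins_model.ultimate_ * 0 + genins_model.ultimate_.mean()
EL_model = cl.ExpectedLoss(apriori=0.95).fit(
    genins_dev, sample_weight=expected_loss_apriori
)

# Incremental view of the completed triangle; the diagonals past the
# valuation date are the projected ones, and that is where I see the negatives
incr = EL_model.full_triangle_.cum_to_incr()
print(incr[incr.valuation > genins_dev.valuation_date])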
Hello @henrydingliu, it may be that you are looking for the lower half of the full_expectation_ triangle? For the Chainladder method, both full_expectation_ and full_triangle_ produce identical results. However, these properties differ for the expected loss methods (e.g. BF, EL, BK). We changed how full_triangle_ works because of #143. I believe this approach is causing the "weird" results you are seeing. Even though #143 was adopted, this is an area where there is very little actuarial literature, so I'm very much open to contrasting opinions on the underlying actuarial theory. Let me know your thoughts.
Allocating IBNR using the conditional dev distribution will 1. coincide with the standard LDF-based approach on the standard chainladder; 2. smooth out the bottom triangle for any special cases of BK.
It won't fix blowing up certain incrementals when there is negative development. I haven't come across any method that would.
Quick example of the conditional dev distribution: say the selected CDFs are 3, 2, 1.6, 1.4, 1.3, etc. The unconditional dev distribution is .33, .17, .125, .089, etc. The conditional dev distribution at age 1 is .25, .19, .13, etc.
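Here's a small numpy sketch of that calculation (a standalone illustration with the same CDFs, not library code):

import numpy as np

# Illustrative selected age-to-ultimate CDFs from the example above
cdf = np.array([3.0, 2.0, 1.6, 1.4, 1.3])

# Unconditional percent-of-ultimate reported at each age is 1 / CDF
pct_reported = 1 / cdf

# Unconditional dev distribution: incremental share of ultimate emerging each period
uncond = np.diff(pct_reported, prepend=0.0)
print(uncond.round(3))        # ~[0.333, 0.167, 0.125, 0.089, 0.055]

# Conditional dev distribution at age 1: renormalize the future increments
# by the share still unreported at age 1 (1 - 1/3 = 0.667)
cond_age_1 = uncond[1:] / (1 - pct_reported[0])
print(cond_age_1.round(3))    # ~[0.25, 0.188, 0.134, 0.082]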
When's the cutoff for v0.8.14, @jbogaardt?
Here is my proposed alternative: basically using the selected pattern (not the implied one) to distribute IBNR.
import chainladder as cl
import pandas as pd
import numpy as np

def new_runoff(dev, ultimate):
    # Broadcast the selected LDFs to every origin period of the ultimate
    emergence = dev.ldf_.copy() * (ultimate / ultimate)
    # Neutralize factors before the valuation date (set them to 1), keep the future ones
    emergence = (
        emergence[emergence.valuation < dev.valuation_date] * 0
        + 1
        + emergence[emergence.valuation >= dev.valuation_date]
    )
    emergence.valuation_date = pd.to_datetime(cl.options.ULT_VAL)
    # Cumulate the remaining factors into growth beyond the latest diagonal
    emergence.values = emergence.values.cumprod(axis=3) - 1
    # Shift the dev ages to the end of each period and label the last as ultimate
    emergence.ddims = emergence.ddims + [12]
    emergence.ddims[-1] = 9999
    # Normalize so the last column is 1: the fraction of IBNR emerged by each age
    emergence.values = emergence.values / emergence.values[..., -1][..., None]
    # Spread the selected IBNR (ultimate - latest diagonal) with that pattern
    ld = dev.incr_to_cum().latest_diagonal
    a = emergence * (ultimate - ld)
    cum_run_off = ld + a
    cum_run_off = cum_run_off[cum_run_off.valuation > dev.valuation_date]
    return dev + cum_run_off
It mirrors the current behaviour for CL and BF.
# Setting up the example
genins = cl.load_sample("genins")
genins_dev = cl.Pipeline(
    [("dev", cl.Development()), ("tail", cl.TailCurve())]
).fit_transform(genins)
genins_model = cl.Chainladder().fit(genins_dev)
expected_loss_apriori = genins_model.ultimate_ * 0 + genins_model.ultimate_.mean()
EL_model = cl.ExpectedLoss(apriori=0.95).fit(
    genins_dev, sample_weight=expected_loss_apriori
)
BF_model = cl.BornhuetterFerguson(apriori=0.95).fit(
    genins_dev, sample_weight=expected_loss_apriori
)
print(np.all(new_runoff(genins_dev, genins_model.ultimate_).values.round(3)
             == genins_model.full_triangle_.values.round(3)))
print(np.all(new_runoff(genins_dev, BF_model.ultimate_).values.round(3)
             == BF_model.full_triangle_.values.round(3)))
I believe it is a Pareto improvement on EL. On Bk I think it's not theoretically inferior.
No formal date on v0.8.14, but we do have a lot of unreleased fixes now. It's worth tagging the ones we want for inclusion and prioritizing getting them wrapped up.
Your new_runoff method has a lot of good properties:

- As you show, it's identical to the current implementation for BornhuetterFerguson, CapeCod, and Chainladder.
- For Benktander with n_iters=1000, it converges to Chainladder as expected.
- It's more concise than the current implementation.

Still trying to get my head around why it is a Pareto improvement, but you've given it some thought and your other ideas have been great. I'll tinker with it today to gain a better understanding.
I'm probably butchering "Pareto".
In my tinkering, I've found that I consistently prefer new_runoff for EL. The current implementation can introduce big jumps in the next diagonal, whereas new_runoff is smoother.
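For example, a quick way to see it (using the genins ExpectedLoss setup and new_runoff from the comments above; not a rigorous test):

# Compare the projected incremental diagonals under the current ExpectedLoss
# runoff versus new_runoff
old_incr = EL_model.full_triangle_.cum_to_incr()
new_incr = new_runoff(genins_dev, EL_model.ultimate_).cum_to_incr()
print(old_incr[old_incr.valuation > genins_dev.valuation_date])
print(new_incr[new_incr.valuation > genins_dev.valuation_date])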
Well, this is embarrassing. Turns out my code had a typo when I was testing new_runoff on Bk. new_runoff actually mirrors the current implementation for Bk as well:
genins = cl.load_sample("genins")
genins_dev = cl.Pipeline(
    [("dev", cl.Development()), ("tail", cl.TailCurve())]
).fit_transform(genins)
genins_model = cl.Chainladder().fit(genins_dev)
expected_loss_apriori = genins_model.ultimate_ * 0 + genins_model.ultimate_.mean()
Bk_model = cl.Benktander(apriori=0.95).fit(
    genins_dev, sample_weight=expected_loss_apriori
)
print(np.all(new_runoff(genins_dev, Bk_model.ultimate_).values.round(3)
             == Bk_model.full_triangle_.values.round(3)))
I think with extreme apriori selections, it is not quite the same as the current implementation, but they seem to be pretty close with the exception of the ExpectedLoss method.
import chainladder as cl
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%config InlineBackend.figure_format = 'svg'

def new_runoff(dev, ultimate):
    emergence = dev.ldf_.copy() * (ultimate / ultimate)
    emergence = (
        emergence[emergence.valuation < dev.valuation_date] * 0
        + 1
        + emergence[emergence.valuation >= dev.valuation_date]
    )
    emergence.valuation_date = pd.to_datetime(cl.options.ULT_VAL)
    emergence.values = emergence.values.cumprod(axis=3) - 1
    emergence.ddims = emergence.ddims + [12]
    emergence.ddims[-1] = 9999
    emergence.values = emergence.values / emergence.values[..., -1][..., None]
    ld = dev.incr_to_cum().latest_diagonal
    a = emergence * (ultimate - ld)
    cum_run_off = ld + a
    cum_run_off = cum_run_off[cum_run_off.valuation > dev.valuation_date]
    return dev + cum_run_off
fig, ((ax00, ax01, ax02),
      (ax10, ax11, ax12)) = plt.subplots(ncols=3, nrows=2, figsize=(15, 10))
np.random.seed(100)
triangle = cl.load_sample('raa')
dev = cl.TailCurve().fit_transform(cl.Development().fit_transform(triangle))
exposure = triangle.latest_diagonal * 0 + np.random.normal(20000, 10000, size=(10, 1))

# Compare % of ultimate under the current runoff vs new_runoff for several n_iters
for iteration, ax in enumerate([ax00, ax01, ax02, ax10, ax11, ax12]):
    iteration = 1000 if iteration == 5 else iteration
    bk0_model = cl.Benktander(apriori=0.65, n_iters=iteration).fit(dev, sample_weight=exposure)
    old_pct_ult = bk0_model.full_triangle_ / bk0_model.ultimate_
    old_pct_ult = old_pct_ult[old_pct_ult.valuation >= triangle.valuation_date]
    new_pct_ult = new_runoff(dev, bk0_model.ultimate_) / bk0_model.ultimate_
    new_pct_ult = new_pct_ult[new_pct_ult.valuation >= triangle.valuation_date]
    (old_pct_ult - new_pct_ult).T.set_index(old_pct_ult.ddims.astype(str)).plot(
        ax=ax, title=f'n_iters={iteration}', ylabel='% of Ultimate Difference', legend=False,
        xlabel='Age', sharex=True, grid=True, ylim=(-.25, .25))
I'm surprised that your approach doesn't actually rely on n_iters of the Benktander method. That is the parameter that decides how to blend apriori and latest_diagonal when computing the ultimate, but it doesn't matter for any other development age. Perhaps because it's encoded in the ultimate? That said, I do agree with you. I like how your approach works for the ExpectedLoss method and it is nearly identical for all others. Absent a good paper to referee the decision, I'm inclined to go with your suggestion.
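To spell out why n_iters only enters through the ultimate, here is a toy sketch of the iterated-BF recursion that Benktander performs (plain Python with made-up numbers, not library code); new_runoff then just spreads ultimate - latest with the selected pattern, regardless of how that ultimate was produced:

def benktander_ultimate(latest, apriori, cdf, n_iters):
    """Iterated Bornhuetter-Ferguson: U_{k+1} = latest + (1 - 1/CDF) * U_k."""
    pct_unreported = 1 - 1 / cdf
    ultimate = apriori                # n_iters=0 is the ExpectedLoss method
    for _ in range(n_iters):          # n_iters=1 is Bornhuetter-Ferguson
        ultimate = latest + pct_unreported * ultimate
    return ultimate                   # as n_iters grows, this converges to latest * cdf

# Toy numbers: latest=5000, apriori=8000, CDF=2.0
for k in (0, 1, 2, 1000):
    print(k, benktander_ultimate(5000, 8000, 2.0, k))
# 0 -> 8000 (EL), 1 -> 9000 (BF), 2 -> 9500, 1000 -> ~10000 (Chainladder)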
Your suggestion has been adopted in the master branch.
Ah, okay. n_iters defaults to 1, so my earlier comment was still just comparing the runoffs for BF.
Here is a slight tweak to your example to illustrate the divergence at extreme aprioris:
import chainladder as cl
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%config InlineBackend.figure_format = 'svg'

def new_runoff(dev, ultimate):
    emergence = dev.ldf_.copy() * (ultimate / ultimate)
    emergence = (
        emergence[emergence.valuation < dev.valuation_date] * 0
        + 1
        + emergence[emergence.valuation >= dev.valuation_date]
    )
    emergence.valuation_date = pd.to_datetime(cl.options.ULT_VAL)
    emergence.values = emergence.values.cumprod(axis=3) - 1
    emergence.ddims = emergence.ddims + [12]
    emergence.ddims[-1] = 9999
    emergence.values = emergence.values / emergence.values[..., -1][..., None]
    ld = dev.incr_to_cum().latest_diagonal
    a = emergence * (ultimate - ld)
    cum_run_off = ld + a
    cum_run_off = cum_run_off[cum_run_off.valuation > dev.valuation_date]
    return dev + cum_run_off
fig, ((ax00, ax01, ax02),
      (ax10, ax11, ax12)) = plt.subplots(ncols=3, nrows=2, figsize=(15, 10))
np.random.seed(100)
triangle = cl.load_sample('raa')
dev = cl.TailCurve().fit_transform(cl.Development().fit_transform(triangle))
exposure = triangle.latest_diagonal * 0 + np.random.normal(20000, 10000, size=(10, 1))

# Sweep the apriori and compare the incremental runoff (as a share of IBNR)
# under the current implementation vs new_runoff
for LCM, ax in enumerate([ax00, ax01, ax02, ax10, ax11, ax12]):
    bk0_model = cl.Benktander(apriori=0.65 * LCM, n_iters=2).fit(dev, sample_weight=exposure)
    old_pct_ult = bk0_model.full_triangle_.cum_to_incr() / bk0_model.ibnr_
    old_pct_ult = old_pct_ult[old_pct_ult.valuation >= triangle.valuation_date]
    new_pct_ult = new_runoff(dev, bk0_model.ultimate_).cum_to_incr() / bk0_model.ibnr_
    new_pct_ult = new_pct_ult[new_pct_ult.valuation >= triangle.valuation_date]
    (old_pct_ult - new_pct_ult).T.set_index(old_pct_ult.ddims.astype(str)).plot(
        ax=ax, title=f'apriori={0.65 * LCM}', ylabel='Difference in % of IBNR', legend=False,
        xlabel='Age', sharex=True, grid=True, ylim=(-.25, .25))
plt.show()
At a very low apriori, the previous implementation put more weight in the near-future diagonals; at a very high apriori, it put less weight in the near-future diagonals.