
BarnettZehnwirth fails when fitting on C(valuation) because valuation has a different dimension from origin and development

Open cf4869 opened this issue 2 years ago • 5 comments

import chainladder as cl

abc = cl.load_sample('abc')
len(abc.origin)
len(abc.development)
len(abc.valuation)
model = cl.BarnettZehnwirth(formula='C(origin)+C(development)+C(valuation)').fit(abc)

cf4869 avatar Apr 07 '22 01:04 cf4869

Hi @cf4869 , this is related to #230 and #231.

When using discrete valuations C(valuation), the regression doesn't quite know how to handle valuation periods beyond those present in the triangle. Because there is no data for future diagonals, there are no coefficients, and the estimator bugs out when it tries to predict out the lower half of the triangle. This is why using valuation as a continuous/ordinal feature works:

model = cl.BarnettZehnwirth(formula='C(origin)+C(development)+valuation').fit(abc)

Here the regression knows to extrapolate beyond the end of the triangle since it sees valuation as an ordinal variable.
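To make that concrete, here is a minimal sketch, using only the calls already shown in this thread (load_sample, fit, coef_), that contrasts the triangle's dimensions with the single valuation slope the ordinal formula produces:

import chainladder as cl

abc = cl.load_sample('abc')

# Observed data only exists up to the latest diagonal, so C(valuation) has
# no levels (and therefore no coefficients) for the future diagonals that
# need to be predicted.
print(len(abc.origin), len(abc.development), len(abc.valuation))

# Treating valuation as ordinal collapses the calendar direction into a
# single slope, which can be extrapolated past the last diagonal.
model = cl.BarnettZehnwirth(formula='C(origin)+C(development)+valuation').fit(abc)
print(model.coef_)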

I would like to see BarnettZehnwirth support a future trend assumption as a user-supplied parameter so that your example works. We haven't quite figured out a clean/flexible way to expand this estimator to do that yet, but it's on the list of todos.

jbogaardt avatar Apr 07 '22 02:04 jbogaardt

Hi @jbogaardt, thanks for the answer. So in that case, the signal should be picked up by using valuation as one of the parameters, but we still have non-random residuals on the valuation-date plot.

model = cl.BarnettZehnwirth(formula='C(origin)+C(development)+valuation').fit(abc)

[residual plots attached]

If we fit only ordinal variables, we have non-random residuals on all of them. Does this make sense?

model = cl.BarnettZehnwirth(formula='origin+development+valuation').fit(abc)

[residual plots attached]

cf4869 avatar Apr 07 '22 17:04 cf4869

I'll have to dust off the paper - it's been a while and my understanding is a little hazy.

When you fit a feature as ordinal/continuous rather than strictly categorical, you get a single regression coefficient for that axis.

For example, this model has three coefficients (plus an intercept):

import chainladder as cl
abc = cl.load_sample('abc')
cl.BarnettZehnwirth(formula='origin+development+valuation').fit(abc).coef_

A single origin coefficient as in this model assumes linearity in the trend along the origin dimension. But if there is non-linearity in the underlying data, you would see non-random residuals. This suggests that you should probably break origin up into separate coefficients.

Hypothetically, you could choose one coefficient for the three oldest origin years, another for origins 4 and 5, and a final coefficient for origin years 6 and later. Here is how that would look:

cl.BarnettZehnwirth(
   formula='C(np.where(origin<=2, 0, np.where(origin<5,1,2)))+development+valuation'
).fit(abc).coef_

Actually, getting back to your original issue, you could create a model that uses discrete valuations and just extrapolates future valuations from the last available:

cl.BarnettZehnwirth(
   formula='C(origin)+C(development)+C(np.minimum(valuation, 9))'
).fit(abc).coef_

The point I am trying to make in all this is that the residual analysis gives you insight into how you should structure your formula; the residuals aren't guaranteed to be random for any particular formula.
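As a rough illustration of that workflow (a sketch only, reusing the formulas and the fit/coef_ calls from this thread), one could iterate over a few candidate formulas and inspect the coefficients and residual plots of each before settling on a structure:

import chainladder as cl

abc = cl.load_sample('abc')

# Candidate formula structures from this thread; in practice you would look
# at the residual plots for each fit to judge whether they appear random.
candidates = [
    'origin+development+valuation',
    'C(origin)+C(development)+valuation',
    'C(origin)+C(development)+C(np.minimum(valuation, 9))',
]
for f in candidates:
    model = cl.BarnettZehnwirth(formula=f).fit(abc)
    print(f)
    print(model.coef_)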

jbogaardt avatar Apr 07 '22 18:04 jbogaardt

Hi @jbogaardt, thanks for clarifying; it appears that using discrete valuations completely absorbs the signal, resulting in random residuals in all directions.

model = cl.BarnettZehnwirth(formula='C(origin)+C(development)+C(np.minimum(valuation, 9))').fit(abc)

[residual plots attached]

However, because of multicollinearity, the model may be overparameterized in this case. As a result, fitting fewer parameters by combining a few levels could be a viable option. So, aside from grouping levels, is it possible to fix a few parameters in the model at known values? For example, if we know the origin trend is 2% prior to 1979 and 5% after, how do we feed this information into the model so that we only need to fit development and valuation?

cf4869 avatar Apr 11 '22 21:04 cf4869

Do you mean to insert offsets for specific parameters rather than fitting the parameters from the data? Unfortunately, no.

Under the hood, the regression is carried out by sklearn.linear_model.LinearRegression, which doesn't support offset parameters. The statsmodels GLM implementation does support offsets, but chainladder-python is not currently built on it.
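For reference only, and outside of chainladder entirely, this is roughly what a fixed offset looks like in a statsmodels GLM. The data below is made up purely to show the mechanics; the known_origin_trend name is hypothetical and this is not a chainladder feature:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 100

# Toy design matrix: an intercept plus two fitted trends.
X = sm.add_constant(np.column_stack([rng.uniform(0, 10, n), rng.uniform(0, 10, n)]))

# Suppose one trend (e.g. 2% per origin period) is known rather than fitted.
known_origin_trend = 0.02 * rng.uniform(0, 10, n)

# Simulated response that includes the known trend.
y = X @ np.array([1.0, 0.3, 0.1]) + known_origin_trend + rng.normal(0, 0.1, n)

# The offset enters the linear predictor with its coefficient fixed at 1,
# so only the remaining three parameters are estimated.
fit = sm.GLM(y, X, family=sm.families.Gaussian(), offset=known_origin_trend).fit()
print(fit.params)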

jbogaardt avatar Apr 11 '22 21:04 jbogaardt