ChainLadder

Odd bootstrap results

Open ibrow321 opened this issue 7 years ago • 8 comments

When I use the attached data and code to produce bootstrap results, the output seems odd. For example, the IBNR S.E. for origin year 5 (a mature year) is 10% for seed 12 and 17% for seed 99 (not cherry-picking seeds, just a couple picked at random). Both are much higher than the Mack S.E. of 0.08%. I have been working on a lot of triangles, and for many of them bootstrap and Mack give similar results, but for some there are very large differences, particularly in mature years. Sometimes the difference only appears for certain seeds. Sometimes, running with the same seed, a slight tweak to the data, such as a different number of decimal places or an extra (stable) year of development on a large triangle, causes the odd bootstrap S.E. to appear or disappear. Any suggestions? Am I right in thinking that this is likely to be an issue with the package rather than a difference between the Mack and bootstrap methods?

set.seed(12)
B <- BootChainLadder(data, R=1000, process.distr="gamma")
mack <- MackChainLadder(data, est.sigma="Mack")

data.zip

BootChainLadder(Triangle = data, R = 1000, process.distr = "gamma")

   Latest Mean Ultimate Mean IBNR IBNR.S.E  IBNR 75% IBNR 95%
 1  0.216         0.216  0.00e+00   0.0000  0.00e+00 0.00e+00
 2  0.247         0.247  9.55e-05   0.0125  1.62e-76 1.04e-06
 3  0.251         0.250 -8.37e-04   0.0150  3.70e-21 3.54e-04
 4  0.968         0.966 -2.52e-03   0.0497  1.37e-04 3.59e-02
 5  1.725         1.723 -2.61e-03   0.1033  3.94e-03 6.75e-02
 6  1.952         1.949 -2.81e-03   0.0846  7.29e-03 9.43e-02
 7  1.952         1.944 -7.88e-03   0.1030  8.87e-03 8.66e-02
 8  1.825         1.819 -5.92e-03   0.0965  8.62e-03 8.30e-02
 9  0.894         0.884 -9.46e-03   0.0671  1.15e-03 6.18e-02
10  0.260         0.258 -2.34e-03   0.0295  4.08e-06 1.80e-02
11  0.520         0.517 -2.42e-03   0.0413  5.02e-04 5.20e-02
12  1.532         1.513 -1.98e-02   0.1081  5.44e-03 7.66e-02
13  1.166         1.144 -2.17e-02   0.0928  1.71e-03 7.20e-02
14  0.354         0.350 -3.96e-03   0.0416  5.46e-04 3.70e-02
15  0.623         0.618 -5.40e-03   0.0556  3.61e-03 6.41e-02
16  1.119         1.099 -2.06e-02   0.0914  5.97e-03 8.31e-02
17  0.803         0.775 -2.80e-02   0.0846 -1.42e-04 6.09e-02
18  0.761         0.741 -1.93e-02   0.0798  6.35e-03 8.32e-02
19  0.455         0.451 -4.34e-03   0.0638  1.12e-02 9.51e-02
20  0.426         0.446  1.96e-02   0.0724  4.52e-02 1.49e-01
21  0.449         0.505  5.54e-02   0.0909  9.35e-02 2.11e-01
22  0.691         0.944  2.53e-01   0.1860  3.57e-01 5.99e-01
23  0.112         0.426  3.14e-01   0.2955  4.56e-01 8.38e-01

MackChainLadder(Triangle = data, est.sigma = "Mack")

   Latest Dev.To.Date Ultimate      IBNR Mack.S.E CV(IBNR)
 1  0.216       1.000    0.216  0.00e+00 0.00e+00      NaN
 2  0.247       1.000    0.247 -2.28e-05 2.54e-05   -1.113
 3  0.251       1.000    0.251  3.90e-06 6.64e-05   17.006
 4  0.968       1.001    0.968 -6.36e-04 4.58e-04   -0.721
 5  1.725       1.001    1.724 -9.99e-04 7.62e-04   -0.763
 6  1.952       1.001    1.950 -1.98e-03 1.11e-03   -0.562
 7  1.952       1.004    1.945 -7.04e-03 8.75e-03   -1.244
 8  1.825       1.004    1.817 -7.69e-03 8.45e-03   -1.099
 9  0.894       1.010    0.885 -8.67e-03 8.57e-03   -0.988
10  0.260       1.007    0.258 -1.92e-03 5.33e-03   -2.774
11  0.520       1.009    0.515 -4.40e-03 7.91e-03   -1.799
12  1.532       1.012    1.515 -1.76e-02 1.71e-02   -0.973
13  1.166       1.015    1.149 -1.72e-02 2.04e-02   -1.189
14  0.354       1.010    0.350 -3.61e-03 1.51e-02   -4.189
15  0.623       1.012    0.616 -7.47e-03 2.18e-02   -2.917
16  1.119       1.019    1.099 -2.03e-02 3.85e-02   -1.894
17  0.803       1.035    0.776 -2.70e-02 3.99e-02   -1.475
18  0.761       1.023    0.743 -1.73e-02 4.69e-02   -2.719
19  0.455       1.002    0.454 -8.51e-04 4.52e-02  -53.153
20  0.426       0.950    0.449  2.26e-02 8.38e-02    3.711
21  0.449       0.883    0.509  5.94e-02 1.03e-01    1.732
22  0.691       0.724    0.954  2.63e-01 2.28e-01    0.865
23  0.112       0.258    0.435  3.23e-01 3.96e-01    1.227

ibrow321 avatar Apr 26 '17 08:04 ibrow321

Do you get more stable results when you run more samples, e.g. with R=10000?

-- Markus Gesmann Blog: http://www.magesblog.com


mages avatar Apr 26 '17 09:04 mages

Not particularly. For the example above, rerunning with R=10000 gives Bootstrap SE in Origin Year 5 of 25% for seed 12 and 42% for seed 99 (compared with 10% and 17% using R=1000).
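A sketch of how such a seed and replicate-count comparison could be scripted. It uses the `RAA` triangle that ships with ChainLadder as a stand-in for the attached data (an assumption, since data.zip is not reproduced here), and `boot_se` is a hypothetical helper name. The standard errors are computed directly from the simulated `IBNR.ByOrigin` values.

```r
library(ChainLadder)

# Bootstrap IBNR standard error per origin year for one seed and one
# replicate count. boot_se is a hypothetical helper, not a package function.
boot_se <- function(triangle, R, seed) {
  set.seed(seed)
  B <- BootChainLadder(triangle, R = R, process.distr = "gamma")
  # IBNR.ByOrigin holds one simulated IBNR per origin year and replicate.
  apply(B$IBNR.ByOrigin, 1, sd)
}

# RAA is a stand-in triangle; swap in your own data to reproduce the issue.
grid <- expand.grid(seed = c(12, 99), R = c(1000, 10000))
se_table <- sapply(seq_len(nrow(grid)), function(i) {
  boot_se(RAA, R = grid$R[i], seed = grid$seed[i])
})
colnames(se_table) <- paste0("seed", grid$seed, "_R", grid$R)

# One row per origin year, one column per (seed, R) combination.
round(se_table, 1)
```

On a well-behaved triangle the columns should be close to each other; large jumps between columns for the same origin year would point to a few extreme replicates dominating the standard error.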

ibrow321 avatar Apr 26 '17 09:04 ibrow321

I think the behaviour is a result of your data. The years have very different performance: [image: rplot05]

If you look at the standard MackChainLadder output you will see that the residuals are occasionally quite large. Some of them are outside the band of -2 to 2.

[image: rplot]

This will have an impact on the BootChainLadder as well. Indeed, looking at the plot of the BootChainLadder function shows you that there are a few outliers:

[image: rplot04]

In summary, neither MackChainLadder nor BootChainLadder seems to be an appropriate model for your data.

mages avatar Apr 26 '17 10:04 mages

Nice sleuthing, @mages. @ibrow321, at a previous consultancy with some bootstrap expertise, we would sometimes see odd bootstrap results because England and Verrall's approach was to sample from the residuals under the assumption that residuals throughout the triangle are similarly distributed. Markus' residual graphs suggest this assumption may not hold with your data.
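The "similarly distributed" assumption can be eyeballed by comparing the spread of the residuals across development periods. The sketch below is base R only, uses a made-up toy triangle rather than the data from this issue, and computes unscaled Pearson residuals in the England and Verrall style. It is an illustration under those assumptions, not the code inside BootChainLadder.

```r
# Toy cumulative claims triangle (made-up numbers, not the data from this issue).
set.seed(1)
n <- 6
pattern  <- c(0.30, 0.55, 0.75, 0.90, 0.97, 1.00)   # cumulative share developed
ultimate <- c(100, 110, 95, 120, 105, 115)
inc <- outer(ultimate, diff(c(0, pattern))) * exp(rnorm(n * n, sd = 0.1))
cum <- t(apply(inc, 1, cumsum))
cum[row(cum) + col(cum) > n + 1] <- NA              # blank out the future

# Volume-weighted chain-ladder development factors.
f <- sapply(1:(n - 1), function(k) {
  rows <- 1:(n - k)
  sum(cum[rows, k + 1]) / sum(cum[rows, k])
})

# Fitted cumulative claims: back-cast from the latest diagonal.
fit_cum <- matrix(NA_real_, n, n)
for (i in 1:n) {
  last <- n + 1 - i
  fit_cum[i, last] <- cum[i, last]
  if (last > 1) for (k in (last - 1):1) fit_cum[i, k] <- fit_cum[i, k + 1] / f[k]
}

# Incremental amounts and unscaled Pearson residuals.
to_inc  <- function(m) cbind(m[, 1], t(apply(m, 1, diff)))
obs_inc <- to_inc(cum)
fit_inc <- to_inc(fit_cum)
res <- (obs_inc - fit_inc) / sqrt(abs(fit_inc))

# Residual spread by development period (NA where only one residual exists).
round(apply(res, 2, sd, na.rm = TRUE), 3)
```

If the spread differs a lot between early and late development periods, pooling all residuals into one resampling set is harder to justify.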

chiefmurph avatar Apr 26 '17 13:04 chiefmurph

Thanks for your responses. I appreciate that some residuals are large; however, I am not sure this explains the behavior seen. In particular, if you look at the bootstrap IBNR plot, the output is clustered closely around 0 except for two extreme samples, one around -400 and one around 400. What is happening there?

I did some more testing on this data with larger values of R (1k, 10k and 100k) and found the S.E. increasing markedly for larger samples, up to 450% at 100k. Is it possible that the relatively large residuals are causing some sample origin years to have development amounts close to zero at some AY point, so that when there is normal development in the next AY the CL dev factor is massive, potentially infinite?
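A toy illustration of that suspicion, in base R and not taken from the package internals: if a resampled cumulative amount can land close to zero, the link ratio to the next period has a near-zero denominator, and its sample standard deviation does not settle down as R grows. All parameter values and the helper name `sim_factor` are invented for the sketch.

```r
# Link ratio (C_prev + increment) / C_prev when the resampled C_prev
# is noisy enough to land near zero. All numbers are hypothetical.
sim_factor <- function(R, mean_prev = 0.05, sd_prev = 0.05, increment = 0.02) {
  prev <- rnorm(R, mean = mean_prev, sd = sd_prev)
  (prev + increment) / prev
}

set.seed(12)
sizes <- c(1e3, 1e4, 1e5)

# For each replicate count, compare the standard deviation with a robust
# spread measure (median absolute deviation).
spread <- t(sapply(sizes, function(R) {
  f <- sim_factor(R)
  c(R = R, sd = sd(f), mad = mad(f))
}))
spread
```

In this toy setup the standard deviation is driven by the few draws with a denominator near zero, while the MAD is not. That pattern would be consistent with a simulated distribution that is clustered around a central value with a handful of extreme samples.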

ibrow321 avatar Apr 27 '17 11:04 ibrow321

Interesting thought. You could test your idea with an artificial triangle.


mages avatar Apr 27 '17 13:04 mages

It is conceivable that, because residuals are sampled with replacement, the pseudo-random scenario in which larger residuals are disproportionately represented is more likely to occur at higher numbers of replicates.
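As a rough worked example of that argument: if a single replicate has probability p of drawing such an extreme scenario, the chance of seeing it at least once in R independent replicates is 1 - (1 - p)^R. The value of p below is purely hypothetical and `p_any` is an illustrative helper name.

```r
# Chance that at least one of R independent replicates hits a scenario
# that has probability p in any single replicate.
p_any <- function(p, R) 1 - (1 - p)^R

p <- 1e-4  # hypothetical per-replicate probability, for illustration only
chance <- sapply(c(1e3, 1e4, 1e5), function(R) p_any(p, R))
names(chance) <- c("R=1k", "R=10k", "R=100k")
round(chance, 4)
```

With this p the chance is roughly 10% at R=1k, about 63% at R=10k and close to 100% at R=100k, so a rare extreme replicate that is usually absent from a 1k run is almost guaranteed to show up in a 100k run.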

chiefmurph avatar Apr 27 '17 16:04 chiefmurph

Here is the plotParms view of the MackChainLadder(ibrow's data): [image: Inline image 1]

Those curves may be the tamest plotParms I've ever seen. Wow. The coefficient of variation of f-1 rarely strays outside ±1, with one exception. Could that one exception be the cause of the extremely high BootChainLadder scenarios? I would want to know how much the results would change if you edited the second-to-oldest data point, then ran a modified bootstrap with dev <= 72 in one group and dev > 72 in another. That might give you a lower bound on an estimate of the reserve variability from that data. The same consultancy considered multiple bootstrap regions, but the first time I saw that in a book was in Guy Carpenter's ERM book. I don't know off the top of my head whether that would be an easy modification of the BootChainLadder logic.

PS: @ibrow321: plotParms is a set of functions I wrote that are in need of a use case before ChainLadder implementation. I just thought they were pretty graphs but hadn't thought about how they might be useful. I wonder if you could save the plotParms output for each iteration of BootChainLadder, then look at the distribution in three dimensions.


trinostics avatar Apr 29 '17 00:04 trinostics