policyengine-uk

Ensure imputed capital gains CDF is valid (monotonic)

Open MaxGhenis opened this issue 1 year ago • 2 comments

impute_capital_gains currently interpolates/extrapolates the provided quantiles to a CDF by fitting splines. This can result in CDFs that are not monotonically increasing and thus invalid.
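
One way to detect the problem is to evaluate the fitted curve on a fine grid and check that successive values never decrease. The sketch below is only illustrative: it assumes a cubic spline (`scipy.interpolate.CubicSpline`) as a stand-in, since the exact spline used by `impute_capital_gains` is not shown in this thread, and the quantile and gains values are made up.

```python
import numpy as np
from scipy.interpolate import CubicSpline


def is_monotonic(values, tol=0.0):
    """Return True if `values` never decreases by more than `tol`."""
    return bool(np.all(np.diff(values) >= -tol))


# Hypothetical quantiles and gains for one income group (illustrative only).
quantiles = np.array([0.05, 0.25, 0.50, 0.75, 0.95])
gains = np.array([0.0, 0.0, 100.0, 1_000.0, 50_000.0])

# Stand-in spline; the real function may use a different spline type.
spline = CubicSpline(quantiles, gains)

# Evaluate on a fine grid inside the data range and test for decreases.
grid = np.linspace(quantiles[0], quantiles[-1], 500)
print("monotonic:", is_monotonic(spline(grid)))
```

Running the same check for every income group would show which fits, if any, need repair.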

After asking ChatGPT for some ideas, I think a promising approach could be to first synthesize a PDF from the quantiles, smooth it with a kernel density estimator, then integrate it to a CDF. Here's an example of how that might look:

image
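
The three steps described above (synthesize, smooth, integrate) could be sketched roughly as follows. This is not the code behind the image: the quantile and gains values are invented, the synthetic sample comes from a piecewise-linear quantile function (an assumption), and the resulting CDF is normalized over the observed range of gains only.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Hypothetical quantiles and gains for one income group (illustrative only).
quantiles = np.array([0.05, 0.25, 0.50, 0.75, 0.95])
gains = np.array([0.0, 500.0, 2_000.0, 6_000.0, 20_000.0])

# 1. Synthesize a sample: evaluate a piecewise-linear quantile function
#    at evenly spaced probabilities.
probs = np.linspace(quantiles[0], quantiles[-1], 1_000)
samples = np.interp(probs, quantiles, gains)

# 2. Smooth the synthetic sample into a PDF with a Gaussian KDE.
kde = gaussian_kde(samples)
grid = np.linspace(gains.min(), gains.max(), 2_000)
pdf = kde(grid)

# 3. Integrate the PDF with the trapezoid rule and normalize so the CDF
#    ends at 1 on this grid.
steps = 0.5 * (pdf[1:] + pdf[:-1]) * np.diff(grid)
cdf = np.concatenate([[0.0], np.cumsum(steps)])
cdf /= cdf[-1]

print("monotonic:", bool(np.all(np.diff(cdf) >= 0)))
```

Because a KDE density is never negative, the integrated curve cannot decrease, which is the property the spline fit lacks.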

Other options like isotonic regression or transformations could also work, and we may want something more complex if we want to consider all the data together rather than each income group independently.
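
For the isotonic regression option, a minimal sketch of the pool-adjacent-violators algorithm is shown below. It treats each curve independently and is applied to made-up values; the function name `pava` is hypothetical and nothing here is taken from the policyengine-uk code.

```python
import numpy as np


def pava(y):
    """Unweighted least-squares isotonic (non-decreasing) fit to `y`."""
    blocks = []  # each block is [sum of values, number of values]
    for value in y:
        blocks.append([float(value), 1])
        # Merge the last two blocks while their means are out of order.
        while (
            len(blocks) > 1
            and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    # Expand each block back to its pooled mean.
    return np.concatenate([np.full(count, total / count) for total, count in blocks])


# Hypothetical curve values with one dip (illustrative only).
curve = np.array([0.0, 120.0, 90.0, 400.0, 1_500.0])
repaired = pava(curve)
print(repaired)
```

The dip is replaced by a flat segment at the pooled mean, so the repaired curve is non-decreasing but can contain plateaus.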

MaxGhenis, Feb 20 '24 04:02

FWIW, I only saw one income level where the spline was obviously nonmonotonic, so this might not be a high priority: image

https://policyengine-uk-documentation.nw.r.appspot.com/Capital_Gains_Tax

MaxGhenis, Feb 25 '24 05:02

The PCHIP interpolator (SciPy's PchipInterpolator) seems ideally suited to this. It preserves monotonicity between the data points and supports extrapolation.

Here's an example for the 99th income centile, for which the spline currently produces a nonmonotonic interpolation.

image

Relevant code (notebook):

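import numpy as np
from scipy.interpolate import PchipInterpolator

# quantiles and gains are the provided data for one income centile
pchip_interpolator = PchipInterpolator(quantiles, gains, extrapolate=True)
extended_quantiles = np.linspace(0.01, 0.99, 99)
extended_gains = pchip_interpolator(extended_quantiles)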
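
A self-contained version of the same idea, with a monotonicity check, might look like the sketch below. The quantile and gains values are invented, not the notebook's data. The check is run separately on the extended grid because PCHIP's shape-preserving property applies between the data points; outside that range the curve is extrapolated from the end segments, so it is worth verifying rather than assuming.

```python
import numpy as np
from scipy.interpolate import PchipInterpolator

# Hypothetical quantiles and gains for one income centile (illustrative only).
quantiles = np.array([0.05, 0.25, 0.50, 0.75, 0.95])
gains = np.array([0.0, 0.0, 100.0, 1_000.0, 50_000.0])

pchip = PchipInterpolator(quantiles, gains, extrapolate=True)

# Inside the data range, monotone input data should give a monotone curve.
inside = np.linspace(quantiles[0], quantiles[-1], 500)
inside_ok = bool(np.all(np.diff(pchip(inside)) >= 0))

# The extended grid includes extrapolated points below 0.05 and above 0.95.
extended_quantiles = np.linspace(0.01, 0.99, 99)
extended_ok = bool(np.all(np.diff(pchip(extended_quantiles)) >= 0))

print("monotonic inside data range:", inside_ok)
print("monotonic on extended grid:", extended_ok)
```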

MaxGhenis, Feb 25 '24 15:02