Add TAXSIM-32 Validation: To replace PR #2453

Open jdebacker opened this issue 3 years ago • 15 comments

This PR works off the contributions from @chusloj in PR #2453.

My goals are the following (over and above what was proposed in PR #2453):

  • [x] Have PR from a branch controlled by an active contributor
  • [x] Add test datasets for 2017
  • [x] Verify that the input datasets produced by taxsim_input.py for the years 2017 and 2018 return the same results with TAXSIM-27 and TAXSIM-32 (or explain differences if not)
    • They do not match in every case: a small number of records, mostly those with capital losses, do not match exactly across the two calculators.
    • We have not yet been able to explain this.
    • Input files for 2017 produce the same results with TAXSIM-27 and 32.
    • Input file "a" for 2018 produces the same results with TAXSIM-27 and 32.
    • Input files "b" and "c" for 2018 produce different results for a small number of records with TAXSIM-27 and 32.
  • ~~Verify that for years 2017 and 2018, the input datasets from the TAXSIM-27 validation (in /taxcalc/validation/taxsim27), produce the same output from TAXSIM-27 and TAXSIM-32 (or explain differences if not)~~
    • We'll call the exercise above sufficient. This is essentially the same check.
  • [ ] Verify that the differences between taxcalc and TAXSIM-32 are the same as any differences between taxcalc and TAXSIM-27 (checked into the repo as /taxcalc/validation/taxsim27/{assumption set}{year}.taxdiffs-expect) using the input datasets in /taxcalc/validation/taxsim27
  • [ ] Verify that any differences between taxcalc and TAXSIM-32 for the input datasets produced by taxsim_input.py for the years 2017 and 2018 can be explained by the same reasons as the non-zero expected differences in /taxcalc/validation/taxsim27/{assumption set}{year}.taxdiffs-expect
  • [ ] Ensure that differences between taxcalc and TAXSIM-32 for the input datasets produced by taxsim_input.py for the year 2019 are zero or can be explained (from the TAXSIM-27 validation, it appears that for the a and b input datasets there are few differences, all less than $1, but the c datasets may produce some larger differences; see, e.g., c18.taxdiffs-expect)
  • Add utilities to more easily perform these validation exercises:
    • [x] Output saved uses taxcalc variable names (rather than the v names from TAXSIM) to ease interpretation of output.
    • [ ] Descriptive tables are produced comparing input variables from observations that do not match and those that do (e.g., so one can easily see differences like "those that don't match have pass-through income while all those that match do not").
  • [x] CSV files with TAXSIM and taxcalc intermediate variables (mapping TAXSIM to taxcalc variable names for ease of comparison) from samples of observations that do not match (e.g., to help identify where in the determination of the income tax amount calculations began to differ)
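
The utilities described in the list above could be sketched roughly as follows. This is a minimal illustration using only the standard library, not the code in this PR: the name mapping (`taxsimid`, `fiitax`, `v10` to `RECID`, `iitax`, `c00100`), the helper names, and the 1-cent tolerance are all assumptions made for the example.

```python
from statistics import mean

# Hypothetical mapping from TAXSIM output names to taxcalc names;
# the real mapping used in the PR may differ.
TAXSIM_TO_TAXCALC = {"taxsimid": "RECID", "fiitax": "iitax", "v10": "c00100"}


def rename_taxsim(record):
    """Return a copy of a TAXSIM output record keyed by taxcalc names."""
    return {TAXSIM_TO_TAXCALC.get(k, k): v for k, v in record.items()}


def split_by_match(taxcalc_rows, taxsim_rows, var="iitax", tol=0.01):
    """Compare `var` record by record; return (matched_ids, unmatched_ids)."""
    taxsim_by_id = {r["RECID"]: r for r in map(rename_taxsim, taxsim_rows)}
    matched, unmatched = set(), set()
    for row in taxcalc_rows:
        other = taxsim_by_id[row["RECID"]]
        same = abs(row[var] - other[var]) <= tol
        (matched if same else unmatched).add(row["RECID"])
    return matched, unmatched


def input_means(input_rows, ids, variables):
    """Mean of each input variable over the records whose RECID is in `ids`."""
    rows = [r for r in input_rows if r["RECID"] in ids]
    return {v: mean(r[v] for r in rows) if rows else None for v in variables}
```

Calling `input_means` once with the matched IDs and once with the unmatched IDs gives the two columns of a descriptive table, which makes patterns such as "only the unmatched records have business income" easy to spot.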

Other suggestions welcome.

cc @bodiyang @MattHJensen

jdebacker avatar Sep 06 '21 19:09 jdebacker

Codecov Report

Merging #2619 (fcf7001) into master (f407cb5) will not change coverage. The diff coverage is n/a.

Additional details and impacted files

@@           Coverage Diff           @@
##           master    #2619   +/-   ##
=======================================
  Coverage   98.54%   98.54%           
=======================================
  Files          14       14           
  Lines        2609     2609           
=======================================
  Hits         2571     2571           
  Misses         38       38           
Flag Coverage Δ
unittests 98.54% <ø> (ø)

codecov[bot] avatar Sep 06 '21 19:09 codecov[bot]

@jdebacker said recently:

We cannot get intermediate variables f[ro]m internet TAXSIM

Are you sure about that?
I think it's pretty common for Internet TAXSIM users to ask for output detail.

martinholmer avatar Sep 27 '21 15:09 martinholmer

I think it's pretty common for Internet TAXSIM users to ask for output detail.

@martinholmer It may be possible, but I haven't figured it out without having to ask Dan specifically. Do you know of an automated way to get this output?

jdebacker avatar Sep 27 '21 16:09 jdebacker

I think it's pretty common for Internet TAXSIM users to ask for output detail.

It may be possible, but I haven't figured it out without having to ask Dan specifically. Do you know of an automated way to get this output?

Dan explains it all on the TAXSIM32 website: https://users.nber.org/~taxsim/taxsim32/

martinholmer avatar Sep 27 '21 16:09 martinholmer

Dan explains it all on the TAXSIM32 website: https://users.nber.org/~taxsim/taxsim32/

Ah - we have those. I was thinking of more detailed intermediate calculations and especially the QBID.

jdebacker avatar Sep 27 '21 16:09 jdebacker

Give me a call if you have any problems.

dan 617-682-6204

On Mon, 27 Sep 2021, Martin Holmer wrote:

@jdebacker said recently:

  We cannot get intermediate variables f[ro]m internet TAXSIM

Are you sure about that? I think it's pretty common for Internet TAXSIM users to ask for output detail.

feenberg avatar Sep 27 '21 17:09 feenberg

I can add to the list of intermediate results. Are you thinking of the web interface or the Stata interface? Are you thinking of a single taxpayer, for diagnostics, or a large sample (harder)? Don't hesitate to call.

dan 617-682-6204

On Mon, 27 Sep 2021, Jason DeBacker wrote:

  Dan explains it all on the TAXSIM32 website:
  https://users.nber.org/~taxsim/taxsim32/

Ah - we have those. I was thinking of more detailed intermediate calculations and especially the QBID.

feenberg avatar Sep 27 '21 17:09 feenberg

@jdebacker said:

Dan explains it all on the TAXSIM32 website: https://users.nber.org/~taxsim/taxsim32/

Ah - we have those. I was thinking of more detailed intermediate calculations and especially the QBID.

Sorry for my confusion, but the description of this PR's goals is not very precise.

martinholmer avatar Sep 28 '21 13:09 martinholmer

I am here, and willing to talk if you need something.

Daniel Feenberg 617-682-6204

On Tue, 28 Sep 2021, Martin Holmer wrote:

@jdebacker said:

        Dan explains it all on the TAXSIM32 website:
        https://users.nber.org/~taxsim/taxsim32/

  Ah - we have those. I was thinking of more detailed intermediate
  calculations and
  especially the QBID.

Sorry for my confusion, but the description of this PR's goals is not very precise.

feenberg avatar Sep 28 '21 15:09 feenberg

@feenberg Thank you for the offer to help here! I'm teaching all day today, but will try to put together my thoughts tomorrow and get something to you.

jdebacker avatar Sep 28 '21 16:09 jdebacker

@martinholmer writes:

Sorry for my confusion, but the description of this PR's goals is not very precise.

This could be because I haven't quite settled on how best to do these validation exercises. I understand going line by line through particular records, but is there a more programmatic way to identify differences? This seems very time consuming (and is not helped by me only being able to work on this in small chunks 2-3 times per week). Do you have tips/best practices for validation like this?

jdebacker avatar Sep 28 '21 16:09 jdebacker

@jdebacker asked:

This [imprecise wording] could be because I haven't quite settled on how best to do these validation exercises. I understand going line by line through particular records, but is there a more programmatic way to identify differences?

Not that I'm aware of. Remember it is the computer programs (not humans) that are "going line by line" through the records to compare Tax-Calculator results with TAXSIM32 results. Just make those computer programs show you the results of those tedious comparisons in a way that is helpful to you.
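
As a rough illustration of having the program do the tedious part, the sketch below picks the record with the largest income tax difference and reports the first intermediate variable at which the two models diverge. The variable order (`c00100` for AGI, `c04800` for taxable income, then `iitax`), the function names, and the tolerance are assumptions for the example, and both inputs are assumed to already use taxcalc variable names.

```python
# Hypothetical calculation order: AGI, then taxable income, then income tax.
CALC_ORDER = ["c00100", "c04800", "iitax"]


def worst_record(taxcalc_by_id, taxsim_by_id, var="iitax"):
    """Return the record ID with the largest absolute difference in `var`."""
    return max(
        taxcalc_by_id,
        key=lambda i: abs(taxcalc_by_id[i][var] - taxsim_by_id[i][var]),
    )


def first_divergence(taxcalc_rec, taxsim_rec, order=CALC_ORDER, tol=0.01):
    """Return the first variable in `order` that differs by more than tol."""
    for var in order:
        if abs(taxcalc_rec[var] - taxsim_rec[var]) > tol:
            return var
    return None  # the two records agree on every listed variable
```

The result points at one record and one line of the calculation, which can then be checked by hand.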

martinholmer avatar Sep 28 '21 19:09 martinholmer

I continue to be mystified by the belief that one can debug a tax calculator by looking at aggregates. If you sum AGI over 100,000 records and find it is wrong, that hardly guides you to where the error is. Whereas if you have a single record with an incorrect AGI, you can easily do the calculation yourself, and see exactly where the computer goes wrong. Note that the calculated values in recent PUF files are just the output of my SAS calculator, so you should be able to match them exactly.

Dan

On Tue, 28 Sep 2021, Martin Holmer wrote:

@jdebacker asked:

  This [imprecise wording] could be because I haven't quite
  settled on how best to do these validation exercises. I
  understand going line by line through particular records, but is
  there a more programmatic way to identify differences?

Not that I'm aware of. Remember it is the computer programs (not humans) that are "going line by line" through the records to compare Tax-Calculator results with TAXSIM32 results. Just make those computer programs show you the results of those tedious comparisons in a way that is helpful to you.

feenberg avatar Sep 28 '21 20:09 feenberg

@feenberg said:

I continue to be mystified by the belief that one can debug a tax calculator by looking at aggregates.

I'm in complete agreement with this statement. The most helpful information is how two models give different results for one particular record. That is the kind of information the old comparison program in the validation/taxsim27 directory produced. I haven't looked closely enough at the replacement program in this PR to know if it is still focused that way or whether it has changed and now focuses on aggregates.

martinholmer avatar Sep 28 '21 20:09 martinholmer

@martinholmer writes:

I'm in complete agreement with this statement. The most helpful information is how two models give different results for one particular record. That is the kind of information the old comparison program in the validation/taxsim27 directory produced. I haven't looked closely enough at the replacement program in this PR to know if it is still focused that way or whether it has changed and now focuses on aggregates.

Thanks for taking the time to look at what this PR does before commenting. This is not a replacement, but an addition to the TAXSIM-27 validation tools. It does the same record-by-record comparisons as the TAXSIM-27 validation, but with TAXSIM-32.

jdebacker avatar Sep 28 '21 23:09 jdebacker

Now using TAXSIM-35 for the validation...

jdebacker avatar Mar 16 '23 12:03 jdebacker

I believe this PR is ready. It adds utilities for testing against TAXSIM 35.

It does not update the expected files for 2017-2019 and does not add them for 2020 and 2021. This should be done after a couple of issues are resolved, which the utilities included in this PR are helpful in identifying:

  1. Some differences between how Tax-Calculator and TAXSIM calculate the Recovery Rebate Credit in 2020 (it looks like a difference in how the phaseout is calculated).
  2. Some differences between how Tax-Calculator and TAXSIM calculate (or report) the child tax credit in 2021. Here, it looks like in many cases both yield the same tax liability, but report different amounts for the child tax credit amount (perhaps due to a difference in reporting the uncapped amount?).

I think work on those two issues should be done in subsequent PRs.

jdebacker avatar May 03 '23 23:05 jdebacker

Please include me in discussion of how rebates should be handled.

I don't know if a rebate legislated and paid in year t based on tax status in year t-1 should be included with year t or year t-1 results, and should it even be included in liability or reported separately? The rebate affects the ex post marginal rate in year t-1, and after-tax income only in year t.

I have a bunch of state rebates to consider also.

Dan

On Wed, 3 May 2023, Jason DeBacker wrote:

I believe this PR is ready. It adds utilities for testing against TAXSIM 35.

It does not update the expected files for 2017-2019 and does not add them for 2020 and 2021. This should be done after a couple of issues are resolved, which the utilities included in this PR are helpful in identifying:

  1. Some differences between how Tax-Calculator and TAXSIM calculate the Recovery Rebate Credit in 2020 (it looks like a difference in how the phaseout is calculated).
  2. Some differences between how Tax-Calculator and TAXSIM calculate (or report) the child tax credit in 2021. Here, it looks like in many cases both yield the same tax liability, but report different amounts for the child tax credit amount (perhaps due to a difference in reporting the uncapped amount?).

I think work on those two issues should be done in subsequent PRs.

feenberg avatar May 04 '23 11:05 feenberg

Some differences between how Tax-Calculator and TAXSIM calculate (or report) the child tax credit in 2021. Here, it looks like in many cases both yield the same tax liability, but report different amounts for the child tax credit amount (perhaps due to a difference in reporting the uncapped amount?).

This might be due to TAXSIM35 adding the odc amount to the CTC amount. (HT Martin, who noted it here: https://github.com/PSLmodels/Tax-Calculator/issues/2658#issue-1203295931.)
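
One way to test that hypothesis, sketched under assumptions: for the records whose reported CTC amounts differ, check how many are reconciled once taxcalc's ODC is added to its CTC. The record keys (`taxcalc_ctc`, `taxcalc_odc`, `taxsim_ctc`) and the function names are hypothetical and only serve the example.

```python
def ctc_gap_explained_by_odc(taxcalc_ctc, taxcalc_odc, taxsim_ctc, tol=0.01):
    """True if TAXSIM's reported CTC equals taxcalc's CTC plus ODC within tol."""
    return abs(taxsim_ctc - (taxcalc_ctc + taxcalc_odc)) <= tol


def share_explained(records, tol=0.01):
    """Among records whose raw CTC amounts differ, the share reconciled by ODC.

    Returns None when no record differs, so there is nothing to explain.
    """
    differing = [
        r for r in records if abs(r["taxsim_ctc"] - r["taxcalc_ctc"]) > tol
    ]
    if not differing:
        return None
    explained = sum(
        ctc_gap_explained_by_odc(
            r["taxcalc_ctc"], r["taxcalc_odc"], r["taxsim_ctc"], tol
        )
        for r in differing
    )
    return explained / len(differing)
```

A share close to 1 would support the ODC explanation; the remaining records are the ones worth inspecting individually.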

MattHJensen avatar May 05 '23 13:05 MattHJensen

@jdebacker I support merging this and then tracking down additional differences in another PR if that's your preference. Looks really great.

MattHJensen avatar May 05 '23 13:05 MattHJensen

On Fri, 5 May 2023, Matt Jensen wrote:

  Some differences between how Tax-Calculator and TAXSIM
  calculate (or report) the child tax credit in 2021. Here, it
  looks like in many cases both yield the same tax liability, but
  report different amounts for the child tax credit amount
  (perhaps due to a difference in reporting the uncapped amount?).

This might be due to TAXSIM35 adding the odc amount to the CTC amount. (HT Martin, who noted it here: #2658 (comment).)

Please report to me anything you regard as a problem with taxsim. I can normally get things fixed in a day or two. It is best if you can give me a sample taxpayer exhibiting the problem, especially a simple taxpayer.

Daniel Feenberg 617-682-6204

feenberg avatar May 05 '23 20:05 feenberg