[iebaltab] March 2022 rewrite review
General:

- [x] Rename `main` branch
- [x] `run-old-versions.md` link in deprecation error message 404s
Help file / Options:

- [x] Output options descriptions need to be updated (`save`, `browse`)
- [x] The meaning of `grpcodes` is unclear -- I cannot determine what it does
- [x] "Statistics" options should be listed right after the mandatory options, then, separately, the "data treatment" ones. These are much more important than labelling!
- [x] "F-test" options can be added to the statistics options. What are missing values in F tests? `fnoobs` should be with the display options.
- [x] What does `missminmean()` do?
- [x] `pairoutput` option accepts multiple inputs, such as `pairoutput(beta t)`, which I want to do, but it then causes an error and returns `option pout_lbl() required` without explanation. Recommend to also allow `se` and `ci`.
Math:

- [x] Default set of comparison tests (all vs all) is good
- [x] ⚠️ I cannot quite understand what `ftest` is doing -- is this the omnibus test? If so I would enable it by default.
- [x] ⚠️ SEs and SDs for groups do not display
- [x] `pairoutput` option could be required, which would remove any issue of default functionality? It could also be called something more intuitive like `stats()`
- [x] ⚠️ I am not sure that the `nrmd` (normalized differences) are correctly calculated. See this link:

  > The normalized differences approach of Imbens and Rubin (2015) is useful here. This is defined as the difference in means between the treatment and control groups, divided by the square root of half the sum of the treatment and control group variances

- [ ] `control()` option should flip all regression signs (and subtraction order)
- [x] New `treatment()` option should replace current `control()` option functionality
- [x] There is no need for a separate `fixedeffect()` option as long as indicator `i.` variables can be appropriately included in the `covariates()` option
- [x] What does `covarmissok` do? Could this potentially be combined with the requested `casewise` option somehow, since it appears conceptually similar?
- [x] I would remove the `balmiss()`, `balmissreg()`, `covmiss()`, and `covmissreg()` options, or at least move them somewhere else ("Advanced"?)... this is quite dangerous....
Display:

- [x] `rowvarlabels` behavior should be the default, with the option to disable
- [x] Table notes should explain the meaning of non-numerical results like `.v`.
- [x] The default paired t-tests should also report their SEs by default, with the option of instead producing P-values and, possibly, confidence intervals. These options need to be updated in the help file in any case:

  > The options nottest, normdiff, pttest, pftest and pboth have been deprecated as of version 7 of iebaltab. See if the options pairoutput or ftestoutput have the functionality you need. If you still need the old version of iebaltab, for example for reproduction of old code, see this guide for how to run old versions of any command in the ietoolkit package.
Misc:

- [x] `onerow` returns an unhelpful error if the number of observations varies across regressions. The following, for example, occurs, but the number of obs in the regressions is not reported in the table without the option, so the source of the failure can't be found.

      iebaltab price mpg rep78 , grpv(rep78) savexlsx("test.xlsx") replace control(2) total grpc onerow
      Option onerow may only be used if the number of observations with non-missing values are the same in all groups
      across all balance variables. This is not true for group(s): [2_1 2_3 2_4 2_5]. Run the command again without this
      options to see in which column the number of observations is not the same for all rows.
SE and SD went missing when I capitalized the titles. It is fixed in 33117f7398c2bfe92caadefed6619c9fd947d8eb
Yes, `ftest` does that test. This is still accurate in the old helpfile:

    reg testgroupdummy balancevarlist
    testparm balancevarlist

I prefer to keep it an option for backward compatibility reasons, and IMO it is cleaner to add options for things you add rather than adding options to remove things. I can also see someone who does not like F-tests coming with feedback saying that we should not promote this or that test.
I like `stats()`. And then it would be specified like `stats(pair(t) f(p))`.
The options `missminmean()`, `covarmissok`, `balmiss()`, `balmissreg()`, `covmiss()`, and `covmissreg()` are among the options I said will be deprecated. They are soon gone, so don't worry about them 😃
> I like `stats()`. And then it would be specified like `stats(pair(t) f(p))`

I don't think it even needs the sub-functions -- if it always takes two arguments, then it must be some combination of the specification for the estimate and the specification for the uncertainty. I.e., `stats(mean se)` or `stats(beta ci)` is completely unambiguous.
> I don't think it even needs the sub-functions -- if it always takes two arguments then it must be some combination of the specification for the estimate and the specification for the uncertainty. ie, stats(mean se) or stats(beta ci) is completely unambiguous

Preferences can be, and often are, different across tests, and, for example, `ci` does not work on an F-test. It doesn't make sense to me to replace `pairoutput` with `stats()` and then keep `ftestoutput()` and `feqtestoutput()`.

The syntax will be `stats(pair(string) f(string) feq(string))` where those three strings allow different input.
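
The per-test syntax proposed above can be illustrated with a small parser sketch. This is Python (illustrative only, not the command's actual parser); the sub-option names `pair`, `f`, and `feq` come from the comment above, and everything else is a hypothetical stand-in:

```python
import re

def parse_stats(spec):
    """Parse a stats() specification like "pair(t) f(p) feq(se)".

    Returns a dict mapping each test family (pair, f, feq) to the
    requested statistic string. Allowed families mirror the proposed
    stats(pair(string) f(string) feq(string)) syntax; validation of
    the statistic strings themselves is left out for brevity.
    """
    allowed = {"pair", "f", "feq"}
    result = {}
    for family, stat in re.findall(r"(\w+)\(([^)]*)\)", spec):
        if family not in allowed:
            raise ValueError(f"unknown sub-option: {family}()")
        result[family] = stat.strip()
    return result

print(parse_stats("pair(t) f(p)"))  # {'pair': 't', 'f': 'p'}
```

Keeping one sub-option per test family makes it natural for each family to accept a different set of statistic strings, which is the point made above about `ci` not applying to an F-test.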
I can see `group()` being added in a future version.
> "F-test" options can be added to the statistics options.

No, due to backward compatibility.

> What are missing values in F tests?

I assume that this comment is in relation to `fmissok`, which is now a deprecated option.
> noobs should be with the display options.

Done in 11d283bdc83d1dab88a2ba02ff37e69c15c94c5b
> Default set of comparison tests (all vs all) is good

I do not understand what this means.
> pairoutput option accepts multiple inputs, such as pairoutput(beta t)

We decided in the meeting that this is not part of this version.
> causes error and returns: option pout_lbl() required without explanation

This option is replaced by `stats()`.
> Recommend to also allow se and ci.

Done for `stats(pair(se))` and `stats(pair(sd))` in f9b2a267bfe8a1b1c5a8b303aff4fe80979a54cb.

`ci` has two values, which require bigger changes. Possible to implement, but out of scope in the re-write.
> pairoutput option could be required, which would remove any issue of default functionality?

No, due to backward compatibility.
> It could also be called something more intuitive like Stats()

Done in d3705baa7f8b58683273a686aa8531803b8c543d
> I am not sure that the nrmd (normalized differences) are correctly calculated. See this link:

@bbdaniels, can you give me a correct formula? Preferably in terms of Stata code? I understand a text like this, but I need your help to translate it into code at the level of accuracy a big command like this requires.
> New treatment() option should replace current control() option functionality

Hesitant to remove `control()` due to backward compatibility, but if you elaborate on what the benefit of a new `treatment()` option is and why it must replace `control()`, I am happy to consider it.
> There is no need for a separate fixedeffect() option as long as indicator i. variables can be appropriately included in the covariates() option

I don't disagree if the command were new. I do not see an issue with `fixedeffect()` big enough to break backward compatibility. Let me know if there is anything I am missing.
> rowvarlabels behavior should be the default, with the option to disable

Again, I don't disagree if the command were new, but I do not see what would justify breaking backward compatibility. Let me know if there is anything I am missing.
> Table notes should explain the meaning of non-numerical results like .v

Are you sure the table notes are the right place for this? Missing values should never end up in the table, and I see a big risk of confusion if the table note explains things that are found in places other than the table.

I was thinking of explaining this in the helpfile. Only people skilled enough to look in the helpfile would be skilled enough to use `return list` to access a matrix.
> The default paired t-tests should also report their SEs by default, with the option of instead producing P-values and, possibly, confidence intervals.

At the risk of sounding like a record on repeat: I think backward compatibility is important, so I am reluctant to change default behavior.

Multiple stats for the pair tests are out of scope for this re-write.
> These options need to be updated in the help file in any case:

The first section of the helpfile is updated in 3a0a9f8fdfd1306a32412deb22241fc1f216405d. The rest of the helpfile will be updated once we have finalized the code of the command.
> onerow returns an unhelpful error if the number of observations varies across regressions.

Agreed, I will implement.
> There is no need for a separate fixedeffect() option as long as indicator i. variables can be appropriately included in the covariates() option

> I don't disagree if the command was new. I do not see an issue with fixedeffect() big enough to break backward compatibility. Let me know if there is anything I am missing.

I don't think it needs to break anything -- just deprecate it and remove the option from the help file in this version. It will still work backwards, but people should prefer to use the modern syntax directly as a covariate -- for example, things like two-way fixed effects might require explicit regression specs. I don't think you need to do any more coding to make that work correctly, since it just passes through the string, but I will test and check.
> New treatment() option should replace current control() option functionality

> Hesitant to remove control() due to backward compatibility, but if you elaborate on the benefit of a new option treatment() is and why it must replace control() I am happy to consider it

I think this would have no impact other than flipping the signs, which I understand is not ideal, but it would align better with the normal meaning of "treatment" and "control". So `treatment()` would do what `control()` does now, which is to present the treatment effect for one group relative to all other groups. And `control()` would do the opposite, which is to present treatment effects for all groups relative to the one specified.

Alternatively, to avoid breaking anything, `control()` could be deprecated and hidden -- that is, just pass through the `treatment` argument to the existing functionality so as not to make maintenance harder, and use a new term like `base()` for the option that flips the comparison. I would think that would not be terribly difficult and would preserve both backwards compatibility and maintainability?
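
A minimal numeric sketch of the sign flip under discussion (Python with made-up data, purely illustrative): choosing which group is the comparison base only changes the subtraction order, so the magnitude of the pairwise difference is unchanged.

```python
import statistics

# Hypothetical outcomes for a control group and one treatment arm
control = [2.0, 3.0, 4.0]
treat = [5.0, 6.0, 7.0]

mean_c = statistics.mean(control)  # 3.0
mean_t = statistics.mean(treat)    # 6.0

# Effect of the treatment arm relative to the control base:
# a positive value means the treatment group's mean is higher
effect_vs_control = mean_t - mean_c

# Flipping the base group reverses the subtraction order,
# which flips the sign but nothing else
effect_flipped = mean_c - mean_t

assert effect_vs_control == -effect_flipped
```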
> Default set of comparison tests (all vs all) is good

> I do not understand what this means

I just mean this is a good default and I like that you have done it. No need for any changes ☺️
> "F-test" options can be added to the statistics options.

> No due to backward compatibility

> What are missing values in F tests?

> I assume that this is comment in relation to fmissok that is now a deprecated option

> noobs should be with the display options.

> done in 11d283b

Sorry -- the first comment was only meant to refer to the help-file organization, not the functionality. I will suggest improvements there in commits on my next run-through.
> I am not sure that the nrmd (normalized differences) are correctly calculated. See this link:

> @bbdaniels, can you give me a correct formula? Preferable in terms of Stata code? I understand a text like this, but I need your help to translate this into code at the level of accuracy a big command like this requires

Yes, I will do this. Edit: Done. But this needs resolution on the comment below about which difference will be used. Should be the adjusted difference!

The correct formula will be:

    abs(`pair_difference') / sqrt((`variance_1'+`variance_2')/2)
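
The same formula as a Python sketch (illustrative only; `statistics.variance` is the sample variance with an n-1 denominator, which is assumed to match the Stata locals above):

```python
from statistics import mean, variance
from math import sqrt

def normalized_difference(x1, x2):
    """Imbens-Rubin (2015) normalized difference:
    |mean1 - mean2| / sqrt((var1 + var2) / 2).
    Mirrors abs(`pair_difference') / sqrt((`variance_1'+`variance_2')/2)."""
    return abs(mean(x1) - mean(x2)) / sqrt((variance(x1) + variance(x2)) / 2)

g1 = [1.0, 2.0, 3.0, 4.0]
g2 = [3.0, 4.0, 5.0, 6.0]
# means 2.5 and 4.5, sample variances 5/3 each:
# |diff| = 2, denominator = sqrt(5/3)
print(round(normalized_difference(g1, g2), 4))  # 1.5492
```

Note that the statistic is a standardized magnitude, not a test statistic, which is consistent with the later comment that it should carry no significance stars.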
> Table notes should explain the meaning of non-numerical results like .v

> Are you sure the table notes is the right place for this? Missing values should never end up in the table. And I see a big risk for confusion if the table note explains things that are found in other places rather than the table.

> I was thinking of explaining this in the helpfile. Only people skilled enough to look in the helpfile would be skilled enough to use return list to access a matrix.

Hmm, I'm not sure why I was getting them then. I'll test again, see what I'd done, and let you know.
One more big thing that I somehow missed before -- is it correct that the output table does not reflect the actual regression coefficients when the `covariates()` option is used?? I think it is absolutely essential to get the adjusted coefs here, and IMO this should be the default -- as with the other options, I am happy to suggest we deprecate and remove the `covariates()` option so it works with past code but is not noted for new users (and should return a warning). I have used, for example, `rhs()` for such an option.
Ok, I am trying to test the `stats()` stuff, but I can't find direct instructions. When I tried `normd` as in the help file I get:

> The options nottest, normdiff, pttest, pftest and pboth have been deprecated as of version 7 of iebaltab. See if the options pairoutput or ftestoutput have the functionality you need. If you still need the old version of iebaltab, for example for reproduction of old code, see this guide for how to run old versions of any command in the ietoolkit package.

But there is no `pairoutput` since it is `stats()` now, right? The `stats_string` definitions don't seem to be in the help file yet, so I can't test thoroughly, but I can back it out from the parser and use:

    iebaltab trunk weight length , grpvar(rep78) savex("/users/bbdaniels/desktop/test.xlsx") covar(price) replace stats(pair(nrmd))

Then I get normalized differences for all comparisons. For these, we will replace by the formula above. I don't think these use significance stars in any case, so those will drop as well -- the magnitude should be the only metric reported.
> There is no need for a separate fixedeffect() option as long as indicator i. variables can be appropriately included in the covariates() option

> I don't disagree if the command was new. I do not see an issue with fixedeffect() big enough to break backward compatibility. Let me know if there is anything I am missing.

> I don't think it needs to break anything -- just deprecate it and remove the option from the help file on this version. It will still work backwards, but people should prefer to use the modern syntax directly as a covariate -- for example, things like two-way fixed effects might require explicit regression specs. I don't think you need to do any more coding to make that work correctly since it just passes through the string, but I will test and check.

The two tables below were created using the syntax `iebaltab weight price , grpvar(tmt_cl) [...] cov(mpg) fixedeffect(foreign) stats(pair(beta))` and `iebaltab weight price , grpvar(tmt_cl) [...] cov(mpg i.foreign) stats(pair(beta))`, and they are identical. So the functionality you are suggesting exists (although by luck). Please let me know if you see a reason to still remove `fixedeffect()` that justifies breaking backward compatibility.
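
The equivalence observed above is not luck in the statistical sense: a fixed effect entered as indicator covariates spans the same column space as the classic within (demeaning) transformation, so the pairwise coefficient comes out identical by the Frisch-Waugh-Lovell theorem. A numpy sketch with simulated data (illustrative only, not the command's code; all variable names are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
group = rng.integers(0, 2, n).astype(float)   # treatment indicator
fe = rng.integers(0, 3, n)                    # a 3-level fixed effect
y = 1.0 + 0.5 * group + np.array([0.0, 2.0, -1.0])[fe] + rng.normal(0, 1, n)

# Spec 1: fixed-effect levels entered as dummy covariates
# (analogous to cov(mpg i.foreign))
D = np.column_stack([np.ones(n), group] + [(fe == k).astype(float) for k in (1, 2)])
b_dummies = np.linalg.lstsq(D, y, rcond=None)[0][1]

# Spec 2: within transformation -- demean y and group inside each fe level
# (what a dedicated fixed-effects option would absorb)
ybar = np.array([y[fe == k].mean() for k in range(3)])[fe]
gbar = np.array([group[fe == k].mean() for k in range(3)])[fe]
b_within = np.linalg.lstsq(
    np.column_stack([group - gbar]), y - ybar, rcond=None
)[0][0]

# Both specs recover the same group coefficient
assert np.isclose(b_dummies, b_within)
```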
> New treatment() option should replace current control() option functionality

> Hesitant to remove control() due to backward compatibility, but if you elaborate on the benefit of a new option treatment() is and why it must replace control() I am happy to consider it

> I think this would have no impact other than flipping the signs, which I understand is not ideal, but it would align better with the normal meaning of "treatment" and "control". So treatment() would do what control() does now, which is present the treatment effect for one group relative to all other groups. And control() would do the opposite, which is present treatment effects for all groups relative to the one specified.

> Alternatively, to avoid breaking, control() could be deprecated and hidden -- that is, just pass through the treatment argument to the existing functionality so as not to make maintenance harder, and use a new term like base() for the option that flips the comparison. I would think that would not be terribly difficult and would preserve both backwards compatibility and maintainability?

I agree with your high-level point and am softening up to this. However, what I have yet to understand is the details of what you envision.

If `iebaltab` were new, I think `control()` would be the best option, and it would take the integer value in the `grpvar()` variable that is the control group. And then the signs would be flipped, so that a positive value in the pairwise test column equals a positive effect in that treatment vs. control.

In a multi-arm treatment situation, what do you envision `treatment()` taking as values? Let's say `grpvar()` has the values `0`, `1`, and `2`, where `0` is the control group. Should the user then specify `treatment(1 2)` in your vision?
> I am not sure that the nrmd (normalized differences) are correctly calculated. See this link:

> @bbdaniels, can you give me a correct formula? Preferable in terms of Stata code? I understand a text like this, but I need your help to translate this into code at the level of accuracy a big command like this requires

> Yes, I will do this. Edit: Done. But needs resolution on comment below about which difference will be used. Should be the adjusted difference!
>
> The correct formula will be:
>
>     abs(`pair_difference') / sqrt((`variance_1'+`variance_2')/2)

Where `` `pair_difference' `` comes from I know. But where, expressed in Stata specifics, would you get `` `variance_1' `` and `` `variance_2' ``? Can you point to something in `return list` or `ereturn list` following `regress` that I can use for this?
> Table notes should explain the meaning of non-numerical results like .v

> Are you sure the table notes is the right place for this? Missing values should never end up in the table. And I see a big risk for confusion if the table note explains things that are found in other places rather than the table. I was thinking of explaining this in the helpfile. Only people skilled enough to look in the helpfile would be skilled enough to use return list to access a matrix.

> Hmm, I'm not sure why I was getting them then. I'll test again and see what I'd done and let you know.

You are expected to get them in `r(iebaltabrmat)` and `r(iebaltabfmat)`, but please send any reproducible example where they end up in the table my way. That would be interesting to look into.
> One more big thing that I somehow missed before -- is it correct that the output table does not reflect the actual regression coefficients when the covariates() option is used?? I think it is absolutely essential to get the adjusted coefs here, and IMO this should be the default -- as with the other options, I am happy to suggest we deprecate and remove the covariates() option so it works with past code but is not noted for new users (and should return a warning). I have used, for example, rhs() for such an option.

It does not affect the value displayed in the output if `stats(pair(diff))` is used (which is the default), as covariates do not change the real difference between the two groups on the ground. The most basic usage of a balance table is to describe the difference between the two groups and test whether the difference is significant, and for that most basic usage `stats(pair(diff))` makes it straightforward to say what the number in the table is.

If you use covariates with `stats(pair(diff))`, then the test of whether the treatment statistically significantly explains the difference in means has changed; however, the difference between the groups has not changed in reality, and not in the table.

When you are using no covariates or fixed effects, there is no difference between `stats(pair(diff))` and `stats(pair(beta))`. But when you do use covariates or fixed effects, there is a difference, and it becomes confusing for a layman to know what the value in the table means if it is the beta coefficient that is displayed.

Both are used, and `stats(pair(beta))` might be more appropriate for an academic paper, but since the meaning of `stats(pair(diff))` is more straightforward, that is what is used. Admittedly, that is mostly a reason for keeping the default rather than a full technical justification of it. But the main reason it is like this is that it was the preference of my TTL at the time of writing this code.
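
The diff-versus-beta distinction described above can be sketched numerically (Python with simulated data, purely illustrative; variable names are made up): when a covariate is correlated with group assignment, the raw difference in means and the covariate-adjusted coefficient diverge.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300
group = rng.integers(0, 2, n).astype(float)
covar = rng.normal(0, 1, n) + 1.5 * group      # covariate correlated with group
y = 2.0 + 1.0 * group + 0.8 * covar + rng.normal(0, 1, n)

# stats(pair(diff)): the raw difference in group means -- covariates ignored,
# so it reflects the actual gap between the groups "on the ground"
diff = y[group == 1].mean() - y[group == 0].mean()

# stats(pair(beta)): the group coefficient from the covariate-adjusted regression
X = np.column_stack([np.ones(n), group, covar])
beta = np.linalg.lstsq(X, y, rcond=None)[0][1]

# With a covariate that differs across groups, the two statistics disagree:
# diff absorbs the covariate imbalance, beta partials it out
print(round(diff, 3), round(beta, 3))
```

This is the trade-off stated above: `diff` answers "how different are the groups", while `beta` answers "how different are they after adjustment", and only the second matches the regression actually run when `covariates()` is used.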