[iebaltab] March 2022 rewrite review
General:

- [x] Rename `main` branch
- [x] `run-old-versions.md` link in deprecation error message 404s
Help file / Options:

- [x] Output options descriptions need to be updated (`save`, `browse`)
- [x] The meaning of `grpcodes` is unclear -- I cannot determine what it does
- [x] "Statistics" options should be listed right after the mandatory options, then, separately, the "data treatment" ones. These are much more important than labelling!
- [x] "F-test" options can be added to the statistics options. What are missing values in F tests? `fnoobs` should be with the display options.
- [x] What does `missminmean()` do?
- [x] `pairoutput` option accepts multiple inputs, such as `pairoutput(beta t)`, which I want to do, but it then causes an error and returns `option pout_lbl() required` without explanation. Recommend to also allow `se` and `ci`.
Math:

- [x] Default set of comparison tests (all vs all) is good
- [x] ⚠️ I cannot quite understand what `ftest` is doing -- is this the omnibus test? If so I would enable it by default.
- [x] ⚠️ SEs and SDs for groups do not display
- [x] `pairoutput` option could be required, which would remove any issue of default functionality? It could also be called something more intuitive like `stats()`
- [x] ⚠️ I am not sure that the `nrmd` (normalized differences) are correctly calculated. See this link:

  > The normalized differences approach of Imbens and Rubin (2015) is useful here. This is defined as the difference in means between the treatment and control groups, divided by the square root of half the sum of the treatment and control group variances

- [ ] `control()` option should flip all regression signs (and subtraction order)
- [x] New `treatment()` option should replace current `control()` option functionality
- [x] There is no need for a separate `fixedeffect()` option as long as indicator `i.` variables can be appropriately included in the `covariates()` option
- [x] What does `covarmissok` do? Could this potentially be combined with the requested `casewise` option somehow, since it appears conceptually similar?
- [x] I would remove the `balmiss()`, `balmissreg()`, `covmiss()`, and `covmissreg()` options, or at least move them somewhere else ("Advanced"?)... this is quite dangerous....
Display:

- [x] `rowvarlabels` behavior should be the default, with the option to disable
- [x] Table notes should explain the meaning of non-numerical results like `.v`.
- [x] The default paired t-tests should also report their SEs by default, with the option of instead producing P-values and, possibly, confidence intervals. These options need to be updated in the help file in any case:

  > The options nottest, normdiff, pttest, pftest and pboth have been deprecated as of version 7 of iebaltab. See if the options pairoutput or ftestoutput have the functionality you need. If you still need the old version of iebaltab, for example for reproduction of old code, see this guide for how to run old versions of any command in the ietoolkit package.
Misc:

- [x] `onerow` returns an unhelpful error if the number of observations varies across regressions. The following, for example, occurs, but the number of obs in the regressions is not reported in the table without the option, so the source of the failure can't be found.

      iebaltab price mpg rep78 , grpv(rep78) savexlsx("test.xlsx") replace control(2) total grpc onerow
      Option onerow may only be used if the number of observations with non-missing values are the same in all groups
      across all balance variables. This is not true for group(s): [2_1 2_3 2_4 2_5]. Run the command again without this
      options to see in which column the number of observations is not the same for all rows.
SE and SD went missing when I capitalized the titles. It is fixed in 33117f7398c2bfe92caadefed6619c9fd947d8eb
Yes, `ftest` does that test. This is still accurate in the old helpfile:

    reg testgroupdummy balancevarlist
    testparm balancevarlist

I prefer to keep it an option for backward compatibility reasons, and IMO it is cleaner to add options for things you add rather than adding options to remove things. I can also see someone who does not like F-tests coming with feedback saying that we should not promote this or that test.
I like `stats()`. And then it would be specified like `stats(pair(t) f(p))`.
The options `missminmean()`, `covarmissok`, `balmiss()`, `balmissreg()`, `covmiss()`, and `covmissreg()` are among the options I said will be deprecated. They are soon gone, so don't worry about them 😃
> I like `stats()`. And then it would be specified like `stats(pair(t) f(p))`

I don't think it even needs the sub-functions -- if it always takes two arguments, then it must be some combination of the specification for the estimate and the specification for the uncertainty. I.e., `stats(mean se)` or `stats(beta ci)` is completely unambiguous.
> I don't think it even needs the sub-functions -- if it always takes two arguments then it must be some combination of the specification for the estimate and the specification for the uncertainty. ie, stats(mean se) or stats(beta ci) is completely unambiguous

Preferences can be, and often are, different across tests, and, for example, `ci` does not work on an F-test. It doesn't make sense to me to replace `pairoutput` with `stats()` and then keep `ftestoutput()` and `feqtestoutput()`.

The syntax will be `stats(pair(string) f(string) feq(string))` where those three strings allow different input.
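
The per-test syntax proposed above can be illustrated with a small parser sketch. This is Python (illustrative only, not the command's actual parser); the sub-option names `pair`, `f`, and `feq` come from the comment above, and everything else is a hypothetical stand-in:

```python
import re

def parse_stats(spec):
    """Parse a stats() specification like "pair(t) f(p) feq(se)".

    Returns a dict mapping each test family (pair, f, feq) to the
    requested statistic string. Allowed families mirror the proposed
    stats(pair(string) f(string) feq(string)) syntax; validation of
    the statistic strings themselves is left out for brevity.
    """
    allowed = {"pair", "f", "feq"}
    result = {}
    for family, stat in re.findall(r"(\w+)\(([^)]*)\)", spec):
        if family not in allowed:
            raise ValueError(f"unknown sub-option: {family}()")
        result[family] = stat.strip()
    return result

print(parse_stats("pair(t) f(p)"))  # {'pair': 't', 'f': 'p'}
```

Keeping one sub-option per test family makes it natural for each family to accept a different set of statistic strings, which is the point made above about `ci` not applying to an F-test.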
I can see `group()` being added in a future version.
> "F-test" options can be added to the statistics options.

No, due to backward compatibility.

> What are missing values in F tests?

I assume that this comment is in relation to `fmissok`, which is now a deprecated option.
> noobs should be with the display options.

Done in 11d283bdc83d1dab88a2ba02ff37e69c15c94c5b
> Default set of comparison tests (all vs all) is good

I do not understand what this means.
> pairoutput option accepts multiple inputs, such as pairoutput(beta t)

We decided in the meeting that this is not part of this version.
> causes error and returns: option pout_lbl() required without explanation

This option is replaced by `stats()`.
> Recommend to also allow se and ci.

Done for `stats(pair(se))` and `stats(pair(sd))` in f9b2a267bfe8a1b1c5a8b303aff4fe80979a54cb.

`ci` has two values, which require bigger changes. Possible to implement, but out of scope in the re-write.
> pairoutput option could be required, which would remove any issue of default functionality?

No, due to backward compatibility.
> It could also be called something more intuitive like Stats()

Done in d3705baa7f8b58683273a686aa8531803b8c543d
> I am not sure that the nrmd (normalized differences) are correctly calculated. See this link:

@bbdaniels, can you give me a correct formula? Preferably in terms of Stata code? I understand a text like this, but I need your help to translate it into code at the level of accuracy a big command like this requires.
> New treatment() option should replace current control() option functionality

Hesitant to remove `control()` due to backward compatibility, but if you elaborate on what the benefit of a new `treatment()` option is and why it must replace `control()`, I am happy to consider it.
> There is no need for a separate fixedeffect() option as long as indicator i. variables can be appropriately included in the covariates() option

I don't disagree if the command were new. I do not see an issue with `fixedeffect()` big enough to break backward compatibility. Let me know if there is anything I am missing.
> rowvarlabels behavior should be the default, with the option to disable

Again, I don't disagree if the command were new, but I do not see what would justify breaking backward compatibility. Let me know if there is anything I am missing.
> Table notes should explain the meaning of non-numerical results like .v

Are you sure the table notes are the right place for this? Missing values should never end up in the table, and I see a big risk of confusion if the table note explains things that are found in places other than the table.

I was thinking of explaining this in the helpfile. Only people skilled enough to look in the helpfile would be skilled enough to use `return list` to access a matrix.
> The default paired t-tests should also report their SEs by default, with the option of instead producing P-values and, possibly, confidence intervals.

At the risk of sounding like a record on repeat: I think backward compatibility is important, so I am reluctant to change default behavior.

Multiple stats for the pair tests are out of scope for this re-write.
> These options need to be updated in the help file in any case:

The first section of the helpfile is updated in 3a0a9f8fdfd1306a32412deb22241fc1f216405d. The rest of the helpfile will be updated once we have finalized the code of the command.
> onerow returns an unhelpful error if the number of observations varies across regressions.

Agreed, I will implement.
> There is no need for a separate fixedeffect() option as long as indicator i. variables can be appropriately included in the covariates() option

> I don't disagree if the command was new. I do not see an issue with fixedeffect() big enough to break backward compatibility. Let me know if there is anything I am missing.

I don't think it needs to break anything -- just deprecate it and remove the option from the help file in this version. It will still work backwards, but people should prefer to use the modern syntax directly as a covariate -- for example, things like two-way fixed effects might require explicit regression specs. I don't think you need to do any more coding to make that work correctly, since it just passes through the string, but I will test and check.
> New treatment() option should replace current control() option functionality

> Hesitant to remove control() due to backward compatibility, but if you elaborate on the benefit of a new option treatment() is and why it must replace control() I am happy to consider it

I think this would have no impact other than flipping the signs, which I understand is not ideal, but it would align better with the normal meaning of "treatment" and "control". So `treatment()` would do what `control()` does now, which is to present the treatment effect for one group relative to all other groups. And `control()` would do the opposite, which is to present treatment effects for all groups relative to the one specified.

Alternatively, to avoid breaking anything, `control()` could be deprecated and hidden -- that is, just pass through the `treatment` argument to the existing functionality so as not to make maintenance harder, and use a new term like `base()` for the option that flips the comparison. I would think that would not be terribly difficult and would preserve both backwards compatibility and maintainability?
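
A minimal numeric sketch of the sign flip under discussion (Python with made-up data, purely illustrative): choosing which group is the comparison base only changes the subtraction order, so the magnitude of the pairwise difference is unchanged.

```python
import statistics

# Hypothetical outcomes for a control group and one treatment arm
control = [2.0, 3.0, 4.0]
treat = [5.0, 6.0, 7.0]

mean_c = statistics.mean(control)  # 3.0
mean_t = statistics.mean(treat)    # 6.0

# Effect of the treatment arm relative to the control base:
# a positive value means the treatment group's mean is higher
effect_vs_control = mean_t - mean_c

# Flipping the base group reverses the subtraction order,
# which flips the sign but nothing else
effect_flipped = mean_c - mean_t

assert effect_vs_control == -effect_flipped
```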
> Default set of comparison tests (all vs all) is good

> I do not understand what this means

I just mean this is a good default and I like that you have done it. No need for any changes ☺️
> "F-test" options can be added to the statistics options.

> No due to backward compatibility

> What are missing values in F tests?

> I assume that this is comment in relation to fmissok that is now a deprecated option

> noobs should be with the display options.

> done in 11d283b

Sorry -- the first comment was only meant to refer to the help-file organization, not the functionality. I will suggest improvements there in commits on my next run-through.
> I am not sure that the nrmd (normalized differences) are correctly calculated. See this link:

> @bbdaniels, can you give me a correct formula? Preferable in terms of Stata code? I understand a text like this, but I need your help to translate this into code at the level of accuracy a big command like this requires

Yes, I will do this. Edit: Done. But this needs resolution on the comment below about which difference will be used. Should be the adjusted difference!

The correct formula will be:

    abs(`pair_difference') / sqrt((`variance_1'+`variance_2')/2)
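
The same formula as a Python sketch (illustrative only; `statistics.variance` is the sample variance with an n-1 denominator, which is assumed to match the Stata locals above):

```python
from statistics import mean, variance
from math import sqrt

def normalized_difference(x1, x2):
    """Imbens-Rubin (2015) normalized difference:
    |mean1 - mean2| / sqrt((var1 + var2) / 2).
    Mirrors abs(`pair_difference') / sqrt((`variance_1'+`variance_2')/2)."""
    return abs(mean(x1) - mean(x2)) / sqrt((variance(x1) + variance(x2)) / 2)

g1 = [1.0, 2.0, 3.0, 4.0]
g2 = [3.0, 4.0, 5.0, 6.0]
# means 2.5 and 4.5, sample variances 5/3 each:
# |diff| = 2, denominator = sqrt(5/3)
print(round(normalized_difference(g1, g2), 4))  # 1.5492
```

Note that the statistic is a standardized magnitude, not a test statistic, which is consistent with the later comment that it should carry no significance stars.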
> Table notes should explain the meaning of non-numerical results like .v

> Are you sure the table notes is the right place for this? Missing values should never end up in the table. And I see a big risk for confusion if the table note explains things that are found in other places rather than the table.

> I was thinking of explaining this in the helpfile. Only people skilled enough to look in the helpfile would be skilled enough to use return list to access a matrix.

Hmm, I'm not sure why I was getting them then. I'll test again, see what I'd done, and let you know.
One more big thing that I somehow missed before -- is it correct that the output table does not reflect the actual regression coefficients when the `covariates()` option is used?? I think it is absolutely essential to get the adjusted coefs here, and IMO this should be the default -- as with the other options, I am happy to suggest we deprecate and remove the `covariates()` option so it works with past code but is not noted for new users (and should return a warning). I have used, for example, `rhs()` for such an option.
Ok, I am trying to test the `stats()` stuff, but I can't find direct instructions. When I tried `normd` as in the help file I get:

> The options nottest, normdiff, pttest, pftest and pboth have been deprecated as of version 7 of iebaltab. See if the options pairoutput or ftestoutput have the functionality you need. If you still need the old version of iebaltab, for example for reproduction of old code, see this guide for how to run old versions of any command in the ietoolkit package.

But there is no `pairoutput` since it is `stats()` now, right? The `stats_string` definitions don't seem to be in the help file yet, so I can't test thoroughly, but I can back it out from the parser and use:

    iebaltab trunk weight length , grpvar(rep78) savex("/users/bbdaniels/desktop/test.xlsx") covar(price) replace stats(pair(nrmd))

Then I get normalized differences for all comparisons. For these, we will replace by the formula above. I don't think these use significance stars in any case, so those will drop as well -- the magnitude should be the only metric reported.
> There is no need for a separate fixedeffect() option as long as indicator i. variables can be appropriately included in the covariates() option

> I don't disagree if the command was new. I do not see an issue with fixedeffect() big enough to break backward compatibility. Let me know if there is anything I am missing.

> I don't think it needs to break anything -- just deprecate it and remove the option from the help file on this version. It will still work backwards, but people should prefer to use the modern syntax directly as a covariate -- for example, things like two-way fixed effects might require explicit regression specs. I don't think you need to do any more coding to make that work correctly since it just passes through the string, but I will test and check.

The two tables below were created using the syntax `iebaltab weight price , grpvar(tmt_cl) [...] cov(mpg) fixedeffect(foreign) stats(pair(beta))` and `iebaltab weight price , grpvar(tmt_cl) [...] cov(mpg i.foreign) stats(pair(beta))`, and they are identical. So the functionality you are suggesting exists (although by luck). Please let me know if you see a reason to still remove `fixedeffect()` that justifies breaking backward compatibility.
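
The equivalence observed above is not luck in the statistical sense: a fixed effect entered as indicator covariates spans the same column space as the classic within (demeaning) transformation, so the pairwise coefficient comes out identical by the Frisch-Waugh-Lovell theorem. A numpy sketch with simulated data (illustrative only, not the command's code; all variable names are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
group = rng.integers(0, 2, n).astype(float)   # treatment indicator
fe = rng.integers(0, 3, n)                    # a 3-level fixed effect
y = 1.0 + 0.5 * group + np.array([0.0, 2.0, -1.0])[fe] + rng.normal(0, 1, n)

# Spec 1: fixed-effect levels entered as dummy covariates
# (analogous to cov(mpg i.foreign))
D = np.column_stack([np.ones(n), group] + [(fe == k).astype(float) for k in (1, 2)])
b_dummies = np.linalg.lstsq(D, y, rcond=None)[0][1]

# Spec 2: within transformation -- demean y and group inside each fe level
# (what a dedicated fixed-effects option would absorb)
ybar = np.array([y[fe == k].mean() for k in range(3)])[fe]
gbar = np.array([group[fe == k].mean() for k in range(3)])[fe]
b_within = np.linalg.lstsq(
    np.column_stack([group - gbar]), y - ybar, rcond=None
)[0][0]

# Both specs recover the same group coefficient
assert np.isclose(b_dummies, b_within)
```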
> New treatment() option should replace current control() option functionality

> Hesitant to remove control() due to backward compatibility, but if you elaborate on the benefit of a new option treatment() is and why it must replace control() I am happy to consider it

> I think this would have no impact other than flipping the signs, which I understand is not ideal, but it would align better with the normal meaning of "treatment" and "control". So treatment() would do what control() does now, which is present the treatment effect for one group relative to all other groups. And control() would do the opposite, which is present treatment effects for all groups relative to the one specified.

> Alternatively, to avoid breaking, control() could be deprecated and hidden -- that is, just pass through the treatment argument to the existing functionality so as not to make maintenance harder, and use a new term like base() for the option that flips the comparison. I would think that would not be terribly difficult and would preserve both backwards compatibility and maintainability?

I agree with your high-level point and am softening up to this. However, what I have yet to understand is the details of what you envision.

If `iebaltab` were new, I think `control()` would be the best option, and it would take the integer value in the `grpvar()` variable that is the control group. And then the signs would be flipped, so that a positive value in the pairwise test column equals a positive effect in that treatment vs. control.

In a multi-arm treatment situation, what do you envision `treatment()` taking as values? Let's say `grpvar()` has the values `0`, `1`, and `2`, where `0` is the control group. Should the user then specify `treatment(1 2)` in your vision?
> I am not sure that the nrmd (normalized differences) are correctly calculated. See this link:

> @bbdaniels, can you give me a correct formula? Preferable in terms of Stata code? I understand a text like this, but I need your help to translate this into code at the level of accuracy a big command like this requires

> Yes, I will do this. Edit: Done. But needs resolution on comment below about which difference will be used. Should be the adjusted difference!
>
> The correct formula will be:
>
>     abs(`pair_difference') / sqrt((`variance_1'+`variance_2')/2)

Where `` `pair_difference' `` comes from I know. But where, expressed in Stata specifics, would you get `` `variance_1' `` and `` `variance_2' ``? Can you point to something in `return list` or `ereturn list` following `regress` that I can use for this?
> Table notes should explain the meaning of non-numerical results like .v

> Are you sure the table notes is the right place for this? Missing values should never end up in the table. And I see a big risk for confusion if the table note explains things that are found in other places rather than the table. I was thinking of explaining this in the helpfile. Only people skilled enough to look in the helpfile would be skilled enough to use return list to access a matrix.

> Hmm, I'm not sure why I was getting them then. I'll test again and see what I'd done and let you know.

You are expected to get them in `r(iebaltabrmat)` and `r(iebaltabfmat)`, but please send any reproducible example where they end up in the table my way. That would be interesting to look into.
> One more big thing that I somehow missed before -- is it correct that the output table does not reflect the actual regression coefficients when the covariates() option is used?? I think it is absolutely essential to get the adjusted coefs here, and IMO this should be the default -- as with the other options, I am happy to suggest we deprecate and remove the covariates() option so it works with past code but is not noted for new users (and should return a warning). I have used, for example, rhs() for such an option.

It does not affect the value displayed in the output if `stats(pair(diff))` is used (which is the default), as covariates do not change the real difference between the two groups on the ground. The most basic usage of a balance table is to describe the difference between the two groups and test whether the difference is significant, and for that most basic usage `stats(pair(diff))` makes it straightforward to say what the number in the table is.

If you use covariates with `stats(pair(diff))`, then the test of whether the treatment statistically significantly explains the difference in means has changed; however, the difference between the groups has not changed in reality, and not in the table.

When you are using no covariates or fixed effects, there is no difference between `stats(pair(diff))` and `stats(pair(beta))`. But when you do use covariates or fixed effects, there is a difference, and it becomes confusing for a layman to know what the value in the table means if it is the beta coefficient that is displayed.

Both are used, and `stats(pair(beta))` might be more appropriate for an academic paper, but since the meaning of `stats(pair(diff))` is more straightforward, that is what is used. Admittedly, that is mostly a reason for keeping the default rather than a full technical justification of it. But the main reason it is like this is that it was the preference of my TTL at the time of writing this code.
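
The diff-versus-beta distinction described above can be sketched numerically (Python with simulated data, purely illustrative; variable names are made up): when a covariate is correlated with group assignment, the raw difference in means and the covariate-adjusted coefficient diverge.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300
group = rng.integers(0, 2, n).astype(float)
covar = rng.normal(0, 1, n) + 1.5 * group      # covariate correlated with group
y = 2.0 + 1.0 * group + 0.8 * covar + rng.normal(0, 1, n)

# stats(pair(diff)): the raw difference in group means -- covariates ignored,
# so it reflects the actual gap between the groups "on the ground"
diff = y[group == 1].mean() - y[group == 0].mean()

# stats(pair(beta)): the group coefficient from the covariate-adjusted regression
X = np.column_stack([np.ones(n), group, covar])
beta = np.linalg.lstsq(X, y, rcond=None)[0][1]

# With a covariate that differs across groups, the two statistics disagree:
# diff absorbs the covariate imbalance, beta partials it out
print(round(diff, 3), round(beta, 3))
```

This is the trade-off stated above: `diff` answers "how different are the groups", while `beta` answers "how different are they after adjustment", and only the second matches the regression actually run when `covariates()` is used.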