ietoolkit icon indicating copy to clipboard operation
ietoolkit copied to clipboard

iebaltab rewrite kb

Open kbjarkefur opened this issue 1 year ago • 10 comments

Please provide comments in this PR

kbjarkefur avatar Sep 08 '22 14:09 kbjarkefur

Ok I have looked through this now and it looks hugely improved, great work! I just have one last question, which is, how can I request a mean and a variance statistic for the pairwise comparisons at the same time (ie, beta and p)? That is the only thing I would want to do that I can't figure out and there appears to be room in the table for that where there are currently blank spaces.

For example, I run:

iebaltab pop medage, grpvar(region) savex("/users/bbdaniels/desktop/test.xlsx") replace stats(pair( beta ))

And get:

Screen Shot(17)

But this fails, so it is not clear how to fill the blanks (as in K5):

.  iebaltab pop medage, grpvar(region) savex("/users/bbdaniels/desktop/test.xlsx") replace stats(pair( beta se))

    The [pair(beta se)] in [stats(pair( beta se))] is not formatted properly. Make sure that each test name is follewed by a single stat
        inside a ().
invalid syntax
r(198);

bbdaniels avatar Sep 19 '22 20:09 bbdaniels

how can I request a mean and a variance statistic for the pairwise comparisons at the same time (ie, beta and p)?

This is not supported yet. This would be neat but somewhere I had to draw the line to to get it out the door at some point. But how I set up the stats() option had this in mind so that it would later not be too much work to add.

Here is issue to track this if you want to add some more comments.

kbjarkefur avatar Sep 20 '22 07:09 kbjarkefur

I will not hold this up further! That seems like a useful and straightforward feature to add next. If you are happy with the functionality then it is good from my side!

bbdaniels avatar Sep 20 '22 18:09 bbdaniels

Writing down thoughts as I test this:

  • [x] The browse option should be abbreviated to br
  • [x] Significance levels should read 0.01, 0.05 and 0.1 instead of 1%, 5%, 10%
  • [x] Why does the csv file have quotation marks around the numbers? Can we remove it? There are also spaces around the numbers.
  • [ ] The sign of the difference in means is not changing when we change the base group using option control()
  • [x] add option to make "Total" the first column in the table
  • [x] make option onerow the default when the number of observations is not changing
  • [x] use of option format() returns error "too few quotes"
  • [x] (FOR THE FUTURE) when using the option stats() for the pairwise comparison, one may want to show a combination of stats. in particular, one may want to show the beta + se, or the diff + p-value. I think this was a particular request we received and may be useful to think about how to implement it

luizaandrade avatar Oct 11 '22 23:10 luizaandrade

Why does the csv file have quotation marks around the numbers? Can we remove it? There are also spaces around the numbers.

Cells with labels used for row or col headers using , needs to be enclosed in "" to not break the comma delimitator. I am not adding "" manually but telling Stata to add it to all values when exporting. It will be some work to do this manually so that it only applies to data points that are labels. What is a scenario where this is an actual issue? If there is a good reason we can work on adding "" manually.

kbjarkefur avatar Oct 13 '22 16:10 kbjarkefur

add option to make "Total" the first column in the table

How important? Can this be pushed to a later version? I am keen to get this version out, then we can add new features that we want.

kbjarkefur avatar Oct 13 '22 16:10 kbjarkefur

make option onerow the default when the number of observations is not changing

I vote no for backward compatibility. We have broken backward compatibility in this version for some cases where it is a really good case for it. I do not think this is as important.

kbjarkefur avatar Oct 13 '22 16:10 kbjarkefur

(FOR THE FUTURE) when using the option stats() for the pairwise comparison, one may want to show a combination of stats. in particular, one may want to show the beta + se, or the diff + p-value. I think this was a particular request we received and may be useful to think about how to implement it

Absolutely. And the stats() option is set up with that in mind, but Ben and I decided it did not fit in the scope of this version. It is tracked here: https://github.com/worldbank/ietoolkit/issues/293

kbjarkefur avatar Oct 13 '22 16:10 kbjarkefur

The browse option should be abbreviated to br

Done in 10bcb20e95d679156f04968974e885449664b6a8

kbjarkefur avatar Oct 13 '22 16:10 kbjarkefur

Significance levels should read 0.01, 0.05 and 0.1 instead of 1%, 5%, 10%

Done in fc167cc6a7b176f5e3717308d99c2aa032d387bf

kbjarkefur avatar Oct 13 '22 16:10 kbjarkefur

add option to make "Total" the first column in the table

How important? Can this be pushed to a later version? I am keen to get this version out, then we can add new features that we want.

Yeah, this can come later. It's just a bit counter intuitive, but nothing major. Moved it to issue #298

luizaandrade avatar Nov 01 '22 15:11 luizaandrade

make option onerow the default when the number of observations is not changing

I vote no for backward compatibility. We have broken backward compatibility in this version for some cases where it is a really good case for it. I do not think this is as important.

My vote would be to do it now. If we're already breaking backward compatibility, better to make all changes we want to that will break it now. That's a weak preference, though. @bbdaniels , what do you think?

luizaandrade avatar Nov 01 '22 15:11 luizaandrade

Why does the csv file have quotation marks around the numbers? Can we remove it? There are also spaces around the numbers.

Cells with labels used for row or col headers using , needs to be enclosed in "" to not break the comma delimitator. I am not adding "" manually but telling Stata to add it to all values when exporting. It will be some work to do this manually so that it only applies to data points that are labels. What is a scenario where this is an actual issue? If there is a good reason we can work on adding "" manually.

It's not a big issue at all. I just find it harder to read the changes on GitHub. But it's super small and not worth spending a lot of time.

luizaandrade avatar Nov 01 '22 15:11 luizaandrade

use of option format() returns error "too few quotes" Fixed in 161309cc88d61bc72e07cbb1576e0592f5091ee2

kbjarkefur avatar Nov 03 '22 11:11 kbjarkefur

make option onerow the default when the number of observations is not changing

I vote no for backward compatibility. We have broken backward compatibility in this version for some cases where it is a really good case for it. I do not think this is as important.

My vote would be to do it now. If we're already breaking backward compatibility, better to make all changes we want to that will break it now. That's a weak preference, though. @bbdaniels , what do you think?

I am always in favor of breaking compatibility for big functionality upgrades at major versions! But I don't think this is supposed to be a major version -- primarily a rewrite so that we can do that in the future? So in this case I would say keep things backward compatible and don't have a major version, then let's carry over a list of compatibility-breaking functionality and syntax changes for the next version.

bbdaniels avatar Nov 03 '22 14:11 bbdaniels

add option to make "Total" the first column in the table

How important? Can this be pushed to a later version? I am keen to get this version out, then we can add new features that we want.

Yeah, this can come later. It's just a bit counter intuitive, but nothing major. Moved it to issue #298

Fixed in 851a9c25888ef1711ec875c694531887432473e9

kbjarkefur avatar Nov 03 '22 14:11 kbjarkefur

make option onerow the default when the number of observations is not changing

Wont implement due to this argument made on Slack:

I am still not convinced this is the best way to got. Lets say all my columns has the same N. With the new default I would only have my Ns on a single row at the bottom. If then something changes to my data so one data point is missing. Then I would not get an error or a warning, instead the structure of my table would tacitly change. Is it only me who sees that as concerning or confusion at best. I still vote for keeping it as it is. As in the current default, the layout of the table does not change as the data change. And by adding the option onerow you add a requirement that the data should be in a certain way (i.e. N the same in a column across all rows). For those that know their data will always meet this condition they can run iebaltab with onerow always there, just like all of us who know we want to replace the already existing file run iebaltab with replace.

kbjarkefur avatar Nov 03 '22 15:11 kbjarkefur

From looking at the compiled latex file and comparing with the current SSC version, all the estimates are consistent, but the display still has some issues. v6.4.pdf v6.3.pdf

  • [x] significance stars are not being exported, even when the stars option is explicitly called
  • [x] can we leave the double bars at the bottom of the table?
  • [x] can we also leave the extra spacing between variables? I find that easier to read
  • [x] SEs are not in parenthesis
  • [x] can we add a horizontal line before number of obs if using one row?
  • [x] can we add an option to not show the number of obs for the pairwise tests?

luizaandrade avatar Nov 07 '22 17:11 luizaandrade

SEs are not in parenthesis

Fixed in 3cef80d80d81b57704b10062cb2065e7b9c07e4c

kbjarkefur avatar Nov 08 '22 10:11 kbjarkefur

significance stars are not being exported, even when the stars option is explicitly called

There is no option called stars, you should have received an error in the new version with that option. Anyways, there was a bug suppressing the stars in the tex tables, that is fixed in 3657647f1d44ca529a037178669f47cc67f396ec

kbjarkefur avatar Nov 08 '22 10:11 kbjarkefur

can we leave the double bars at the bottom of the table? can we add a horizontal line before number of obs if using one row?

fixed in c6418bf4735943d8acd2502422f246a58e7ce604 - but lests discuss if there should be one or two single lines when both f-test and onerow are used

kbjarkefur avatar Nov 08 '22 11:11 kbjarkefur

can we also leave the extra spacing between variables? I find that easier to read

fixed in 0c5b7562d5cf0c88105c1cc32e742062021d4d85 - I also took the liberty to deprecate the option texvspace(string) that customize that space, but for every row. The whole point of returning results in matrices is for users to customize their tex tables outside iebaltab.

kbjarkefur avatar Nov 08 '22 11:11 kbjarkefur

can we add an option to not show the number of obs for the pairwise tests?

Suggest adding to an issue to get this version out

kbjarkefur avatar Nov 08 '22 11:11 kbjarkefur

significance stars are not being exported, even when the stars option is explicitly called

There is no option called stars, you should have received an error in the new version with that option. Anyways, there was a bug suppressing the stars in the tex tables, that is fixed in 3657647

Sorry, I meant the starlevels() option

luizaandrade avatar Nov 08 '22 15:11 luizaandrade