E03: provide guidance on variable naming

Open • korenmiklos opened this issue 4 years ago • 1 comment

  • Box 'Variable names':
    • The rules discussed for variable names are not exhaustive: a variable name also must not be one of Stata's reserved names (see Stata Manual, Section 11.3).
    • I would propose introducing a naming convention for variables so that students start off with good practice (for example: use only English words, avoid abbreviations, use only lowercase characters, etc.). This would also be a good chance to stress the importance of being consistent, whatever rule or system they follow.
  • In my opinion, special characters should not appear anywhere in a project (except in the raw data, over which we have no influence), for reasons of backward compatibility, cross-platform compatibility, and comprehension by colleagues, referees, and editors who don't speak the language.
  • legibility = readability?
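
To make the reserved-names point concrete, here is a minimal sketch in Stata. It assumes `_n` is on the reserved list (the manual section cited above has the full list), and the local macro names `rc_reserved` and `rc_ordinary` are made up for this illustration.

```stata
* Sketch: reserved words cannot be used as variable names.
clear
set obs 1

* _n is one of the reserved names, so this generate fails;
* capture stores the return code in _rc instead of stopping.
capture generate _n = 1
local rc_reserved = _rc

* An ordinary snake_case name is accepted.
capture generate real_gdp = 1
local rc_ordinary = _rc

display "reserved name: " `rc_reserved' "   ordinary name: " `rc_ordinary'
```

A nonzero return code for the first attempt and zero for the second is what one would expect here; students could try other names from the reserved list the same way.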

korenmiklos, Jul 14 '20 19:07

Agreed. On the naming convention, some advice would be to:

  • Stick to ASCII as much as possible. Recent versions of Stata allow UTF-8 names (gen fóßé = 2), but that makes collaboration and debugging difficult, as you said.
  • Use snake case (lowercase words separated by underscores) instead of camel case or other alternatives, which have been shown to be harder to read in code.
  • I can't remember where I read this, but StataCorp suggests specific-to-general naming: for instance, nominal_gdp and real_gdp instead of gdp_nominal and gdp_real.
  • Lastly, although names can be up to 32 characters long, it's best to avoid very long names, because they increase reliance on abbreviations, and I've seen my share of bugs caused by these (in fact, nowadays I always start with set varabbrev off, to the dismay of my coauthors).
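
As a rough illustration of the abbreviation point, the sketch below runs the same command with variable abbreviation on and then off. The toy data and the local macro names `rc_abbrev_on` and `rc_abbrev_off` are assumptions made for this example only.

```stata
* Sketch: why relying on abbreviated variable names is fragile.
clear
set obs 3
generate nominal_gdp = 100
generate real_gdp = 90

* Abbreviation allowed: "nominal" resolves to nominal_gdp
* because the prefix is unambiguous in this dataset.
set varabbrev on
capture summarize nominal
local rc_abbrev_on = _rc

* Abbreviation off: the same command now fails instead of guessing.
set varabbrev off
capture summarize nominal
local rc_abbrev_off = _rc

display "varabbrev on: " `rc_abbrev_on' "   varabbrev off: " `rc_abbrev_off'
```

With abbreviation on, the command quietly depends on which variables happen to exist: if nominal_gdp were dropped or renamed while another variable starting with "nominal" remained, the same line would pick up that other variable without complaint. Note that set varabbrev off changes the setting for the rest of the session.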

Maybe one way of showing these is by example? Say we have nominal and real GDP, as above. A table would be a way of showing rather than telling:

| Definition | Nominal GDP | Real GDP |
| --- | --- | --- |
| Suggested | nominal_gdp | real_gdp |
| Too short | ngdp | rgdp |
| Too long | nominal_gross_domestic_product | real_gross_domestic_product |
| Less readable: CamelCase | NominalGDP / NominalGdp | RealGDP / RealGdp |
| Less readable: general-to-specific | gdp_nominal | gdp_real |
| Less portable: UTF-8 | pib_nominale | pib_réel |

sergiocorreia, Aug 04 '20 16:08