stata-economics
E03: provide guidance on variable naming
- Box 'Variable names':
- the discussed rules for variable names are not exhaustive since variable names also must not be any of the reserved names (see Stata Manual Section 11.3)
- I would propose introducing a naming convention for variables so that students start off with good practice (for example: only use English words, avoid abbreviations, use only lowercase characters, etc.); this could be a good chance to stress the importance of being consistent, whatever rule or system they may be following
- In my opinion, special characters should show up nowhere in a project (except for the raw data on which we have no influence) for reasons of backward compatibility, cross-platform compatibility and the understanding of colleagues/referees/editors who don't speak the language.
- legibility = readability?
Agreed. On the naming convention, some advice would be to:
- Stick to ASCII as much as possible. Recent versions of Stata allow UTF-8 (`gen fóßé = 2`), but that makes collaboration and debugging difficult, as you said.
- Use snake case (lowercase words separated by underscores) instead of camel case or other alternatives that have been shown to be harder to read in code.
- I can't remember where I read this, but StataCorp suggests general-to-specific naming. For instance, `nominal_gdp` and `real_gdp` instead of `gdp_nominal` and `gdp_real`.
- Lastly, although names can have up to 32 characters, it's best to avoid overly long names because they increase reliance on abbreviations, and I've seen my share of bugs caused by these (in fact, nowadays I always start with `set varabbrev off`, to the dismay of coauthors).
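
To make the abbreviation point concrete, here is a minimal sketch of how variable abbreviation can hide a mistake. The toy data and the variable names `nominal_gdp_usd` and `real_gdp_usd` are made up for illustration only:

```stata
* Hypothetical toy data, for illustration only
clear
set obs 3
generate nominal_gdp_usd = _n
generate real_gdp_usd = 2 * _n

* With abbreviations on (Stata's default), "nominal" silently resolves
* to nominal_gdp_usd because it is the only variable starting with it
set varabbrev on
summarize nominal

* With abbreviations off, the same command fails because no variable
* is named exactly "nominal"; capture stores the error code in _rc
* instead of aborting the do-file
set varabbrev off
capture noisily summarize nominal
display "return code: " _rc
```

With `set varabbrev off`, a mistyped or truncated name stops the run with a non-zero return code rather than quietly picking up whichever variable happens to match.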
Maybe one way of showing these is by example? Say we have nominal and real GDP, as above. A table would be a way of showing rather than telling:
Definition | Nominal GDP | Real GDP |
---|---|---|
Suggested | nominal_gdp | real_gdp |
Too short | ngdp | rgdp |
Too long | nominal_gross_domestic_product | real_gross_domestic_product |
Less readable: CamelCase | NominalGDP / NominalGdp | RealGDP / RealGdp |
Less readable: general-to-specific | gdp_nominal | gdp_real |
Less portable: UTF8 | pib_nominale | pib_réel |
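
If a dataset already arrives with CamelCase names like those in the table, a rough sketch of bringing them in line with the suggested convention could look like this. The toy variables are hypothetical, and the underscores still have to be placed by hand because Stata cannot know where the word boundaries are:

```stata
* Hypothetical toy data with CamelCase names, for illustration only
clear
set obs 1
generate NominalGDP = 1
generate RealGDP = 2

* Lowercase every variable name in one step
rename *, lower

* Insert the underscores explicitly, old names first, new names second
rename (nominalgdp realgdp) (nominal_gdp real_gdp)

* Check the resulting names
describe
```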