odkmeta
Option for field short names
The help file already offers code that attempts to rename variables to their field short names:
foreach var of varlist _all {
    if "`:char `var'[Odk_group]'" != "" {
        local name = "`:char `var'[Odk_name]'" + ///
            cond(`:char `var'[Odk_is_other]', "_other", "") + ///
            "`:char `var'[Odk_geopoint]'"
        local newvar = strtoname("`name'")
        capture rename `var' `newvar'
    }
}
However, it'd be nice to add this as an odkmeta option (short?) such that the naming is very robust. There's also a discrepancy now between the variable names chosen by insheet and those produced by strtoname() in the loop: given the same column header, insheet and strtoname() may result in different variable names.
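One way to see the discrepancy is to feed the same header to both and compare the results. The following is only a minimal sketch: the file name demo_header.csv and the header demo-age are made up for illustration, and the snippet deliberately makes no claim about which name insheet will pick.

```stata
* Hypothetical demo file name. Note that -clear- below drops any data in memory.
local csv "demo_header.csv"

* Write a one-column .csv whose header uses the "-" group delimiter.
tempname fh
file open `fh' using "`csv'", write text replace
file write `fh' "demo-age" _n "30" _n
file close `fh'

* Name that insheet assigns to the column.
insheet using "`csv'", comma names clear
quietly ds
local insheet_name `r(varlist)'

* Name that strtoname() derives from the same header:
* characters not allowed in a Stata name become underscores.
local strtoname_name = strtoname("demo-age")

display "insheet:     `insheet_name'"
display "strtoname(): `strtoname_name'"

* Clean up the demo file.
erase "`csv'"
```

If the two displayed names differ, a rename loop that relies on strtoname() cannot assume that its result matches the name already in the dataset.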
Hey Matt,
As you look into the PR I sent previously, here are a couple of items to note with regard to how SurveyCTO currently exports field names. As of SurveyCTO 2.0, there are three options for exporting field names:
- Add groups to exported field names ("groupname-fieldname")
- Exclude groups, but still use them internally.
- Ignore groups so fields with the same name export together.
The first, of course, was the default going way back and is how odkmeta assumes the data to be organized. The second is meant as SCTO's way of circumventing Stata's 32-character limit for variable names. The third is meant to help streamline output in the case that a field is removed from a group mid-survey, but can be dangerous if the user has many fields with the same name.
Right now a lot of users are opting to use the second option and the SCTO-generated do-files to circumvent the 32-character limit. If we include a shortnames option in the next release of odkmeta, then we might want to account for both the case that the user starts with data exported using option 1 and the case that they have exported it using option 2. However, I know that the latter case may interfere with the way that odkmeta expects the .csv to be organized based on reading the XLSForm. So I wanted to gauge your thoughts about how best to proceed. I am happy to start pulling more programming weight going forward, as I know your time is limited.
Chris
I remember discussing this with Lindsey back in the day, and I think we concluded that as long as it's easy for users to select option 1, we don't need to support .csv files with short-name column headers. Any change to the initial .csv import (implemented in ``ODKMetaController'::write_fields()`) is usually a fairly heavy lift, so I'd shy away from supporting option 2 unless it's a priority. I also think it could be confusing to have two short name options, one for importing .csv files with short names and one for importing .csv files with long names and then converting them to short names. What do you think?
That sounds fine. Alternatively, would it be easy to capture case 2 and have odkmeta display an intelligible error message telling the user to export using the group-names option? If not, we can just make sure our messaging is on point.
Nice idea, that definitely sounds possible! For .csv files that include fields inside groups, maybe we could check the column headers and see if they include the group delimiter (-), and if not, issue an error message. That would also help catch users trying to use odkmeta with unsupported flavors that use a group delimiter other than -.
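A rough sketch of what such a check might look like as a standalone Stata program. The program name check_group_delim, the error text, and the return code are all hypothetical, and the sketch assumes the caller already knows from the XLSForm that the form contains groups. It is not how odkmeta currently works.

```stata
capture program drop check_group_delim
program define check_group_delim
    * Hypothetical helper: error out if the header row has no "-" delimiter.
    syntax using/

    * Read only the first line of the .csv, which holds the column headers.
    tempname fh
    file open `fh' using `"`using'"', read text
    file read `fh' header
    file close `fh'

    * No "-" anywhere in the header row: the file was probably exported
    * without group names, or with a different group delimiter.
    if strpos(`"`macval(header)'"', "-") == 0 {
        display as error "column headers do not contain the group delimiter (-);"
        display as error "export the data with groups added to field names"
        exit 198
    }
end
```

It could be called as `check_group_delim using "survey.csv"` before the import step. This simple test only looks for - anywhere in the header row, so if a field name itself contains a hyphen, the check would pass even when group names are missing.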
I'd recommend implementing these two features in separate PRs. I could review the shortnames PR in the current sprint, then review the messaging PR in the next one.