odkmeta

Option for field short names

Open matthew-white opened this issue 10 years ago • 4 comments

The help file already offers code that attempts to rename variables to their field short names:

foreach var of varlist _all {
    if "`:char `var'[Odk_group]'" != "" {
        local name = "`:char `var'[Odk_name]'" + ///
            cond(`:char `var'[Odk_is_other]', "_other", "") + ///
            "`:char `var'[Odk_geopoint]'"
        local newvar = strtoname("`name'")
        capture rename `var' `newvar'
    }
}

However, it'd be nice to add this as an odkmeta option (short?) so that the naming is robust. There is also currently a discrepancy between the variable names chosen by insheet and those produced by strtoname() in the loop: given the same column header, the two may produce different variable names.
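As a rough illustration of why the rename step needs care, here is a minimal sketch using made-up field names (they are assumptions, not taken from any real form). It shows strtoname() replacing an illegal character, and two long names truncating to the same 32-character result, which is the kind of collision that `capture rename` silently skips:

```stata
* strtoname() replaces characters that are illegal in Stata names with
* underscores, so the hyphen here becomes an underscore.
display strtoname("mygroup-myfield")

* It also truncates to 32 characters, so two distinct long names
* (hypothetical examples) can map to the same variable name.
local a = strtoname("household_roster_member_age_in_years_1")
local b = strtoname("household_roster_member_age_in_years_2")
display "`a'"
display "`b'"

* Both results are identical, so the second rename in a loop would fail.
assert "`a'" == "`b'"
```

A robust option would presumably need to detect such collisions and report them rather than leave the long name in place without comment.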

matthew-white, May 25 '14 22:05

Hey Matt,

As you look into the PR I sent previously, a couple of items to note with regard to how SurveyCTO currently exports field names. As of SurveyCTO 2.0 there are three options for exporting field names.

  1. Add groups to exported field names ("groupname-fieldname")
  2. Exclude groups, but still use them internally.
  3. Ignore groups so fields with the same name export together.

The first, of course, was the default going way back and is how odkmeta assumes the data to be organized. The second is meant as SCTO's way of circumventing Stata's 32-character limit for variable names. The third is meant to help streamline output in the case that a field is removed from a group mid-survey, but can be dangerous if the user has many fields with the same name.

Right now a lot of users are opting to use the second option and the SCTO-generated do-files to circumvent the 32-character limit. If we include a shortnames option in the next release of odkmeta, then we might want to account for both the case that the user starts with data exported using option 1 and the case that they have exported it using option 2. However, I know that the latter case may interfere with the way that odkmeta expects the .csv to be organized based on reading the XLSForm. So I wanted to gauge your thoughts about how best to proceed. I am happy to start pulling more programming weight going forward as I know your time is limited.

Chris

boyercb, Mar 02 '16 19:03

I remember discussing this with Lindsey back in the day, and I think we concluded that as long as it's easy for users to select option 1, we don't need to support .csv files with short-name column headers. Any change to the initial .csv import (implemented in `` `ODKMetaController'::write_fields() ``) is usually a fairly heavy lift, so I'd shy away from supporting option 2 unless it's a priority. I also think it could be confusing to have two short name options, one for importing .csv files with short names and one for importing .csv files with long names and then converting them to short names. What do you think?

matthew-white, Mar 04 '16 19:03

That sounds fine. Alternatively, would it be easy to capture case 2 and have odkmeta display an intelligible error message telling the user to export using the group-names option? If not, we can just make sure our messaging is on point.

boyercb, Mar 04 '16 19:03

Nice idea, that definitely sounds possible! For .csv files that include fields inside groups, maybe we could check the column headers and see if they include the group delimiter (-), and if not, issue an error message. That would also help catch users trying to use odkmeta with unsupported flavors that use a group delimiter other than -.
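For concreteness, a minimal sketch of what such a header check might look like. This is not how odkmeta does anything today; the sample header row, the macro names, and the error wording are all assumptions, and the sketch writes its own tiny .csv so that it runs on its own. A real check would only apply when the form actually contains groups:

```stata
* Placeholder data: write a one-line .csv with hypothetical column headers.
tempfile csv
tempname fh
file open `fh' using `"`csv'"', write text
file write `fh' "deviceid,consent,hh-age,hh-sex" _n
file close `fh'

* Read only the header row into the local macro header.
file open `fh' using `"`csv'"', read text
file read `fh' header
file close `fh'

* Heuristic: look for the group delimiter (-) anywhere in the header row.
local has_delim = strpos(`"`macval(header)'"', "-") > 0
if !`has_delim' {
    display as error "column headers contain no group delimiter (-);"
    display as error "export the data with groups added to field names"
    exit 198
}
```

Because the check looks only at the header row, it would be cheap to run before the import itself, though a hyphen inside a field name would also satisfy it, so it is a heuristic rather than a guarantee.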

I'd recommend implementing these two features in separate PRs. I could review the shortnames PR in the current sprint, then review the messaging PR in the next one.

matthew-white, Mar 04 '16 19:03