Tangerine CSV outputs show separate/duplicate columns if a variable moves from one section to another due to a form change

Issue

If a form change moves a variable to a different section, the CSV output shows that variable in two columns, one under each section. This is not necessarily a bug, but Tangerine should handle this scenario.

CSV outputs are designed to show variables by section so they follow the data dictionary. Changes to the structure of the sections and variables in different versions of the form will change the order of the headers in the CSV outputs.

Example

Form version one has the variable held in the section Sought

{
  "_id": "abc",
  "formId": "form-1",
  "formVersion": "v1",
  "form-06b414cc-4971-46da-b121-fd3362e8d1f6.item_Sought.held": "0"
}

Form version two has the variable held in the section Crime

{
  "_id": "def",
  "formId": "form-1",
  "formVersion": "v1",
  "form-06b414cc-4971-46da-b121-fd3362e8d1f6.item_Crime.held": "0"
}

The CSV output for this form will be:

_id	formId	formVersion	item_Sought_disabled	held	item_Crime_disabled	held
abc	form-1	v1	FALSE	0	FALSE	UNDEFINED
def	form-1	v2	FALSE	UNDEFINED	FALSE	0
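To make the mechanism concrete, here is a minimal Python sketch that reproduces the two held columns. It is not Tangerine's CSV code: it assumes, hypothetically, that each column is identified by its full section-qualified key while the printed header keeps only the part after the last dot. The *_disabled columns are omitted, and the second document uses formVersion "v2" to match the CSV output.

```python
# Hypothetical sketch, NOT Tangerine's actual CSV generator. Assumes a column
# is identified by its full "form-<uuid>.<section>.<variable>" key, while the
# header shown in the CSV is only the variable name after the last dot.
PREFIX = "form-06b414cc-4971-46da-b121-fd3362e8d1f6"

docs = [
    {"_id": "abc", "formId": "form-1", "formVersion": "v1",
     f"{PREFIX}.item_Sought.held": "0"},
    {"_id": "def", "formId": "form-1", "formVersion": "v2",
     f"{PREFIX}.item_Crime.held": "0"},
]


def column_keys(rows):
    """Collect full column keys in first-seen order across all documents."""
    keys = []
    for row in rows:
        for key in row:
            if key not in keys:
                keys.append(key)
    return keys


def header_label(key):
    """Display only the variable name, dropping the form and section prefix."""
    return key.split(".")[-1]


def build_table(rows):
    """One column per distinct full key; missing values become UNDEFINED."""
    keys = column_keys(rows)
    header = [header_label(k) for k in keys]
    body = [[row.get(k, "UNDEFINED") for k in keys] for row in rows]
    return header, body


header, body = build_table(docs)
print(header)  # ['_id', 'formId', 'formVersion', 'held', 'held']
```

Because item_Sought.held and item_Crime.held are different keys, they get different columns, yet both render with the same held label.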

Considerations

Solutions to the issue will need to consider how to implicitly infer a form version from the csv-reporting metadata
- The form versioning feature of Tangerine is usually not implemented, since there is no UI for it. A solution
- The form version could be inferred from the git history of the form file
Solutions will also need to consider the impact on the ordering of sections and variables in the outputs
- Simply combining the variable into one column breaks the current ordering of variables in the CSVs
MySQL outputs do not have this issue, since duplicate variable names are not allowed

Possible Solutions

Add a UI option to output CSVs by version, with one CSV file per form version
Add a UI option to output CSVs as a distinct set of variables (instead of in data dictionary order)
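Both options can be sketched in a few lines of Python. The helper names split_by_version and merge_by_variable are hypothetical, and the "unversioned" fallback is an assumption for documents that carry no formVersion. The merge option shows the trade-off raised under Considerations: a single held column, but columns in first-seen order rather than data dictionary order.

```python
from collections import defaultdict

# Hypothetical sketch: function names are illustrative, not Tangerine APIs.
# Keys follow the "form-<uuid>.<section>.<variable>" shape from the example.
PREFIX = "form-06b414cc-4971-46da-b121-fd3362e8d1f6"

docs = [
    {"_id": "abc", "formId": "form-1", "formVersion": "v1",
     f"{PREFIX}.item_Sought.held": "0"},
    {"_id": "def", "formId": "form-1", "formVersion": "v2",
     f"{PREFIX}.item_Crime.held": "0"},
]


def split_by_version(rows):
    """Option 1: group documents so each formVersion can go to its own CSV.

    Documents without a formVersion fall into an assumed "unversioned" group.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row.get("formVersion", "unversioned")].append(row)
    return dict(groups)


def variable_name(key):
    """Drop the form and section prefix, keeping only the variable name."""
    return key.split(".")[-1]


def merge_by_variable(rows):
    """Option 2: one column per distinct variable name, regardless of section.

    Column order is first-seen order, so it no longer follows the data
    dictionary. If two sections share a variable name within one document,
    the later value overwrites the earlier one.
    """
    names = []
    merged_rows = []
    for row in rows:
        merged = {}
        for key, value in row.items():
            name = variable_name(key)
            if name not in names:
                names.append(name)
            merged[name] = value
        merged_rows.append(merged)
    table = [[m.get(name, "UNDEFINED") for name in names] for m in merged_rows]
    return names, table


names, table = merge_by_variable(docs)
print(names)  # ['_id', 'formId', 'formVersion', 'held']
print(sorted(split_by_version(docs)))  # ['v1', 'v2']
```

Merging on the bare variable name would also collapse two genuinely different variables that share a name in different sections of the same form version, so a real implementation would need a rule for that case.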

Oct 25 '23 18:10 esurface

@esurface - are you certain this is in fact the current behavior? I don't think my experience has reflected this. (The same varname showing up in multiple columns.)

Oct 25 '23 19:10 TSSlade

Also wanted to confirm - the illustrative JSON for the second block still says "formVersion": "v1", while the illustrative CSV output says formVersion is v2. Is that a typo, or are you suggesting that there be some manner of auto-incrementing happening? (I'm assuming the former - I don't think an 'automagical' auto-increment would be the ideal way to go.)

In re: "breaking the order of variables into CSVs" - this is already somewhat broken, in that late-added variables get appended to the end of the CSV column list rather than actually being inserted alongside their neighbors in the instrument proper.

For instance, if I've generated data for an instrument having SectionA.item1-SectionA.item10, SectionB.item1-SectionB.item10, and SectionC.item1-SectionC.item10 in that order, and then I add the variable SectionA.item11, that new variable will wind up as the 31st item in the column list rather than the 11th. (Ignoring all the metadata columns for the purposes of this example.)
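The ordering difference in this example can be checked with a small Python sketch. append_new_columns mimics the append-at-the-end behavior described here, and order_by_form is a hypothetical alternative that assumes the current form definition is available as an ordered list of keys; neither is Tangerine code.

```python
# Hypothetical sketch comparing two column-ordering strategies for the
# SectionA/B/C example. Assumes the current form definition is available as
# an ordered list of "Section.item" keys; that is not a Tangerine API.


def item_keys(section, count):
    """Build keys "SectionX.item1" through "SectionX.item<count>"."""
    return [f"{section}.item{i}" for i in range(1, count + 1)]


# Columns generated from the original form: 3 sections x 10 items.
original_form = item_keys("SectionA", 10) + item_keys("SectionB", 10) + item_keys("SectionC", 10)
# The current form adds SectionA.item11.
current_form = item_keys("SectionA", 11) + item_keys("SectionB", 10) + item_keys("SectionC", 10)


def append_new_columns(known_columns, incoming_columns):
    """Keep existing columns in place and append unseen ones at the end."""
    return known_columns + [c for c in incoming_columns if c not in known_columns]


def order_by_form(columns, current_definition):
    """Sort columns by their position in the current form definition.

    Columns missing from the definition (e.g. removed items) go last and keep
    their relative order, because sorted() is stable.
    """
    position = {key: i for i, key in enumerate(current_definition)}
    return sorted(columns, key=lambda c: position.get(c, len(position)))


appended = append_new_columns(original_form, current_form)
ordered = order_by_form(appended, current_form)
print(appended.index("SectionA.item11") + 1)  # 31
print(ordered.index("SectionA.item11") + 1)   # 11
```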

If you want to fix that, that would be cool. But the current reality doesn't seem to match what you're describing under bullet 2.

Oct 26 '23 12:10 TSSlade
