xforms-spec
Proposal: external CSV data support
Now that we have a way to add external XML data, I'd like to propose extending this with a way to add external CSV data. This proposal aims to meet two requirements:
- The CSV data URI is referenced in the XForm to 'load' it (to avoid magic).
- The data can be queried with the full power of XPath (to avoid requiring a secondary language to use CSV data).
If we don't use XPath, I think we'd end up inventing a CSV query language to replace it.
The first part could be met by adding a jr://file-csv connector in the same style as jr://file, jr://image, etc.:
```xml
<instance id="households" src="jr://file-csv/households.csv"/>
```
The second part could be accomplished by defining a fixed transformation from CSV to XML. I'd like to propose using the same transformation that pyxform performs on the choices sheet when it creates a secondary XML instance for an <itemset>, which is as follows:
name | label | rooms |
---|---|---|
0001 | Johnson | 2 |
0034 | Doe | 5 |
The above CSV (presented as a table) is transformed into the following XML (the children of <instance id="...">):
```xml
<root>
  <item>
    <name>0001</name>
    <label>Johnson</label>
    <rooms>2</rooms>
  </item>
  <item>
    <name>0034</name>
    <label>Doe</label>
    <rooms>5</rooms>
  </item>
</root>
```
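A minimal Python sketch of the fixed mapping described above, assuming the first CSV row is the header and every column becomes a child element of the same name inside one <item> per data row. The helper name `csv_to_instance` is hypothetical; this only illustrates the proposed table-to-XML shape, not how pyxform or any client actually implements it.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_instance(csv_text: str) -> ET.Element:
    """Hypothetical helper: CSV text -> <root><item>...</item></root>."""
    reader = csv.DictReader(io.StringIO(csv_text))
    root = ET.Element("root")
    for row in reader:
        # One <item> per data row.
        item = ET.SubElement(root, "item")
        for column in reader.fieldnames:
            # One child per column, named after the header cell.
            child = ET.SubElement(item, column)
            child.text = row[column]
    return root


data = "name,label,rooms\n0001,Johnson,2\n0034,Doe,5\n"
print(ET.tostring(csv_to_instance(data), encoding="unicode"))
```

Because every cell is kept as text, a key such as 0001 keeps its leading zeros rather than being coerced to a number.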
P.S. Whether these instances are handled internally as actual XML documents or virtually (e.g. as a database table/document, as CommCare does) is up to the client and not part of the spec.
I'd like to expand on this by also providing a way to add translations to external data as follows:
name | label::English | label::Français | rooms |
---|---|---|---|
0001 | Johnson | Le Johnson | 2 |
0034 | Doe | Le Doe | 5 |
The above CSV (presented as a table) is transformed into the following XML (the children of <instance id="...">):
```xml
<root>
  <item>
    <name>0001</name>
    <label lang="English">Johnson</label>
    <label lang="Français">Le Johnson</label>
    <rooms>2</rooms>
  </item>
  <item>
    <name>0034</name>
    <label lang="English">Doe</label>
    <label lang="Français">Le Doe</label>
    <rooms>5</rooms>
  </item>
</root>
```
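The translated variant only changes how a header is read: a column named name::Language becomes a <name> element carrying lang="Language". A sketch in Python, again with a hypothetical helper name (`csv_to_instance_with_langs`) and assuming "::" appears at most once per header; columns without "::" are mapped exactly as in the plain case.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_instance_with_langs(csv_text: str) -> ET.Element:
    """Hypothetical helper: 'label::Lang' headers become <label lang="Lang">."""
    reader = csv.DictReader(io.StringIO(csv_text))
    root = ET.Element("root")
    for row in reader:
        item = ET.SubElement(root, "item")
        for column in reader.fieldnames:
            # 'label::Français' -> tag 'label', language 'Français'.
            tag, sep, lang = column.partition("::")
            child = ET.SubElement(item, tag)
            if sep:
                child.set("lang", lang)
            child.text = row[column]
    return root


data = (
    "name,label::English,label::Français,rooms\n"
    "0001,Johnson,Le Johnson,2\n"
    "0034,Doe,Le Doe,5\n"
)
root = csv_to_instance_with_langs(data)
# Pick one translation with an attribute predicate.
print(root.find("item[name='0001']/label[@lang='Français']").text)  # Le Johnson
```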
Being able to use full XPath on CSV data would let us support very complex CSV queries, e.g.:
- https://forum.opendatakit.org/t/pulldata-with-two-lookup-values/9435/6
- https://enketo.org/xforms/#csv-support (see bottom)
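As an illustration of the two-lookup-value case from the first link: in a form, this could be an XPath such as instance('households')/root/item[name='0034' and rooms='5']/label, reusing the households instance id from the example above. Python's xml.etree.ElementTree supports only a limited XPath subset with no `and`, so this sketch chains two predicates instead, which selects the same nodes for this simple equality test.

```python
import xml.etree.ElementTree as ET

# The transformed instance from the first table, written out by hand.
households = ET.fromstring(
    "<root>"
    "<item><name>0001</name><label>Johnson</label><rooms>2</rooms></item>"
    "<item><name>0034</name><label>Doe</label><rooms>5</rooms></item>"
    "</root>"
)

# Two lookup values at once: chained predicates stand in for
# item[name='0034' and rooms='5'] in full XPath.
match = households.find("item[name='0034'][rooms='5']/label")
print(match.text if match is not None else "no match")  # Doe

# A combination that matches no row yields nothing.
print(households.find("item[name='0034'][rooms='2']/label"))  # None
```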
Yes, that would be very good indeed. A priori, the proposed syntax and approach seem good. Is it really necessary to have a different connector? Why not also use jr://file and let clients figure out how to process it based on the file type?
@dcbriccetti and @mdudzinski, you've been thinking about external secondary instances in the context of the JR implementation. What do you think about this extension, and about the proposal to add a jr://file-csv connector?
I haven’t read this issue yet, but I’ll mention now, for what it’s worth, that the preload data sample form linked to from here uses file-csv.
@dcbriccetti Can you say a little bit more about what you mean? JavaRosa does have a way to query arbitrary side-loaded CSVs as described in the page you linked to, but it doesn't allow for complex queries, and the filename isn't included in the form, which isn't very XForm-ish. That's why we're exploring an alternate approach that would be more in line with the rest of the specification.
On the JavaRosa side there could possibly be some overlap in the implementation, though that will be a separate conversation. I'm not seeing file-csv anywhere in the JavaRosa source code at the moment, so I'm not totally sure what you're referring to.
@MartijnR does the Dimagi specification use the jr://file-csv connector? I'm not finding it immediately.
I'm sorry, @dcbriccetti, I totally forgot that the preload implementation does indeed already use the jr://file-csv connector. You're absolutely right. So that seems like it's definitely the right way to go, and I apologize for confusing things. @MartijnR, do you have a quick recollection of the intention there? Is jr://file meant to be used only for XML? Was there a particular reason to introduce different connectors for different file types? Was it to match jr://audio, jr://images, etc.?
Now that I see jr://file-csv already exists, the potentially contentious part of this proposal is enabling CSV files to be queried with XPath expressions. Currently, pulldata is used to pull values out of CSVs in a very simple way. @MartijnR described the problems with that here, and the recent forum post that @MartijnR links to above shows that there is some user demand for more complex querying (though combining keys is a simple approach that can help in many contexts). The rest of that original thread about this is also insightful.
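To make the contrast concrete, here is a rough Python emulation of a single-key pulldata lookup, assuming pulldata's usual argument order (file, column to return, lookup column, lookup value) and assuming it returns an empty string when nothing matches. The name_rooms column is a hypothetical pre-combined key of the kind the "combining keys" workaround relies on; it is not part of the proposal.

```python
import csv
import io

CSV_TEXT = (
    "name,label,rooms,name_rooms\n"  # name_rooms: hypothetical combined key
    "0001,Johnson,2,0001_2\n"
    "0034,Doe,5,0034_5\n"
)
rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))


def pulldata(rows, return_column, lookup_column, lookup_value):
    """Hypothetical stand-in for pulldata(): one exact-match lookup column."""
    for row in rows:
        if row[lookup_column] == lookup_value:
            return row[return_column]
    return ""  # assumed behaviour when no row matches


# A single key works directly.
print(pulldata(rows, "label", "name", "0034"))  # Doe
# Two keys only work if the CSV author pre-combined them into one column.
print(pulldata(rows, "label", "name_rooms", "0034" + "_" + "5"))  # Doe
```

With XPath over the transformed instance, the second lookup needs no extra column: the two conditions can be combined in a predicate instead.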
Based on all this, I'm in favor of making this an official part of the specification. I think @MartijnR has made a strong case for it and it does make sense at an ecosystem-level to have a way to interact with CSV external instances that is consistent with XML external instances.
I believe this addition only affects clients implementing this spec, since the jr://file-csv connector is already accepted and works in pyxform. And looking through old issues suggests Kobo (@dorey) and Ona (@ukanga) already have some awareness of this approach and are on board.
Implementation-wise, XPath querying of external CSV instances won't be available in Collect immediately, but that's OK. We can make sure that it is put on a roadmap eventually and clearly document what can and can't be queried through XPath. pulldata does meet most users' needs, is simpler to use, and would continue to exist for now.
I think it would be terrific to move to a slightly more consistent process for approving these kinds of changes as we started discussing here. But this has been ongoing for a long time so I propose that we keep this conversation here and ask for a final sanity check from @clint-tseng, @dcbriccetti, @yanokwa and @dorey. Unless they see any show-stopping problems or think of someone else who should be involved, I think we can move ahead.
Thanks for the feedback. Yes, the file-csv connector was meant to be consistent with audio, images and video, and (generally) to remove the type-detection burden from the client (even though it's no problem to do so).
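The trade-off discussed here (a dedicated connector versus type detection on jr://file) can be sketched as two hypothetical client-side helpers. Both function names are illustrative only, and the assumption that a plain jr://file source is otherwise XML reflects the question raised in this thread, not settled spec text.

```python
def parser_for(src: str) -> str:
    """Hypothetical: choose a parser from the connector alone."""
    if src.startswith("jr://file-csv/"):
        return "csv"
    if src.startswith("jr://file/"):
        return "xml"  # assumption: jr://file means XML
    raise ValueError(f"unsupported external instance src: {src}")


def parser_for_by_extension(src: str) -> str:
    """Hypothetical alternative: one connector, client inspects the file name."""
    if not src.startswith("jr://file/"):
        raise ValueError(f"unsupported external instance src: {src}")
    return "csv" if src.lower().endswith(".csv") else "xml"


print(parser_for("jr://file-csv/households.csv"))  # csv
print(parser_for_by_extension("jr://file/households.csv"))  # csv
```

With the dedicated connector, the form itself states the type, so the client never has to guess from a file name.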
Is this issue still unresolved? I'm exploring the Collect side of this feature, whose behavior seems to indicate that the corresponding XForms functionality is complete.
Thank you, @OpenDataNerd. I had asked for some final feedback but that was years ago and client implementations have moved forward. At this point it should just be written up for the spec.