read TSV with headers
I would like to be able to read a TSV with headers, where each row becomes a Map. That way I can write row["myheader"] in the workflow instead of memorizing which column number corresponds to which field.
Draft implementation: https://github.com/openwdl/wdl/tree/194-read-tsv
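To make the request concrete, here is a rough model of the desired behaviour. It is a Python sketch, not WDL and not the draft implementation; `read_tsv_with_header` is a hypothetical name. The first line is treated as the header, and each data row becomes a map keyed by it.

```python
import csv
import io
from typing import Dict, List


def read_tsv_with_header(text: str) -> List[Dict[str, str]]:
    """Parse TSV text whose first line is a header; return one dict per data row."""
    # DictReader takes field names from the first row; QUOTE_NONE keeps quotes
    # literal, since plain TSV has no quoting convention.
    reader = csv.DictReader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    return [dict(row) for row in reader]


rows = read_tsv_with_header("sample\tmyheader\nA\t1\nB\t2\n")
print(rows[0]["myheader"])  # prints 1
```

All values stay strings, as they do with the existing `read_tsv`.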
If your TSV input could instead be JSON, you can use `read_json` to get objects with headers.
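If the data only exists as TSV, the one-off conversion into a JSON array of objects (one object per row) that `read_json` could consume might look like the following. This is a Python sketch; `tsv_to_json` is a hypothetical helper, and WDL's own type rules still apply when the JSON is read back in.

```python
import csv
import io
import json


def tsv_to_json(text: str) -> str:
    """Convert headered TSV text into a JSON array with one object per row."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    # Every value stays a string, mirroring read_tsv, which yields strings.
    return json.dumps([dict(row) for row in reader])


print(tsv_to_json("sample\tmyheader\nA\t1\n"))
# prints [{"sample": "A", "myheader": "1"}]
```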
Another advantage of doing that: when `struct`s arrive in draft-3, you can guarantee that the correct fields are present as you read the data into a struct.
@cjllanwarne many biological databases and datasets are distributed as TSV with headers, and rewriting them as JSON is extra work. I wonder why it would be so hard to add an optional boolean header parameter to `read_tsv`?
Oh, I see. There's no reason this is a bad idea; I was just mentioning JSON in case it was useful in the meantime :)
I used to do this with `read_objects()` and `scatter(){}`:
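```wdl
# read_objects() uses the first TSV line as the keys of each Object
Array[Object] tsv = read_objects(file_tsv)
scatter (row in tsv) {
  # Look the value up by header name, not by column position
  String col_myheader = row.myheader
}
```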
This would extract the "myheader" column no matter where in the table that column is located.
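The position independence can be modelled in Python as follows. This is a sketch; `column_by_name` is a hypothetical helper, and each element of the returned list plays the role of one scatter iteration's `col_myheader`. The two tables below list the same data with the columns swapped.

```python
import csv
import io
from typing import List


def column_by_name(text: str, name: str) -> List[str]:
    """Return the values of column `name`, wherever it sits in the header."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    # One lookup per row object, by header name, like the scatter body.
    return [row[name] for row in reader]


first = column_by_name("myheader\tother\nx\t1\ny\t2\n", "myheader")
second = column_by_name("other\tmyheader\n1\tx\n2\ty\n", "myheader")
print(first == second)  # prints True
```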
Unfortunately, `read_objects()` has been removed in version development, and I am not sure how to read a TSV with headers now. Ideally, I would be fine doing away with `read_objects()` as long as there were a function that, given a TSV with headers, returns a `Map[String, Array[String]]` (one array of values per header), so that I would not even need the `scatter(){}` construct.
In the end, I found this hack useful for reading tables with headers:
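```wdl
Array[Array[String]] tsv = read_tsv(file_tsv)
# Skip the header row tsv[0]; iterating over range(length(tsv)) would index past the end
scatter (idx in range(length(tsv) - 1)) { Array[String] tsv_rows = tsv[idx + 1] }
# transpose() turns rows into columns; zip() pairs each header with its column
Map[String, Array[String]] tbl = as_map(zip(tsv[0], transpose(tsv_rows)))
```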
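The same pipeline can be traced step by step in Python. This is a sketch that models, not calls, the WDL functions; `table_to_column_map` is a hypothetical name. The data rows sit at indices 1 through length(tsv) - 1, so there are length(tsv) - 1 of them. After that, transposing and zipping with the header gives one list of values per column.

```python
from typing import Dict, List


def table_to_column_map(tsv: List[List[str]]) -> Dict[str, List[str]]:
    """Model of as_map(zip(header, transpose(data_rows))) for a read_tsv result."""
    header, data_rows = tsv[0], tsv[1:]  # tsv[0] is the header row
    columns = [list(col) for col in zip(*data_rows)]  # transpose(): rows -> columns
    return dict(zip(header, columns))  # zip() + as_map()


tsv = [["sample", "myheader"], ["A", "1"], ["B", "2"]]
print(table_to_column_map(tsv))
# prints {'sample': ['A', 'B'], 'myheader': ['1', '2']}
```

One difference to keep in mind: Python's `zip` silently truncates ragged input. WDL's `zip` and `transpose` are specified to fail instead, so a malformed table surfaces as an error rather than as missing values.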