read TSV with headers

Open antonkulaga opened this issue 6 years ago • 5 comments

I would like to be able to read a TSV with headers, where each row becomes a Map. Then I could write row["myheader"] instead of memorizing which column number corresponds to which field in the workflow.

Draft implementation: https://github.com/openwdl/wdl/tree/194-read-tsv

antonkulaga, Feb 20 '18 19:02

If your TSV input could instead be JSON, you can use read_json to get objects keyed by header. Another advantage of doing that: once structs arrive with draft-3, reading the file into a struct guarantees that the expected fields are present.
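A minimal sketch of the read_json route, assuming the table has already been converted to a JSON array of objects such as [{"myheader": "a"}, {"myheader": "b"}], and that the engine supports structs (version 1.0 or later) and coerces the read_json result to the declared type. The names Row, read_records and file_json are hypothetical.

```wdl
version 1.0

# Hypothetical struct listing the fields each record must contain.
struct Row {
  String myheader
}

workflow read_records {
  input {
    File file_json
  }

  # Assumes the engine coerces the parsed JSON array to Array[Row];
  # a record without a "myheader" field should then fail at read time.
  Array[Row] rows = read_json(file_json)

  # Access the field by name instead of by column position.
  scatter (row in rows) {
    String col_myheader = row.myheader
  }

  output {
    Array[String] myheader_column = col_myheader
  }
}
```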

cjllanwarne, Mar 08 '18 00:03

@cjllanwarne many biological databases and datasets are provided as TSV with headers, and rewriting them as JSON is extra work. Why is it so hard to just add an optional boolean header parameter to read_tsv?

antonkulaga, Mar 08 '18 01:03

Oh I see. There's no reason this is a bad idea; I was just mentioning JSON in case it was useful in the meantime :)

cjllanwarne, Mar 08 '18 02:03

I used to do this with read_objects() and scatter(){}:

Array[Object] tsv = read_objects(file_tsv)
# Iterate over row indices, since tsv[idx] needs an Int index.
scatter (idx in range(length(tsv))) {
  String col_myheader = tsv[idx]["myheader"]
}

This would extract the "myheader" column no matter where in the table that column is located.

Unfortunately, read_objects() has been removed in version development, and I am not sure how to read a TSV with headers now. Ideally, I would be fine with losing read_objects() as long as there were a function that, given a TSV with headers, returns a Map[String,Array[String]], so that I would not even need the scatter(){} construct.

freeseek, Jun 17 '20 18:06

In the end I have found this hack useful to read tables with headers:

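Array[Array[String]] tsv = read_tsv(file_tsv)
# Collect the data rows; stop at length(tsv)-1 so tsv[idx+1] stays in range.
scatter (idx in range(length(tsv)-1)) { Array[String] tsv_rows = tsv[(idx+1)] }
# Key each column by its header from row 0.
Map[String, Array[String]] tbl = as_map(zip(tsv[0], transpose(tsv_rows)))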
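For context, the same hack can be wrapped in a complete workflow. This is only a sketch, assuming an engine that provides as_map (WDL 1.1 or version development); the workflow name tsv_columns and the output name are hypothetical. The scatter is bounded at length(tsv) - 1 so that tsv[idx + 1] never runs past the last row.

```wdl
version 1.1

workflow tsv_columns {
  input {
    File file_tsv
  }

  Array[Array[String]] tsv = read_tsv(file_tsv)

  # Collect the data rows, skipping the header row at index 0.
  scatter (idx in range(length(tsv) - 1)) {
    Array[String] tsv_rows = tsv[idx + 1]
  }

  # Turn rows into columns, then key each column by its header.
  Map[String, Array[String]] tbl = as_map(zip(tsv[0], transpose(tsv_rows)))

  output {
    # Look the column up by name, wherever it sits in the table.
    Array[String] myheader_column = tbl["myheader"]
  }
}
```

The table has to be rectangular for transpose to work, and a file that contains only the header row would leave tsv_rows empty, so zip would presumably fail on mismatched lengths.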

freeseek, Jul 28 '20 13:07