engine function to map over an Array

Open ruchim opened this issue 6 years ago • 8 comments

There's currently one main function (prefix) that maps over all the elements in an array. Any other kind of mapping or manipulation over all the elements in an array has to be done with a scatter block:

Array[String] arr = ['a', 'b', 'c']

scatter (i in arr) {
 String append_suffix = i + '_1'
}

Array[String] arr_with_suffix = append_suffix # ['a_1', 'b_1', 'c_1']

It would be great if there was an engine function that allowed for running functions over an Array type without requiring the boilerplate of a scatter block.

ruchim avatar Mar 13 '18 18:03 ruchim

Just a thought, what would you think of a list comprehension expression, e.g. something like the Python syntax? (just throwing out some ideas here):

# 1:
Array[String] arr_with_suffix = [ a in arr: a + "_1" ]
# 2:
Array[String] arr_with_suffix = [ arr: a => a + "_1" ]
# 3:
Array[String] arr_with_suffix = [ map arr: a => a + "_1" ]
# 4:
Array[String] arr_with_suffix = [ a + "_1" for a in arr ]

cjllanwarne avatar Mar 13 '18 21:03 cjllanwarne

A list comprehension sounds exactly like what's required, and I could work with any of that syntax.

Do you imagine something like this would be allowed? Array[String] arr_with_suffix = [ a + "_1" for a in arr if (true) ]
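For comparison, here is how the same mapping and an optional filter behave in Python, which option 4 above borrows its syntax from. This is only a sketch of the Python semantics, not proposed WDL; the filter condition (a != "b") is a made-up example.

```python
arr = ["a", "b", "c"]

# Scatter-style: an explicit loop that collects one result per element.
append_suffix = []
for i in arr:
    append_suffix.append(i + "_1")

# Comprehension: the same mapping as a single expression.
arr_with_suffix = [a + "_1" for a in arr]

# Comprehension with a filter clause: elements failing the condition are
# dropped, so the result can be shorter than the input.
# The condition here is purely illustrative.
filtered_with_suffix = [a + "_1" for a in arr if a != "b"]
```

Note that a filter clause means the output array no longer lines up index-for-index with the input, which a plain scatter always guarantees.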

ruchim avatar Mar 13 '18 22:03 ruchim

@ruchim if we go down this route, I don't see why not!

cjllanwarne avatar Mar 15 '18 18:03 cjllanwarne

Another possible suggestion is to follow the path of vectorized languages like R or MATLAB (which in my experience are the two most common languages used by bioinformatics researchers), and simply have functions be Array-aware. That is, if a scalar function that takes scalar input is run on an array, a new array is returned with each value being the result of that scalar function being run on the corresponding value of the input array (implicit map).
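A rough sketch of what "implicit map" means, written in Python since WDL has no such feature today. The array_aware helper is hypothetical and only illustrates the idea: a scalar function is applied element-wise when handed a list, and unchanged otherwise.

```python
import os.path


def array_aware(fn):
    """Lift a scalar function so it also accepts a list (hypothetical helper)."""
    def lifted(value):
        if isinstance(value, list):
            # Implicit map: apply fn to each element, preserving order.
            return [fn(v) for v in value]
        # Scalar input: behave exactly like the original function.
        return fn(value)
    return lifted


# Stand-in for an engine function such as basename.
basename = array_aware(os.path.basename)

single = basename("/data/sample1.bam")
many = basename(["/data/sample1.bam", "/other/sample2.bam"])
```

The result always has the same length and order as the input, which is the main difference from a comprehension with a filter clause.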

dheiman avatar Sep 04 '18 18:09 dheiman

I raised an issue which ended up being a duplicate, but wanted to quote @jtratner's answer from here:

scatter(x in arr) {
   String names = basename(x)
}
call mytask { input: names = names }

and then names, outside of the scatter, is an array of strings.

illusional avatar Jan 29 '20 23:01 illusional

Super convenient for arrays of maps of files with their indexes, i.e. an array of BAMs and their BAIs.

ghost avatar Feb 07 '21 22:02 ghost

Hey @patmagee, is there any way to gain some momentum on this feature request? Is list comprehension an acceptable approach? I'm happy to target something against a spec and implement it in MiniWDL maybe?

illusional avatar May 07 '21 05:05 illusional

I like the addition but I'd opt for the less burdensome (on the user) approach suggested by @dheiman, where if you pass an array of strings to something like basename or a +, you'd get an array back, in the same order, with the operation performed on each element. That way folks who aren't Python users don't need to learn what a list comprehension is.

vortexing avatar Jan 03 '22 18:01 vortexing