Simple string "contains" checks
To make life easier for authors doing string checks like the one in https://gatkforums.broadinstitute.org/wdl/discussion/10354/multiple-backends-for-cromwell#latest
E.g.:
File f
Boolean a = endsWith(basename(f), ".suffix")
Boolean b = startsWith(basename(f), "prefix.")
Boolean c = contains(basename(f), ".middle.")
Draft implementation: https://github.com/openwdl/wdl/commits/133-string-find/
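For comparison, the three checks above can be approximated with the existing sub function by testing whether a regex replacement changes the string. This is only a sketch, assuming WDL 1.0 syntax and an engine whose sub honours the ^ and $ anchors; the workflow name string_checks and the output names are made up for illustration and are not part of the draft implementation.

```wdl
version 1.0

# Hypothetical workflow; shows a workaround, not the proposed functions.
workflow string_checks {
  input {
    File f
  }

  String name = basename(f)

  # sub() returns its input unchanged when the pattern does not match,
  # so "result differs from input" means "the pattern matched".
  Boolean a = sub(name, "\\.suffix$", "") != name    # ends with ".suffix"
  Boolean b = sub(name, "^prefix\\.", "") != name    # starts with "prefix."
  Boolean c = sub(name, "\\.middle\\.", "") != name  # contains ".middle."

  output {
    Boolean ends_with_suffix = a
    Boolean starts_with_prefix = b
    Boolean contains_middle = c
  }
}
```

The need to escape every literal dot is one argument for dedicated functions: the workaround is easy to get subtly wrong.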
@cjllanwarne would be interested to see if there are any other use cases for this. In my experience I really have not come across any workflows that would have benefited from this type of logic. This type of thing is really useful when running non-deterministic workflows that try to account for all possible scenarios, but in general I am not sure that applies to most WDLs.
@patmagee you are asking for a use case for very basic features. Are you serious or just making fun? Every second workflow I write needs contains or endsWith/startsWith. The most obvious use case is that you can get your sequencing files as read.fq, read.fastq, read.fastq.gz, read.fastq.bz2 and so on. Having different workflows just because the input can differ between sources is redundant. A simple string check on the filename or extension lets one workflow support all of them and apply additional steps when gunzipping or bunzipping is needed (not all bioinformatics tools handle compressed input automatically).
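To make the use case concrete, here is a rough sketch of one workflow that accepts both plain and gzipped FASTQ input. It assumes WDL 1.0 and uses only existing functions: basename with a suffix argument strips the suffix when it is present, so comparing the two results stands in for an endsWith check. The task and workflow names (gunzip, normalize_reads) are invented for illustration.

```wdl
version 1.0

# Hypothetical task: decompress a gzipped FASTQ file.
task gunzip {
  input {
    File archive
  }
  command <<<
    gunzip -c "~{archive}" > reads.fastq
  >>>
  output {
    File reads = "reads.fastq"
  }
}

# Hypothetical workflow: decompress only when the input ends in ".gz".
workflow normalize_reads {
  input {
    File reads
  }

  # basename(x, ".gz") differs from basename(x) only if x ends in ".gz".
  Boolean is_gzipped = basename(reads, ".gz") != basename(reads)

  if (is_gzipped) {
    call gunzip { input: archive = reads }
  }

  output {
    # gunzip.reads is optional outside the conditional; fall back to the input.
    File fastq = select_first([gunzip.reads, reads])
  }
}
```

Supporting .bz2 in the same way would need a second comparison and a second conditional call, which is where a dedicated endsWith would read more naturally.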
@antonkulaga I would appreciate it if you engaged in positive discussion in the issue forums, following our Code of Conduct.
You have to remember this is a 4 year old ticket. What that means is, first of all, that the WDL of 4 years ago was very different than it is today. Control flow in the language was minimal, or at least minimally defined and supported. Optional outputs were not even a thing either. So in that context I believe it makes sense to question the inclusion of new engine functions like these.
Secondly, being 4 years old, this ticket has yet to gain the support to transition into an RFC. If you are really excited about this change, I would recommend creating an RFC discussion over on the discussion board. You could even go so far as to support the RFC with a PR and potentially even provide a reference implementation in miniwdl.
I think a find function would be more generally useful and could be used to determine containment. Since we've already let the regex genie out of the bottle with sub, I don't see a problem with find taking a pattern argument.
String? find(String, String)
Searches for the pattern (second argument) in the string (first argument). Returns the first match, or None if there is no match.
Contains would then be written:
if (defined(find("fubar", "u.a"))) { ... }
We could also add a contains method that would just be syntactic sugar for the above, although I'm not sure it makes sense to overload contains, which is also the proposed name for the array containment function. Maybe matches or contains_pattern.
Would the pattern be allowed to be a regex? That would certainly be cool, but it does open a potential can of worms: people would then naturally want to inspect the match for capture groups, which goes beyond a simple boolean response.