
Simple string "contains" checks

Open cjllanwarne opened this issue 6 years ago • 5 comments

To make life easier for authors doing string checks like in https://gatkforums.broadinstitute.org/wdl/discussion/10354/multiple-backends-for-cromwell#latest

For example:

File f
Boolean a = endsWith(basename(f), ".suffix")
Boolean b = startsWith(basename(f), "prefix.")
Boolean c = contains(basename(f), ".middle.")

Draft implementation: https://github.com/openwdl/wdl/commits/133-string-find/

cjllanwarne avatar Sep 18 '17 19:09 cjllanwarne

@cjllanwarne I would be interested to see whether there are any other use cases for this. In my experience, I have not come across any workflows that would have benefited from this type of logic. It is really useful for non-deterministic workflows that try to account for all possible scenarios, but in general I am not sure that applies to most WDLs.

patmagee avatar Mar 09 '18 14:03 patmagee

@patmagee you are asking for a use case for very basic features; are you serious or just joking? Every second workflow I write needs contains or endsWith/startsWith. The most obvious use case is that sequencing files can arrive as read.fq, read.fastq, read.fastq.gz, read.fastq.bz2 and so on. Having separate workflows just because the input differs between sources is redundant; a simple string check on the filename or extension lets one workflow support all of them and apply additional steps when gunzipping or bunzipping is needed (not all bioinformatics tools handle compressed input automatically).
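
A minimal sketch of the use case above, written against WDL 1.0 without the proposed functions. It approximates endsWith with the existing sub function: stripping a trailing ".gz" changes the name only when the suffix is present. The task name gunzip_reads, the workflow name normalize_reads, and the output file name are hypothetical and chosen only for illustration.

```wdl
version 1.0

# Hypothetical task: decompresses a gzip file to a fixed output name.
task gunzip_reads {
  input {
    File reads_gz
  }
  command <<<
    gunzip -c ~{reads_gz} > reads.fastq
  >>>
  output {
    File reads = "reads.fastq"
  }
}

workflow normalize_reads {
  input {
    File reads
  }

  String name = basename(reads)

  # Workaround for endsWith(name, ".gz"): sub() removes a trailing ".gz",
  # so the result differs from the original only if the suffix was there.
  Boolean is_gz = sub(name, "\\.gz$", "") != name

  # Run the extra step only for compressed input.
  if (is_gz) {
    call gunzip_reads { input: reads_gz = reads }
  }

  # Use the decompressed file when the call ran, otherwise the original input.
  File plain_reads = select_first([gunzip_reads.reads, reads])

  output {
    File out = plain_reads
  }
}
```

With a dedicated endsWith, the is_gz line would read more directly, but the control flow around it would stay the same.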

antonkulaga avatar Jul 28 '21 10:07 antonkulaga

@antonkulaga I would appreciate it if you engaged in positive discussion in the issue forums, following our Code of Conduct.

You have to remember this is a 4 year old ticket. That means, first of all, that the WDL of 4 years ago was very different than it is today. Control flow in the language was minimal, or at least minimally defined and supported. Optional outputs were not even a thing either. So in that context, I believe it made sense to question the inclusion of new engine functions like these.

Secondly, despite being 4 years old, this ticket has yet to gain the support needed to transition into an RFC. If you are really excited about this change, I would recommend creating an RFC discussion over on the discussion board. You could even go so far as to support the RFC with a PR and potentially provide a reference implementation in miniwdl.

patmagee avatar Jul 28 '21 12:07 patmagee

I think a find function would be more generally useful and could be used to determine containment. Since we've already let the regex genie out of the bottle with sub, I don't see a problem with find taking a pattern argument.

String? find(String, String)

Searches for the pattern (second argument) in the string (first argument). Returns the first match, or None if there is no match.

Contains would then be written:

if (defined(find("fubar", "u.a"))) { ... }

We could also add a contains method that would just be syntactic sugar for the above, although I'm not sure it makes sense to overload contains, which is also the proposed name for the array containment function. Maybe matches or contains_pattern.
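
The find-based containment check described above can be sketched as a complete workflow. This assumes an engine that implements find with the signature proposed in this comment; it is not valid WDL 1.0, and the version line is an assumption to adjust to whatever your engine accepts. The workflow name and variable names are hypothetical.

```wdl
# Assumption: a WDL version whose standard library provides
# String? find(String, String) as proposed in this thread.
version 1.2

workflow contains_example {
  input {
    String s = "fubar"
  }

  # Per the proposed signature, find returns the first match of the
  # pattern in s, or None when nothing matches.
  String? hit = find(s, "u.a")

  # defined() converts the optional result into the Boolean that a
  # "contains" check needs.
  Boolean has_match = defined(hit)

  output {
    Boolean contains_pattern = has_match
  }
}
```

For the default input, the pattern "u.a" matches the substring "uba", so the check would succeed under the proposed semantics.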

jdidion avatar Mar 23 '23 15:03 jdidion

Would the pattern be allowed to be a regex? That would certainly be cool, but it does open a potential can of worms: people would then naturally want to inspect the match for capture groups, which goes beyond a simple boolean response.

markjschreiber avatar Jan 24 '24 15:01 markjschreiber