ducttape
Switch-case statement for branching
The following is a proposed syntax for switch-case statements in ducttape.
It allows pattern matching on branch points that have already been defined by some upstream task:
switch switch_task_name on WhichThing < in=$out@prev_task > out {
  # Handle a special case (e.g. segment Japanese)
  case thing_one : juman {
    echo "hello $in"
  }
  # Can handle multiple branches at once (e.g. segment various Arabic dialects)
  case thing_two, thing_three : ar_seg < ar_model=/path {
    echo "hello $in"
  }
  # Handle all other cases not previously mentioned (e.g. tokenize Western languages)
  default : moses {
    echo "hello $in"
  }
}
case/default blocks may not introduce additional outputs since each task must have a single, unique set of outputs.
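The matching behavior of the block above can be sketched in Python. This is only an illustration of the proposed semantics, not ducttape's implementation: `select_case`, `CASES`, and `DEFAULT` are hypothetical names, and reading the name after the colon as the package used by that case is an assumption.

```python
from typing import FrozenSet, List, Optional, Tuple

# Hypothetical model of the proposed block: each case lists the branch names it
# handles and the package it uses; `default` covers every other branch.
Case = Tuple[FrozenSet[str], str]

CASES: List[Case] = [
    (frozenset({"thing_one"}), "juman"),
    (frozenset({"thing_two", "thing_three"}), "ar_seg"),
]
DEFAULT: Optional[str] = "moses"


def select_case(branch: str, cases: List[Case], default: Optional[str]) -> str:
    """Return the package chosen for one branch of the WhichThing branch point."""
    for names, package in cases:
        if branch in names:
            return package
    if default is None:
        # Assumption: a branch with no matching case and no default is an error.
        raise KeyError(f"no case matches branch {branch!r} and no default is given")
    return default
```

Because every case shares the `in` and `out` declared in the switch header, the choice of case changes only the body that runs, never the set of outputs.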
A variant on Lane's proposal for multiple branch point matching:
switch task_name {
  case (X: x1 x2) * (Y: y1) < in {
    bash
  }
}
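One possible reading of a pattern such as `(X: x1 x2) * (Y: y1)` is a cross product: the case applies to every combination of the listed branches. The Python sketch below illustrates that reading only; `expand_pattern` and `matches` are hypothetical helpers, and ignoring branch points that the pattern does not mention is an assumption.

```python
from itertools import product
from typing import Dict, List

# A pattern maps each branch point name to the branches it accepts,
# e.g. (X: x1 x2) * (Y: y1) becomes {"X": ["x1", "x2"], "Y": ["y1"]}.
Pattern = Dict[str, List[str]]


def expand_pattern(pattern: Pattern) -> List[Dict[str, str]]:
    """List every branch combination covered by the pattern."""
    names = list(pattern)
    combos = product(*(pattern[name] for name in names))
    return [dict(zip(names, combo)) for combo in combos]


def matches(realization: Dict[str, str], pattern: Pattern) -> bool:
    """True if the realization picks an accepted branch for every branch point
    named in the pattern. Other branch points are ignored (an assumption)."""
    return all(realization.get(bp) in allowed for bp, allowed in pattern.items())
```

For the pattern above, `expand_pattern` yields two combinations: `X=x1, Y=y1` and `X=x2, Y=y1`.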
Use case: How would we allow multiple Chinese segmenters only when the case matches the Chinese language?
What about this?
switch switch_task_name on WhichThing < in=$out@prev_task > out {
  # Handle a special case (e.g. segment Japanese)
  case thing_one : juman {
    echo "hello $in"
  }
  # Can handle multiple branches at once (e.g. segment various Arabic dialects)
  case thing_two, thing_three switch arabic_parser on AR_parser < ar_model=/path {
    case ar_seg {
      echo "hello $in"
    }
    case other_ar_seg {
      echo "hello $in"
    }
  }
  # Handle all other cases not previously mentioned (e.g. tokenize Western languages)
  default : moses {
    echo "hello $in"
  }
}
Potential solution for having multiple Chinese segmenters:
switch switch_task_name on WhichThing < in=$out@prev_task > out {
  # Handle a special case (e.g. segment Japanese)
  case thing_one : juman {
    echo "hello $in"
  }
  # Can handle multiple branches at once (e.g. segment various Arabic dialects)
  case thing_two, thing_three : ar_seg < ar_model=/path {
    echo "hello $in"
  }
  # Try multiple segmenters, but only for Chinese
  case zh => branchpoint WhichSeg {
    branch zh_seg : zhseg {
      $zhseg
    }
    branch cool_seg : coolseg {
      $coolseg
    }
  }
  # Handle all other cases not previously mentioned (e.g. tokenize Western languages)
  default : moses {
    echo "hello $in"
  }
}
Lane had suggested combining switch-case (requires branch point to already be defined) with the "branchpoint" keyword (introduces a new branch point). We can still handle the use case of "try several segmenters if the language is Chinese" if we take that approach:
switch tokenize < in > out {
  case (Lang: zh) * (Segmenter: stanford) : stanford_seg {
    $stanford_seg < $in > $out
  }
  case (Lang: zh) * (Segmenter: berkeley) : berkeley_seg {
    $berkeley_seg < $in > $out
  }
  default : moses {
    $moses/tokenizer.pl < $in > $out
  }
}
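To see why this covers the Chinese use case, it may help to list the realizations the `tokenize` block would produce. The Python sketch below is an illustration under stated assumptions, not the WorkflowBuilder's logic: `realizations` is a hypothetical helper, `en` is a made-up non-Chinese branch, and a language with no matching case is assumed to fall through to `default` without a `Segmenter` branch point.

```python
from typing import Dict, List, Tuple

Pattern = Dict[str, str]

# The two cases of the `tokenize` block, as (pattern, package) pairs.
CASES: List[Tuple[Pattern, str]] = [
    ({"Lang": "zh", "Segmenter": "stanford"}, "stanford_seg"),
    ({"Lang": "zh", "Segmenter": "berkeley"}, "berkeley_seg"),
]
DEFAULT = "moses"


def realizations(
    lang_branches: List[str],
    cases: List[Tuple[Pattern, str]],
    default: str,
) -> List[Tuple[Pattern, str]]:
    """List (realization, package) pairs for each upstream Lang branch."""
    result: List[Tuple[Pattern, str]] = []
    for lang in lang_branches:
        matched = [(p, pkg) for p, pkg in cases if p.get("Lang") == lang]
        if matched:
            # Each matching case contributes one realization, so the
            # Segmenter branch point exists only under this language.
            result.extend(matched)
        else:
            # Assumption: unmatched languages use the default and gain
            # no Segmenter branch point.
            result.append(({"Lang": lang}, default))
    return result
```

With `Lang` branches `zh` and `en`, this gives two Chinese realizations (one per segmenter) and a single `en` realization that uses `moses`.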
Optionally, we could allow a special character before the branch point name if we want to require users to say explicitly when they are adding a new branch point instead of matching an existing one:
switch tokenize < in > out {
  case (Lang: zh) * (+Segmenter: stanford) : stanford_seg {
    $stanford_seg < $in > $out
  }
  case (Lang: zh) * (+Segmenter: berkeley) : berkeley_seg {
    $berkeley_seg < $in > $out
  }
  default : moses {
    $moses/tokenizer.pl < $in > $out
  }
}
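The check that the `+` marker would enable can be sketched as follows. This Python sketch is an assumption about how such a rule might be enforced, not a description of the AST parser: `classify_branch_points` is a hypothetical helper, and treating both mismatches as errors is one possible policy.

```python
from typing import Iterable, List, Set, Tuple


def classify_branch_points(
    names: Iterable[str], upstream: Set[str]
) -> Tuple[List[str], List[str]]:
    """Split branch point names used in case patterns into those that match an
    upstream branch point and those introduced with a leading '+'."""
    matched: List[str] = []
    introduced: List[str] = []
    for raw in names:
        if raw.startswith("+"):
            name = raw[1:]
            if name in upstream:
                # Assumption: re-introducing an existing branch point is an error.
                raise ValueError(f"branch point {name!r} already exists upstream")
            if name not in introduced:
                introduced.append(name)
        else:
            if raw not in upstream:
                # Assumption: an unmarked name must already exist upstream.
                raise ValueError(f"unknown branch point {raw!r}; use '+{raw}' to add it")
            if raw not in matched:
                matched.append(raw)
    return matched, introduced
```

For the block above, with only `Lang` defined upstream, the names `Lang`, `+Segmenter`, `Lang`, `+Segmenter` classify as one matched branch point (`Lang`) and one introduced branch point (`Segmenter`). Writing `Segmenter` without the marker would be rejected, which catches typos in branch point names.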
This would involve changes to the AST parser and the WorkflowBuilder.