Switch-case statement for branching

Open jhclark opened this issue 12 years ago • 7 comments

The following is proposed syntax for switch-case statements in ducttape. It allows pattern matching on branch points that have already been defined by some upstream task:

switch switch_task_name on WhichThing < in=$out@prev_task > out {
  # Handle a special case (e.g. segment Japanese)
  case thing_one : juman {
    echo "hello $in"
  }
  # Can handle multiple branches at once (e.g. Segment various Arabic dialects)
  case thing_two, thing_three : ar_seg < ar_model=/path {
    echo "hello $in"
  }
  # Handle all other cases not previously mentioned (e.g. tokenize Western languages)
  default : moses {
    echo "hello $in"
  }
}
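
To make the intended semantics concrete, here is a rough Python sketch of how a case might be selected for one realization. It is not ducttape code: `select_case`, `CASES`, and `DEFAULT` are hypothetical names, the strings after the colon are treated as package names by analogy with ducttape task headers, and the "first matching case, else default" order is an assumption rather than something the proposal spells out.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical model of the switch above: each case lists the branches of
# WhichThing it handles, paired with the name that follows the colon.
CASES: List[Tuple[Sequence[str], str]] = [
    (("thing_one",), "juman"),
    (("thing_two", "thing_three"), "ar_seg"),
]
DEFAULT: Optional[str] = "moses"


def select_case(
    branch: str,
    cases: List[Tuple[Sequence[str], str]] = CASES,
    default: Optional[str] = DEFAULT,
) -> str:
    """Return the name for the first case that lists `branch`, else the default."""
    for branches, package in cases:
        if branch in branches:
            return package
    if default is None:
        # Assumption: a switch without a default rejects unmatched branches.
        raise KeyError(f"no case matches branch {branch!r} and no default is given")
    return default
```

With this model, `select_case("thing_three")` picks the shared Arabic case, and any branch not listed falls through to the default.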

jhclark avatar May 08 '12 14:05 jhclark

case/default blocks may not introduce additional outputs since each task must have a single, unique set of outputs.
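
A possible way to enforce this constraint, sketched in Python under the assumption that the switch header's outputs and each case's outputs are available as plain lists of names. `check_case_outputs` and its arguments are hypothetical and do not correspond to anything in ducttape's code base.

```python
from typing import Dict, Iterable


def check_case_outputs(
    declared_outputs: Iterable[str],
    case_outputs: Dict[str, Iterable[str]],
) -> None:
    """Raise ValueError if any case names an output the switch header lacks."""
    declared = set(declared_outputs)
    for case_name, outputs in case_outputs.items():
        extra = set(outputs) - declared
        if extra:
            raise ValueError(
                f"case {case_name!r} introduces additional outputs: {sorted(extra)}"
            )
```

A checker like this would run once per switch block, before any case body is turned into a task.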

jhclark avatar May 08 '12 14:05 jhclark

A variant on Lane's proposal for multiple branch point matching:

switch task_name {
  case (X: x1 x2) * (Y: y1) < in {
    bash
  }
}
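
One reading of `(X: x1 x2) * (Y: y1)` is a cross product: the case applies to every realization whose X branch is x1 or x2 and whose Y branch is y1. The Python sketch below illustrates that reading only. The dictionary representation and the helper names `expand_pattern` and `matches` are assumptions for illustration.

```python
from itertools import product
from typing import Dict, List, Sequence

Pattern = Dict[str, Sequence[str]]   # branch point name -> accepted branches
Realization = Dict[str, str]         # branch point name -> chosen branch


def expand_pattern(pattern: Pattern) -> List[Realization]:
    """List every branch combination covered by a cross-product pattern."""
    names = list(pattern)
    combos = product(*(pattern[name] for name in names))
    return [dict(zip(names, combo)) for combo in combos]


def matches(pattern: Pattern, realization: Realization) -> bool:
    """True if the realization picks an accepted branch for every named branch point."""
    return all(
        realization.get(name) in branches for name, branches in pattern.items()
    )
```

For the pattern `{"X": ["x1", "x2"], "Y": ["y1"]}`, `expand_pattern` yields two combinations, and `matches` ignores any branch points the pattern does not mention.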

jhclark avatar May 08 '12 15:05 jhclark

Use case: how would we allow multiple Chinese segmenters if and only if the case matches the Chinese language?

jhclark avatar May 08 '12 15:05 jhclark

What about this?

switch switch_task_name on WhichThing < in=$out@prev_task > out {
  # Handle a special case (e.g. segment Japanese)
  case thing_one : juman {
    echo "hello $in"
  }
  # Can handle multiple branches at once (e.g. Segment various Arabic dialects)
  case thing_two, thing_three switch arabic_parser on AR_parser < ar_model=/path {
        case ar_seg {
            echo "hello $in"
        }
        case other_ar_seg {
            echo "hello $in"
        }
  }
  # Handle all other cases not previously mentioned (e.g. tokenize Western languages)
  default : moses {
    echo "hello $in"
  }
}

dowobeha avatar May 08 '12 15:05 dowobeha

Potential solution for having multiple Chinese segmenters:

switch switch_task_name on WhichThing < in=$out@prev_task > out {
  # Handle a special case (e.g. segment Japanese)
  case thing_one : juman {
    echo "hello $in"
  }
  # Can handle multiple branches at once (e.g. Segment various Arabic dialects)
  case thing_two, thing_three : ar_seg < ar_model=/path {
    echo "hello $in"
  }
  # Try multiple segmenters, but only for Chinese
  case zh => branchpoint WhichSeg {
    branch zh_seg : zhseg {
      $zhseg
    }
    branch cool_seg : coolseg {
      $coolseg
    }
  }
  # Handle all other cases not previously mentioned (e.g. tokenize Western languages)
  default : moses {
    echo "hello $in"
  }
}

jhclark avatar May 08 '12 15:05 jhclark

Lane had suggested combining switch-case (which requires the branch point to already be defined) with the "branchpoint" keyword (which introduces a new branch point). If we take that approach, we can still handle the use case of "try several segmenters if the language is Chinese":

switch tokenize < in > out {
  case (Lang: zh) * (Segmenter: stanford) : stanford_seg {
    $stanford_seg < $in > $out
  }
  case (Lang: zh) * (Segmenter: berkeley) : berkeley_seg {
    $berkeley_seg < $in > $out
  }
  default : moses {
    $moses/tokenizer.pl < $in > $out
  }
}

Optionally, if we want to require the user to say explicitly when they are adding a new branch point rather than matching an existing one, we could allow a special character before the branch point name:

switch tokenize < in > out {
  case (Lang: zh) * (+Segmenter: stanford) : stanford_seg {
    $stanford_seg < $in > $out
  }
  case (Lang: zh) * (+Segmenter: berkeley) : berkeley_seg {
    $berkeley_seg < $in > $out
  }
  default : moses {
    $moses/tokenizer.pl < $in > $out
  }
}
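
A sketch of the check the `+` marker would make possible, written in Python. The function names and the decision to reject `+` on an already defined branch point are assumptions. The proposal only says the marker signals that a new branch point is being added.

```python
from typing import Set, Tuple


def parse_branch_point_ref(token: str) -> Tuple[str, bool]:
    """Split "+Segmenter" into ("Segmenter", True) and "Lang" into ("Lang", False)."""
    if token.startswith("+"):
        return token[1:], True
    return token, False


def resolve_branch_point(token: str, known: Set[str]) -> str:
    """Return "introduce" or "match", raising ValueError on a mismatch."""
    name, is_new = parse_branch_point_ref(token)
    if is_new and name in known:
        # Assumption: re-introducing an existing branch point is an error.
        raise ValueError(f"branch point {name!r} is already defined upstream")
    if not is_new and name not in known:
        raise ValueError(f"branch point {name!r} is not defined; use '+{name}' to add it")
    return "introduce" if is_new else "match"
```

With `known = {"Lang"}`, the token `Lang` resolves to a match and `+Segmenter` to a new branch point, while a bare `Segmenter` is reported as a likely typo instead of silently creating one.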

jhclark avatar Jul 11 '12 19:07 jhclark

This would involve changes to the AST parser and the WorkflowBuilder.

jhclark avatar Jan 03 '13 22:01 jhclark