
Existing work: Dependency parsing


danieldk avatar May 08 '19 08:05 danieldk

I have developed two dependency parsers:

  • dpar is a transition-based dependency parser that uses a Chen & Manning-style feed-forward neural network. It's robust (we used it to annotate ~29 billion tokens), but it takes some work to train and needs much more documentation.

  • sticker is a sequence labeler that uses bidirectional RNNs or dilated convolutional networks. I recently added support for dependency parsing following the parsing-as-sequence-labeling scheme of Strzyz et al. (2019). In my experiments on German, it improves LAS by ~2% over dpar (currently 94.41%). Current plans:

    • Investigating whether the tch crate can replace Tensorflow. This would let us build the neural networks directly in Rust, making the current excursion to Python to construct the Tensorflow graph unnecessary.
    • Generalizing over the input format. sticker currently uses CoNLL-X throughout; generalizing will allow us to support other formats, such as CoNLL-U.
    • Tobias Pütz plans to add support for transformers.
    • More documentation.
    • Providing pretrained models, once the pace of development slows down a bit.
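
The core idea behind parsing as sequence labeling is that a dependency tree can be encoded as one label per token, so an ordinary sequence labeler can predict it. As a minimal sketch, here is the simple relative-offset encoding (Strzyz et al. discuss several encodings, and sticker's actual scheme may differ; `Token` and `encode` are illustrative names, not sticker's API):

```rust
/// One token with a head index (1-based; 0 denotes the artificial root)
/// and a dependency relation, as in CoNLL-X.
struct Token<'a> {
    head: usize,
    relation: &'a str,
}

/// Encode each token's head as an "offset/relation" label, where the
/// offset is the head position relative to the token's own position.
fn encode(tokens: &[Token]) -> Vec<String> {
    tokens
        .iter()
        .enumerate()
        .map(|(i, t)| {
            let pos = i as i64 + 1; // 1-based position of this token
            let offset = t.head as i64 - pos;
            format!("{}/{}", offset, t.relation)
        })
        .collect()
}

fn main() {
    // "She reads books": "reads" is the root; "She" and "books" attach to it.
    let sent = [
        Token { head: 2, relation: "nsubj" },
        Token { head: 0, relation: "root" },
        Token { head: 2, relation: "obj" },
    ];
    let labels = encode(&sent);
    assert_eq!(labels, vec!["1/nsubj", "-2/root", "-1/obj"]);
    println!("{:?}", labels);
}
```

With such an encoding, training reduces to standard per-token classification; at prediction time the labels are decoded back into a tree (with repairs for ill-formed outputs, which this sketch omits).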

danieldk avatar May 08 '19 10:05 danieldk

  • sticker updates: sticker now supports transformers, and pretrained models are available for German and Dutch. sticker has switched to maintenance mode.
  • sticker2: we started sticker2 as a successor to sticker:
    • Uses libtorch through the tch crate.
    • Supports fine-tuning of BERT and XLM-RoBERTa pretrained models through the sticker-transformers crate.
    • Supports model distillation.
    • Supports lemmatization in addition to general sequence labeling and dependency parsing.
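
For the distillation point: the usual setup (Hinton-style) trains a small student to match the teacher's temperature-softened output distribution. A minimal dependency-free sketch of that soft-target loss, purely illustrative of the technique and not sticker2's actual implementation:

```rust
/// Softmax of logits scaled by a temperature (higher T = softer targets).
fn softmax(logits: &[f64], temperature: f64) -> Vec<f64> {
    let scaled: Vec<f64> = logits.iter().map(|l| l / temperature).collect();
    let max = scaled.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = scaled.iter().map(|l| (l - max).exp()).collect();
    let sum: f64 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

/// Cross-entropy between the teacher's and student's softened
/// distributions; minimized when the student matches the teacher.
fn distillation_loss(teacher_logits: &[f64], student_logits: &[f64], t: f64) -> f64 {
    let p = softmax(teacher_logits, t);
    let q = softmax(student_logits, t);
    -p.iter().zip(&q).map(|(pi, qi)| pi * qi.ln()).sum::<f64>()
}

fn main() {
    let teacher = [3.0, 1.0, 0.2];
    let student = [2.5, 1.2, 0.3];
    let loss = distillation_loss(&teacher, &student, 2.0);
    println!("distillation loss: {:.4}", loss);
    assert!(loss > 0.0);
}
```

In practice this soft-target term is combined with the ordinary hard-label loss on gold annotations; the sketch shows only the teacher-matching part.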

danieldk avatar Feb 05 '20 13:02 danieldk