
Initial explorations w/ PySpark and spaCy

Open lintool opened this issue 6 years ago • 0 comments

from @r-clancy

The ecosystem of Python-based NLP tools is much larger than what's available on the JVM. We want to look into rewriting DSTLR in Python using PySpark, with our own extractors (e.g., BERT-based NER, entity linking, relation extraction).

Let's start some initial exploration and see how well this works. One of the biggest issues is that BERT is slow, but what about spaCy? It already has NER, and entity linking is being added in version 2.2 (unreleased). Can we look into training or adding a relation extractor?

https://github.com/explosion/spaCy

lintool, Sep 27 '19 09:09