
Refactor out unnecessary processing in data pipeline

Open nelson-liu opened this issue 7 years ago • 0 comments

Right now, the data pipeline tokenizes the input into both words and characters, even if you only want words. This is fine for now since character tokenization isn't that expensive, but it's not ideal once we want to use NER/POS features: running the taggers can be quite slow, and we don't want to do it unless necessary.
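
One possible shape for the refactor, as a minimal sketch rather than the repository's actual code: have the caller name the features it wants, and only run the matching extractors. All names here (`EXTRACTORS`, `featurize`, `word_tokenize`, `char_tokenize`, `pos_tag`) are hypothetical, and `pos_tag` is a placeholder standing in for a real, slow tagger.

```python
from typing import Callable, Dict, List, Sequence


def word_tokenize(text: str) -> List[str]:
    # Hypothetical word tokenizer: plain whitespace splitting.
    return text.split()


def char_tokenize(text: str) -> List[str]:
    # Hypothetical character tokenizer: every non-whitespace character.
    return [c for c in text if not c.isspace()]


def pos_tag(text: str) -> List[str]:
    # Placeholder for an expensive tagger; a real one would call an NLP library.
    return ["TAG" for _ in text.split()]


# Hypothetical registry mapping a feature name to the function that computes it.
EXTRACTORS: Dict[str, Callable[[str], List[str]]] = {
    "words": word_tokenize,
    "characters": char_tokenize,
    "pos": pos_tag,
}


def featurize(text: str, features: Sequence[str] = ("words",)) -> Dict[str, List[str]]:
    """Compute only the requested features for one input string."""
    unknown = [name for name in features if name not in EXTRACTORS]
    if unknown:
        raise ValueError(f"Unknown features: {unknown}")
    # Extractors that were not requested are never called.
    return {name: EXTRACTORS[name](text) for name in features}
```

With this layout, `featurize("a bc")` only runs the word tokenizer, and the tagger runs only when `"pos"` is explicitly requested.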

nelson-liu, May 15 '17 00:05