
Switch token alignment to SpaCy

Open JohnGiorgi opened this issue 5 years ago • 0 comments

Currently, to align BERT tokens to original tokens (before BERT tokenization) we use some code I grabbed from the official BERT repo.

SpaCy has introduced functions specifically for aligning two texts tokenized with different tokenizers. Switch to this!
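For context, here is a minimal, dependency-free sketch of the kind of alignment in question: mapping each BERT WordPiece back to the index of the original token it came from. It is not the code currently in the repo and not SpaCy's implementation. The function name `align_wordpieces` and the example tokens are made up for illustration, and it assumes continuation pieces are marked with `##` and that the pieces of a token concatenate (case-insensitively) back to that token. SpaCy's alignment helper would replace this kind of hand-written logic.

```python
from typing import List


def align_wordpieces(orig_tokens: List[str], wordpieces: List[str]) -> List[int]:
    """Return, for each WordPiece, the index of the original token it belongs to.

    Hypothetical helper for illustration only. Assumes BERT-style pieces:
    continuation pieces start with "##" and the pieces of a token
    concatenate (ignoring case) to the token itself.
    """
    mapping: List[int] = []
    orig_idx = 0
    consumed = 0  # characters of orig_tokens[orig_idx] matched so far

    for piece in wordpieces:
        # Strip the "##" continuation marker, if present.
        text = piece[2:] if piece.startswith("##") else piece

        # Current original token fully matched: move on to the next one.
        if consumed == len(orig_tokens[orig_idx]):
            orig_idx += 1
            consumed = 0
            if orig_idx >= len(orig_tokens):
                raise ValueError(f"No original token left for piece {piece!r}")

        target = orig_tokens[orig_idx].lower()
        if not target.startswith(text.lower(), consumed):
            raise ValueError(
                f"Piece {piece!r} does not match token {orig_tokens[orig_idx]!r}"
            )

        mapping.append(orig_idx)
        consumed += len(text)

    return mapping


orig = ["Saber", "tags", "BRCA1", "mutations"]
pieces = ["saber", "tags", "br", "##ca", "##1", "mutations"]
print(align_wordpieces(orig, pieces))  # [0, 1, 2, 2, 2, 3]
```

With a mapping like this, a per-token label can be copied to every WordPiece of that token, or predictions on the first piece of each token can be projected back onto the original tokens.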

JohnGiorgi, Jul 17 '19 14:07