litbank
Original vs annotated alignment
Hello! Thank you for making this really cool dataset publicly available :)
I'm trying to align the annotations with the original text. Could you please specify which tokenizer was used to produce the dataset? So far I can't get the alignment quite right. Or is there an easier way to align the original texts and the annotations that I'm missing? Thanks in advance!
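One approach that does not depend on knowing the tokenizer is to map each annotated token back to a character span in the original text. The sketch below makes several assumptions. It assumes the tokens appear in the original text verbatim and in the same order. It also assumes the tokenizer does not rewrite characters; PTB-style quote replacement, for example, would need an extra normalization step. The function name `align_tokens` is hypothetical and is not part of LitBank.

```python
from typing import List, Optional, Tuple

Span = Optional[Tuple[int, int]]


def align_tokens(text: str, tokens: List[str]) -> List[Span]:
    """Map each token to a (start, end) character span in `text`.

    Hypothetical helper. It assumes the tokens occur in `text` verbatim
    and in order. A token that cannot be found after the previous match
    is mapped to None, and the search position is left unchanged.
    """
    spans: List[Span] = []
    pos = 0  # character offset just past the last matched token
    for tok in tokens:
        start = text.find(tok, pos)  # greedy search from the current position
        if start == -1:
            spans.append(None)  # token was altered or is missing in the original
            continue
        end = start + len(tok)
        spans.append((start, end))
        pos = end
    return spans


text = "Call me Ishmael. Some years ago"
tokens = ["Call", "me", "Ishmael", ".", "Some", "years", "ago"]
spans = align_tokens(text, tokens)
# An annotation over token indices i..j maps to text[spans[i][0]:spans[j][1]].
print(text[spans[0][0]:spans[2][1]])
```

The greedy `find` tolerates arbitrary whitespace between tokens. However, it can skip ahead to the wrong occurrence when a token was normalized. Any `None` entries are worth inspecting by hand.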