Doc.__init__ modifies the list passed as sent_starts in place, converting it to List[int]
Hey,
I just stumbled across the following behaviour when creating a Doc manually: the list passed as sent_starts is modified in place. Is this intended?
How to reproduce the behaviour
import spacy
from spacy.tokens import Doc

nlp = spacy.blank("xx")
words_1 = ["This", "is", "the", "first", "sentence", "."]
spaces_1 = [True, True, True, True, False, True]
sent_starts_1 = [True, False, False, False, False, False]
test_doc = Doc(vocab=nlp.vocab, words=words_1, spaces=spaces_1, sent_starts=sent_starts_1)
# The caller's list has been overwritten by the Doc constructor:
print(sent_starts_1)
# prints [1, -1, -1, -1, -1, -1]
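For anyone hitting the same thing, here is a minimal sketch of what seems to be going on, written without spaCy so it runs anywhere. The function `to_sent_start_codes` is a hypothetical stand-in, not spaCy's actual code. It only mirrors the mapping seen in the output above (True becomes 1, False becomes -1) and assumes the conversion writes back into the list it receives.

```python
from typing import List, Optional, Union

SentStart = Optional[Union[bool, int]]


def to_sent_start_codes(sent_starts: List[SentStart]) -> List[SentStart]:
    """Hypothetical stand-in (NOT spaCy's implementation).

    Overwrites the given list in place: True -> 1, False -> -1.
    """
    for i, value in enumerate(sent_starts):
        if value is True:
            sent_starts[i] = 1
        elif value is False:
            sent_starts[i] = -1
    return sent_starts


# Passing the list directly: the caller's list is overwritten.
original = [True, False, False, False, False, False]
to_sent_start_codes(original)
print(original)  # [1, -1, -1, -1, -1, -1]

# Passing a shallow copy: the caller's list is left untouched.
protected = [True, False, False, False, False, False]
to_sent_start_codes(list(protected))
print(protected)  # [True, False, False, False, False, False]
```

The same defensive copy should work as a workaround in the reproduction above, i.e. passing sent_starts=list(sent_starts_1) to Doc, since the constructor then only ever sees the copy.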
Your Environment
Info about spaCy
- spaCy version: 3.4.1
- Platform: Linux-5.10.16.3-microsoft-standard-WSL2-x86_64-with-glibc2.29
- Python version: 3.8.10
Thanks for letting us know about this, I'll have a look at fixing it.