
Doc.__init__ changes input to is_sent_start to List[int]

Open stefawolf opened this issue 2 years ago • 1 comment

Hey,

I just stumbled across the following behaviour when creating a Doc manually: the list I pass as sent_starts is modified in place, and its booleans are replaced by integers. Is this intended behaviour?

How to reproduce the behaviour

import spacy
from spacy.tokens import Doc

nlp = spacy.blank("xx")
words_1 = ["This", "is", "the", "first", "sentence", "."]
spaces_1 = [True, True, True, True, False, True]
sent_starts_1 = [True, False, False, False, False, False]
test_doc = Doc(vocab=nlp.vocab, words=words_1, spaces=spaces_1, sent_starts=sent_starts_1)
print(sent_starts_1)
# which prints [1, -1, -1, -1, -1, -1]
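
A possible workaround is to hand the constructor a copy, for example sent_starts=list(sent_starts_1), so that any in-place conversion happens on the copy. The sketch below illustrates the idea without spaCy: encode_in_place is a hypothetical stand-in that only mimics the side effect shown above (True becomes 1, False becomes -1). It is an assumption for illustration, not spaCy's actual implementation.

```python
from typing import List, Union

SentStart = Union[bool, int, None]


def encode_in_place(sent_starts: List[SentStart]) -> None:
    """Hypothetical stand-in mimicking the reported side effect.

    Overwrites the caller's list: True -> 1, False -> -1.
    Other values are left untouched in this sketch.
    """
    for i, value in enumerate(sent_starts):
        if value is True:
            sent_starts[i] = 1
        elif value is False:
            sent_starts[i] = -1


# Passing the list directly: the callee overwrites the caller's data.
direct = [True, False, False, False, False, False]
encode_in_place(direct)

# Passing a shallow copy: the caller's list keeps its booleans.
kept = [True, False, False, False, False, False]
encode_in_place(list(kept))

print(direct)  # [1, -1, -1, -1, -1, -1]
print(kept)    # [True, False, False, False, False, False]
```

A shallow copy is enough here because the list holds only immutable values (bool, int, None).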

Your Environment

Info about spaCy

  • spaCy version: 3.4.1
  • Platform: Linux-5.10.16.3-microsoft-standard-WSL2-x86_64-with-glibc2.29
  • Python version: 3.8.10

stefawolf, Sep 14 '22 13:09

Thanks for letting us know about this, I'll have a look at fixing it.

richardpaulhudson, Sep 14 '22 15:09

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

github-actions[bot], Oct 27 '22 00:10