pySBD icon indicating copy to clipboard operation
pySBD copied to clipboard

Example not working with Spacy version 3.1 and 3.0.6

Open Atul997 opened this issue 2 years ago • 3 comments

I have installed current spacy version 3. 1 and running the example with some modifications but it keeps throwing error of ValueError: [E030] Sentence boundaries unset. You can add the 'sentencizer' component to the pipeline with: nlp.add_pipe('sentencizer'). Alternatively, add the dependency parser or sentence recognizer, or set sentence boundaries by setting doc[i].is_sent_start.

Below is the code that I am using- import spacy import pysbd from spacy.language import Language

@Language.component("sbd") def pysbd_sentence_boundaries(doc): seg = pysbd.Segmenter(language="en", clean=False, char_span=True) sents_char_spans = seg.segment(doc.text) char_spans = [doc.char_span(sent_span.start, sent_span.end, alignment_mode='contract') for sent_span in sents_char_spans] start_token_ids = [span[0].idx for span in char_spans if span is not None] for token in doc: token.is_sent_start = True if token.idx in start_token_ids else False return doc

if name == "main": text = "My name is Jonas E. Smith. Please turn to p.55." nlp = spacy.blank('en') doc = nlp(text) # add as a spacy pipeline nlp.add_pipe('sbd') print('sent_id', 'sentence', sep='\t|\t') for sent_id, sent in enumerate(doc.sents, start=1): print(sent_id, sent.text, sep='\t|\t')

Atul997 avatar Sep 16 '21 06:09 Atul997

Facing the same issue

gserapio avatar Jun 28 '22 13:06 gserapio

@gserapio - You might find this example from medspacy helpful https://github.com/medspacy/medspacy/blob/master/medspacy/sentence_splitting.py

alexhamiltonRN avatar Jul 03 '22 18:07 alexhamiltonRN

@gserapio - You might find this example from medspacy helpful https://github.com/medspacy/medspacy/blob/master/medspacy/sentence_splitting.py

Thanks @alexhamiltonRN !

gserapio avatar Jul 04 '22 15:07 gserapio