
Add checks (and converters?) for documents with multiple sentences in debug-data

Open adrianeboyd opened this issue 6 years ago • 1 comment

Feature description

The parser section of spacy debug-data should show a warning when the training data contains no or only a few documents with multiple sentences.
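As a rough illustration of the kind of check meant here, a minimal sketch in plain Python follows. It assumes each document is already available as a list of sentences. The function names and the `min_fraction` threshold are hypothetical and are not taken from spaCy's debug-data code.

```python
def multi_sentence_stats(docs):
    """Return (multi_sentence_docs, total_docs) for a list of documents.

    Each document is assumed to be a list of sentences (any objects).
    """
    total = len(docs)
    multi = sum(1 for sents in docs if len(sents) > 1)
    return multi, total


def multi_sentence_warning(docs, min_fraction=0.1):
    """Return a warning string, or None if the data looks fine.

    min_fraction is a hypothetical threshold for what counts as "few".
    """
    multi, total = multi_sentence_stats(docs)
    if multi == 0:
        return "No documents with multiple sentences found."
    # multi > 0 implies total > 0, so the division is safe.
    if multi / total < min_fraction:
        return f"Only {multi} of {total} documents contain multiple sentences."
    return None


if __name__ == "__main__":
    training_docs = [["Sentence one."], ["Sentence two."], ["A.", "B."]]
    print(multi_sentence_warning(training_docs, min_fraction=0.5))
```

Which threshold is sensible in practice probably depends on the corpus; the value above is only a placeholder.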

Potentially add a simple converter to spacy convert that groups sentences into documents, similar to -n with the IOB converters. Some variety in document lengths is probably a good idea here, too, rather than a fixed -n N, but I don't know whether it makes much difference to model performance.
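To make the grouping idea concrete, here is a minimal sketch in plain Python that packs a flat list of sentences into documents of varying length. The function name `group_sentences` and its `min_size`, `max_size`, and `seed` parameters are hypothetical; this is not how spacy convert is implemented, only one way the "variety in document lengths" could look.

```python
import random


def group_sentences(sentences, min_size=1, max_size=10, seed=0):
    """Group consecutive sentences into documents of random length.

    Each document gets between min_size and max_size sentences, except
    possibly the last one, which takes whatever is left over.
    """
    if min_size < 1 or max_size < min_size:
        raise ValueError("require 1 <= min_size <= max_size")
    rng = random.Random(seed)  # seeded so the grouping is reproducible
    docs = []
    i = 0
    while i < len(sentences):
        size = rng.randint(min_size, max_size)  # inclusive on both ends
        docs.append(sentences[i:i + size])
        i += size
    return docs


if __name__ == "__main__":
    sents = [f"sentence {n}" for n in range(12)]
    for doc in group_sentences(sents, min_size=1, max_size=4, seed=1):
        print(len(doc), doc)
```

Setting `min_size` equal to `max_size` gives fixed-size groups, which is the behaviour the -n N comparison above refers to.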

adrianeboyd, Oct 09 '19 09:10

#4467 adds the debug-data warning.

adrianeboyd, Oct 19 '19 17:10