spaCy
Add checks (and converters?) for documents with multiple sentences in debug-data
Feature description
The parser section of spacy debug-data should show a warning when there are few or no documents with multiple sentences in the training data.
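Here is a rough sketch of what such a check could look like. It assumes each training document is a plain list of sentences, with each sentence a list of token strings. That is a hypothetical structure, not spaCy's internal format. The check counts documents with more than one sentence and warns when there are none, or when their share falls below a threshold. The function names and the 10% default threshold are illustrative assumptions, not part of spaCy's API.

```python
from typing import List, Sequence

# Hypothetical representation (not spaCy's format): a document is a list
# of sentences, and a sentence is a list of token strings.
Sentence = List[str]
Document = List[Sentence]


def count_multi_sentence_docs(docs: Sequence[Document]) -> int:
    """Count documents that contain more than one sentence."""
    return sum(1 for doc in docs if len(doc) > 1)


def check_multi_sentence_docs(
    docs: Sequence[Document], min_ratio: float = 0.1
) -> List[str]:
    """Return warning messages when few or no documents have multiple sentences.

    min_ratio is a hypothetical threshold chosen for illustration only.
    """
    warnings: List[str] = []
    n_docs = len(docs)
    if n_docs == 0:
        return warnings  # nothing to check
    n_multi = count_multi_sentence_docs(docs)
    if n_multi == 0:
        warnings.append("No training documents contain multiple sentences.")
    elif n_multi / n_docs < min_ratio:
        warnings.append(
            f"Only {n_multi} of {n_docs} training documents contain multiple sentences."
        )
    return warnings
```

The reasoning behind the check is that the parser also predicts sentence boundaries. If every training document is a single sentence, it never sees a boundary inside a document.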
Potentially, also add a simple converter to spacy convert that groups sentences into documents, similar to -n with the IOB converters. Some variety in document lengths is probably a good idea here too, rather than always using -n N, but I don't know whether it makes much difference to model performance.
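Here is a minimal sketch of sentence grouping in plain Python, not spaCy's converter API. group_fixed mimics fixed-size grouping like -n N. group_varied instead draws each document's size at random from a range, which gives some variety in document lengths. Both function names and the min_n/max_n/seed parameters are hypothetical.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")  # a sentence, in whatever representation the converter uses


def group_fixed(sentences: Sequence[T], n: int) -> List[List[T]]:
    """Group consecutive sentences into documents of n sentences (like -n N).

    The last document may be shorter if len(sentences) is not a multiple of n.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return [list(sentences[i : i + n]) for i in range(0, len(sentences), n)]


def group_varied(
    sentences: Sequence[T], min_n: int, max_n: int, seed: Optional[int] = None
) -> List[List[T]]:
    """Group consecutive sentences into documents whose sizes are drawn
    uniformly from [min_n, max_n]; the last document may be shorter."""
    if not 1 <= min_n <= max_n:
        raise ValueError("require 1 <= min_n <= max_n")
    rng = random.Random(seed)  # seeded for reproducible conversions
    docs: List[List[T]] = []
    i = 0
    while i < len(sentences):
        size = rng.randint(min_n, max_n)  # inclusive on both ends
        docs.append(list(sentences[i : i + size]))
        i += size
    return docs
```

Both functions slice consecutive runs, so sentences keep their original order and neighboring sentences stay in the same document. Passing a seed makes the varied grouping reproducible across runs.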
#4467 adds the debug-data warning.