scispacy
Any guidance on OntoNotes prep?
Since OntoNotes requires licensing directly from the source, are there any pointers or scripts for converting the corpus into the expected train / dev / test splits, so that ud_ontonotes.tar.gz can be properly replicated locally?
Unfortunately I don't have the exact details to give you; @DeNeutoy might remember more. I believe it was the same splits and processing that spaCy uses, which appear to be referenced in a couple of places: https://github.com/explosion/spaCy/issues/5276 and https://github.com/explosion/spaCy/issues/3587#issuecomment-483191672.
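
For the mechanical part (bucketing documents into splits and packing them into a tarball), here is a minimal sketch, not the script that was actually used. It assumes you already have the corpus converted to one file per split and a list of document IDs for each split. The function names `assign_splits` and `pack_archive`, and the member names `train.conllu` / `dev.conllu` / `test.conllu`, are hypothetical placeholders; I don't know the real layout of ud_ontonotes.tar.gz.

```python
import tarfile
from pathlib import Path


def assign_splits(doc_ids, split_ids):
    """Group document IDs by split.

    doc_ids: iterable of document IDs found in the local corpus.
    split_ids: dict mapping a split name to the set of IDs in that split
               (hypothetical input; take the IDs from whichever split
               definition you are trying to reproduce).
    IDs that appear in no split are left out of the result.
    """
    assigned = {name: [] for name in split_ids}
    for doc_id in sorted(doc_ids):
        for name, ids in split_ids.items():
            if doc_id in ids:
                assigned[name].append(doc_id)
                break
    return assigned


def pack_archive(split_files, archive_path):
    """Write one gzipped tar holding one file per split.

    split_files: dict mapping a split name to the path of its converted file.
    The "<split>.conllu" member names are an assumption, not the known layout.
    """
    with tarfile.open(archive_path, "w:gz") as tar:
        for name, path in split_files.items():
            tar.add(Path(path), arcname=f"{name}.conllu")
```

You would call `assign_splits` with the document IDs from your licensed copy, write each split out in whatever format the training config expects, and then call `pack_archive` on the resulting files.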