
Any guidance on OntoNotes prep?

Open · mprorock opened this issue 3 years ago · 1 comment

Since OntoNotes has to be licensed directly from the source, are there any pointers or scripts for converting the corpus into the expected train/dev/test splits, so that ud_ontonotes.tar.gz can be replicated locally?

mprorock · Feb 24 '22 21:02

Unfortunately I don't have the exact details, though @DeNeutoy might remember more. I believe it used the same splits and processing that spaCy uses, which are referenced in a couple of places (https://github.com/explosion/spaCy/issues/5276, https://github.com/explosion/spaCy/issues/3587#issuecomment-483191672).

dakinggg · Mar 01 '22 00:03
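The thread does not include a script, so the following is only a minimal sketch of the final assembly step, not the processing that produced ud_ontonotes.tar.gz. It assumes you have already converted your licensed OntoNotes copy into one CoNLL-U file per document, and that you have plain-text lists of document IDs for each split (one ID per line). The directory layout, the `train.ids`/`dev.ids`/`test.ids` file names, and the helper functions are all hypothetical.

```python
from __future__ import annotations

import re
from pathlib import Path

# CoNLL-U sentences are separated by one or more blank lines.
SENTENCE_BREAK = re.compile(r"\n\s*\n")


def read_id_list(path: Path) -> list[str]:
    """Read one document ID per line (hypothetical format), skipping blanks."""
    lines = path.read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def build_split(doc_ids: list[str], conllu_dir: Path, out_path: Path) -> int:
    """Concatenate per-document CoNLL-U files into one split file.

    Assumes each ID maps to `<conllu_dir>/<doc_id>.conllu`; a missing file
    raises FileNotFoundError. Returns the number of sentences written.
    """
    n_sents = 0
    with out_path.open("w", encoding="utf-8") as out:
        for doc_id in doc_ids:
            text = (conllu_dir / f"{doc_id}.conllu").read_text(encoding="utf-8").strip()
            sentences = [s.strip() for s in SENTENCE_BREAK.split(text) if s.strip()]
            for sent in sentences:
                out.write(sent + "\n\n")  # keep a blank line after every sentence
            n_sents += len(sentences)
    return n_sents


def build_all(conllu_dir: Path, split_dir: Path, out_dir: Path,
              splits: tuple[str, ...] = ("train", "dev", "test")) -> dict[str, int]:
    """Build `<split>.conllu` for each split, refusing overlapping document IDs."""
    out_dir.mkdir(parents=True, exist_ok=True)
    seen: dict[str, str] = {}
    counts: dict[str, int] = {}
    for split in splits:
        ids = read_id_list(split_dir / f"{split}.ids")
        overlap = seen.keys() & set(ids)
        if overlap:
            raise ValueError(f"{split} shares documents with earlier splits: {sorted(overlap)}")
        seen.update((doc_id, split) for doc_id in ids)
        counts[split] = build_split(ids, conllu_dir, out_dir / f"{split}.conllu")
    return counts
```

Splitting by document ID rather than sampling sentences keeps each document in exactly one split, which is how the standard OntoNotes splits are defined. If you train with spaCy, the resulting `.conllu` files can then be passed to `python -m spacy convert`, which accepts CoNLL-U input; check the CLI documentation for your spaCy version.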