Combine "spacy evaluate" with "bluesearch.mining.eval"
Currently, the evaluation step of the DVC pipeline for NER models relies entirely on the following script, which calls functions from `bluesearch.mining.eval`.
https://github.com/BlueBrain/Search/blob/fa1331c98c8823ec85c5b3d92d58e99ab6010574/data_and_models/pipelines/ner/eval.py#L1
It could be convenient to use the interface provided by spaCy's CLI instead, i.e. `spacy evaluate`. In particular, we would like `spacy evaluate` to be able to call our own evaluation functions (e.g. token-based and entity-based F1, precision, recall, ...).
- [ ] Does `spacy evaluate` produce the same results (entity-based F1, precision, recall) as our `bluesearch.mining.eval` functions? If not, why?
- [ ] Can we configure `spacy evaluate` in such a way that the `spacy.Scorer` calls our own user-defined scoring functions?
- [ ] If we then don't need the annotations in JSONL format anymore: (i) get rid of the `preprocessing` step and script, (ii) use `.spacy` binaries instead of `.jsonl`.
- [ ] Simplify `params.yaml` by removing the use of `etype_name` (see here for details).
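
As a reference point for the first item, here is a minimal sketch of how entity-based and token-based precision, recall and F1 can differ on the same predictions. It is not the implementation in `bluesearch.mining.eval`. The function names (`prf`, `entity_scores`, `token_scores`) and the `(start_token, end_token, label)` span format are hypothetical, and the entity-based variant assumes exact matching on boundaries and label, which is, as far as I know, also what spaCy reports as `ents_p`, `ents_r` and `ents_f`.

```python
def prf(n_correct, n_pred, n_gold):
    """Return precision, recall and F1 from raw counts (0.0 when undefined)."""
    precision = n_correct / n_pred if n_pred else 0.0
    recall = n_correct / n_gold if n_gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def entity_scores(gold, pred):
    """Entity-based scores: a prediction counts only on an exact span + label match."""
    gold_set, pred_set = set(gold), set(pred)
    return prf(len(gold_set & pred_set), len(pred_set), len(gold_set))


def token_scores(gold, pred):
    """Token-based scores: every token covered by a span is scored on its own."""

    def expand(spans):
        # Turn (start, end, label) spans (end exclusive) into (token_index, label) pairs.
        return {(i, label) for start, end, label in spans for i in range(start, end)}

    gold_tokens, pred_tokens = expand(gold), expand(pred)
    return prf(len(gold_tokens & pred_tokens), len(pred_tokens), len(gold_tokens))


# Toy data (made up): the model finds only the first token of a two-token GENE entity.
gold = [(0, 2, "GENE"), (5, 6, "DISEASE")]
pred = [(0, 1, "GENE"), (5, 6, "DISEASE")]

entity = entity_scores(gold, pred)
token = token_scores(gold, pred)
print("entity-based:", entity)
print("token-based: ", token)
```

On this toy input the truncated GENE span is a full miss for the entity-based metric but a partial hit for the token-based one, so the two F1 values diverge. Any mismatch between `spacy evaluate` and our own functions should be checked against this kind of difference in matching rules first.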
UPDATE Actions
The evaluation section of `params.yaml` could be simplified after #351.
Indeed, there is no longer any need to map scispaCy labels to our labels.