Combine "spacy evaluate" with "bluesearch.mining.eval"

Open FrancescoCasalegno opened this issue 4 years ago • 1 comment

Currently, the evaluation step of our DVC pipeline for NER models relies entirely on the following script, which calls functions from bluesearch.mining.eval: https://github.com/BlueBrain/Search/blob/fa1331c98c8823ec85c5b3d92d58e99ab6010574/data_and_models/pipelines/ner/eval.py#L1

It could be more convenient to use the interface provided by spaCy's CLI instead, i.e. spacy evaluate. In particular, we would like it to be able to call our own evaluation functions (e.g. token-based and entity-based F1, precision, recall, ...).
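A minimal sketch of how this could look, assuming spaCy v3.2 or later (released after this issue was opened), where trainable components such as ner accept a `scorer` setting that points to a function in spaCy's `scorers` registry. The registry name `bluesearch.ner_scorer.v1`, the function `make_custom_ner_scorer` and the `ents_token_*` keys are hypothetical, and the token-based part is an illustrative re-implementation, not the code in bluesearch.mining.eval. It keeps spaCy's default entity-based scores (via `spacy.scorer.get_ner_prf`) and adds token-based ones.

```python
from typing import Any, Callable, Dict, Iterable

import spacy
from spacy.scorer import get_ner_prf
from spacy.training import Example


# Hypothetical registry name; needs spaCy >= 3.2, where components accept a `scorer` setting.
@spacy.registry.scorers("bluesearch.ner_scorer.v1")
def make_custom_ner_scorer() -> Callable[..., Dict[str, Any]]:
    def score(examples: Iterable[Example], **kwargs: Any) -> Dict[str, Any]:
        examples = list(examples)
        # Entity-based scores (exact span + label match), as spaCy's default NER scorer computes them.
        scores = get_ner_prf(examples)
        # Token-based scores: compare the entity label of each token.
        # Assumption: predicted and reference docs share the same tokenization.
        tp = fp = fn = 0
        for eg in examples:
            for pred_tok, gold_tok in zip(eg.predicted, eg.reference):
                pred, gold = pred_tok.ent_type_, gold_tok.ent_type_
                if pred and pred == gold:
                    tp += 1
                else:
                    if pred:
                        fp += 1  # token labelled, but wrongly or not at all in the reference
                    if gold:
                        fn += 1  # reference label missed or mislabelled
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        scores["ents_token_p"] = p
        scores["ents_token_r"] = r
        scores["ents_token_f"] = 2 * p * r / (p + r) if p + r else 0.0
        return scores

    return score
```

To use it, the pipeline's config would need `scorer = {"@scorers":"bluesearch.ner_scorer.v1"}` under `[components.ner]`, and the file defining the function (e.g. a hypothetical `scorers.py`) can be passed with `python -m spacy evaluate <model> <data>.spacy --code scorers.py`. The CLI's summary table may only print its built-in metrics, so the extra keys might need to be read from `nlp.evaluate(...)` in Python.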

[ ] Does spacy evaluate produce the same results (entity-based F1, Prec, Rec) as our bluesearch.mining.eval functions? If not, why not?
[ ] Can we configure spacy evaluate so that spacy.Scorer calls our own user-defined scoring functions?
[ ] If we then no longer need the annotations in JSONL format, (i) remove the preprocessing step and its script, and (ii) use .spacy binaries instead of .jsonl files.
[ ] Simplify params.yaml by removing the use of etype_name (see here for details).
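On the first question: one common reason for two entity-based scorers to disagree is the averaging scheme. As far as I know, spaCy's `ents_p`/`ents_r`/`ents_f` are micro-averaged over entity types (true positives, false positives and false negatives are pooled before computing the ratios), whereas a macro average weights rare types as much as frequent ones. The pure-Python sketch below, with hypothetical helper names, computes both conventions from sets of `(start, end, label)` spans so they can be compared on the same predictions; it does not reproduce bluesearch.mining.eval, whose exact conventions are not shown here.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Span = Tuple[int, int, str]  # (start, end, label), e.g. token offsets


def prf(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall and F1 from raw counts (0.0 when undefined)."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def entity_scores(gold: List[Set[Span]], pred: List[Set[Span]]) -> Dict[str, object]:
    """Entity-based scores: a predicted span counts only if start, end and label all match."""
    counts: Dict[str, List[int]] = defaultdict(lambda: [0, 0, 0])  # label -> [tp, fp, fn]
    for gold_spans, pred_spans in zip(gold, pred):
        for _, _, label in pred_spans & gold_spans:
            counts[label][0] += 1
        for _, _, label in pred_spans - gold_spans:
            counts[label][1] += 1
        for _, _, label in gold_spans - pred_spans:
            counts[label][2] += 1
    per_type = {label: prf(*c) for label, c in counts.items()}
    # Micro average: pool the counts of all types, then compute P/R/F once.
    totals = [sum(c[i] for c in counts.values()) for i in range(3)]
    micro_p, micro_r, micro_f = prf(*totals)
    # Macro average (one common convention): unweighted mean of the per-type F1 scores.
    macro_f = sum(f for _, _, f in per_type.values()) / len(per_type) if per_type else 0.0
    return {"micro_p": micro_p, "micro_r": micro_r, "micro_f": micro_f,
            "macro_f": macro_f, "per_type": per_type}


gold = [{(0, 1, "CHEM"), (2, 3, "CHEM"), (4, 5, "PROTEIN")}]
pred = [{(0, 1, "CHEM"), (2, 3, "CHEM"), (4, 6, "PROTEIN")}]
scores = entity_scores(gold, pred)
```

On this example the micro F1 is 2/3 while the macro F1 is 0.5: the single PROTEIN boundary error counts as both a false positive and a false negative, and PROTEIN has only one gold entity, so its F1 drops to 0.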
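For the third item, converting the JSONL annotations into a .spacy file could be done with spaCy's `DocBin`. This sketch assumes Prodigy-style records with a `text` field, a `spans` list of character offsets (`start`, `end`, `label`) and an optional `answer` field; the actual format of our JSONL files may differ, and `jsonl_to_docbin` is a hypothetical helper.

```python
import json
from pathlib import Path
from typing import Union

import spacy
from spacy.tokens import DocBin


def jsonl_to_docbin(jsonl_path: Union[str, Path], output_path: Union[str, Path], lang: str = "en") -> int:
    """Convert NER annotations in an assumed Prodigy-style JSONL file to a .spacy file."""
    nlp = spacy.blank(lang)  # tokenizer only; ideally the same tokenizer as the evaluated model
    doc_bin = DocBin()  # default attrs include the entity annotations (ENT_IOB, ENT_TYPE)
    with open(jsonl_path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue
            record = json.loads(line)
            # Skip rejected or ignored examples (assumed Prodigy "answer" field).
            if record.get("answer", "accept") != "accept":
                continue
            doc = nlp.make_doc(record["text"])
            ents = []
            for span in record.get("spans", []):
                # Returns None if the character offsets do not match token boundaries.
                ent = doc.char_span(span["start"], span["end"], label=span["label"])
                if ent is None:
                    raise ValueError(f"Span {span} does not align with token boundaries")
                ents.append(ent)
            doc.ents = ents
            doc_bin.add(doc)
    doc_bin.to_disk(output_path)
    return len(doc_bin)
```

Raising on misaligned spans, rather than silently dropping them, makes tokenization mismatches visible, since they would otherwise lower the scores reported by spacy evaluate without any warning.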

Mar 17 '21 13:03 FrancescoCasalegno

UPDATE Actions

The evaluation section of params.yaml can be simplified after #351.

Indeed, there is no longer any need to map scispaCy labels to our labels.

May 04 '21 09:05 pafonta