AttributeError with NlpEngine
Describe the bug
When running certain strings through the new TransformersNlpEngine, I get an AttributeError. It doesn't happen on every string.
To Reproduce
Steps to reproduce the behavior:
- See the provided Colab: https://colab.research.google.com/drive/1H_kKeHlfvZUSaPN0HNRn1_ymZk_TH4Cp#scrollTo=J63RtBZFaiLv
Expected behavior
I would expect it not to raise an error.
Additional context
I patched this line (https://github.com/microsoft/presidio/blob/37f74e8e880cb1bdf3f5224a05eaa9b63df02d31/presidio-analyzer/presidio_analyzer/nlp_engine/transformers_nlp_engine.py#L59) with the following:
if span is not None:
    span._.confidence_score = d["score"]
    ents.append(span)
This silences the error, but I don't understand why. Any help is appreciated.
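Good catch. According to spaCy's docs, char_span returns None if the character indices don’t map to a valid span under the default alignment mode, strict. Setting an attribute on that None result is what raises the AttributeError, which is why the None check silences it. We'll look into this. Perhaps changing the alignment_mode is the preferred solution.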
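To make the alignment modes concrete, here is a minimal pure-Python sketch of the idea. It is a simplified model and not spaCy's implementation: `token_offsets` and `align_span` are hypothetical helper names, and a whitespace tokenizer stands in for the real one. It shows how a predicted character range that starts mid-token yields no span in strict mode, while expand snaps it outward to token boundaries.

```python
from typing import List, Optional, Tuple

Offsets = Tuple[int, int]


def token_offsets(text: str) -> List[Offsets]:
    """Return (start, end) character offsets for whitespace-separated tokens.

    Hypothetical stand-in for a real tokenizer.
    """
    offsets = []
    pos = 0
    for tok in text.split():
        start = text.index(tok, pos)
        end = start + len(tok)
        offsets.append((start, end))
        pos = end
    return offsets


def align_span(
    tokens: List[Offsets], start: int, end: int, mode: str = "strict"
) -> Optional[Offsets]:
    """Map a character range onto token boundaries (simplified model).

    strict:   both ends must already sit on token boundaries, else None.
    expand:   grow the range to cover every token it overlaps.
    contract: shrink the range to the tokens it fully contains.
    """
    if mode == "strict":
        starts = {s for s, _ in tokens}
        ends = {e for _, e in tokens}
        if start < end and start in starts and end in ends:
            return (start, end)
        return None
    if mode == "expand":
        covered = [(s, e) for s, e in tokens if s < end and e > start]
    elif mode == "contract":
        covered = [(s, e) for s, e in tokens if s >= start and e <= end]
    else:
        raise ValueError(f"unknown mode: {mode}")
    if not covered:
        return None
    return (covered[0][0], covered[-1][1])


text = "Jane lives in Boston"
tokens = token_offsets(text)

# Suppose a model predicts characters 1-4 ("ane"), which starts mid-token.
print(align_span(tokens, 1, 4, "strict"))    # None
print(align_span(tokens, 1, 4, "expand"))    # (0, 4)
print(align_span(tokens, 1, 4, "contract"))  # None
```

Even with a non-strict mode the result can still be None (as with contract here), so a None check before using the span remains a reasonable safeguard.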
Thanks for the response, @omri374. I tried both expand and contract for alignment_mode and saw exceptions thrown in both cases. I do feel expand is the way to go, but it might need some tweaking. I can try to do some more digging later this week.
Fixed in #941
Note that there could still be issues with the alignment, and #941 is not a perfect solution. If you run into issues with this, we recommend using the TransformersRecognizer instead of the TransformersNlpEngine.