Same Analyzer detects entity in text but not in image
Describe the bug
The same analyzer detects the token as a LOCATION entity in plain text but fails to detect the same token in an image.
To Reproduce
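from presidio_analyzer import AnalyzerEngine
from presidio_image_redactor import ImageAnalyzerEngine, ImageRedactorEngine
# nlp_engine_with_french: an NLP engine configured for French and English (defined earlier, not shown)
analyzer = AnalyzerEngine(nlp_engine=nlp_engine_with_french,
                          log_decision_process=True,  # a boolean, not the string "true"
                          supported_languages=["fr", "en"])
print(analyzer.analyze(text="VALENCE", language="en"))  # VALENCE is returned as LOCATION
# Reuse the same analyzer for images
image_analyzer = ImageAnalyzerEngine(analyzer_engine=analyzer)
engine = ImageRedactorEngine(image_analyzer_engine=image_analyzer)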
Expected behavior
VALENCE is detected as LOCATION in plain text: even if I change the language, the surrounding text, or the casing, it is still detected as LOCATION. If I use the same analyzer to create an ImageAnalyzerEngine, VALENCE should also be detected as LOCATION when the word appears in the image.
Could it be that the OCR engine doesn't recognize this text? Have you tried running tesseract on the image to see its output?
Since log_decision_process is set to True, I can see the word "VALENCE" in the log, so the OCR does seem to pick it up.
Edit: That being said, I created an empty image with just the word "VALENCE" on it, and it was detected as LOCATION. Does the detection depend on the words before and after it?
Yes, LOCATION is detected using a named entity recognition (NER) model, so context words can certainly change the output. If you have a finite list of locations, you can create a deny list and pass it to the analyzer engine.
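For illustration, a deny list boils down to whole-word matching of the listed terms, with each hit reported under a fixed entity type and score, independent of the NER model and its context. The snippet is a minimal stdlib stand-in for that idea, not Presidio's code: the Match class and deny_list_matches function are hypothetical names, and the sample sentence is made up. In Presidio itself the equivalent would be a PatternRecognizer built with supported_entity="LOCATION" and deny_list=[...], registered via analyzer.registry.add_recognizer(...); note that PatternRecognizer's supported_language defaults to "en", so a list meant for French text would need supported_language="fr".

```python
import re
from dataclasses import dataclass


@dataclass
class Match:
    # Hypothetical result type, loosely modeled on a recognizer result
    entity_type: str
    start: int
    end: int
    score: float


def deny_list_matches(text, deny_list, entity_type="LOCATION"):
    """Return every whole-word occurrence of a deny-list term in text."""
    if not deny_list:
        # An empty alternation would match the empty string everywhere
        return []
    # Longest terms first so multi-word names win over their prefixes;
    # re.escape makes characters like '.' or '-' match literally
    terms = sorted(deny_list, key=len, reverse=True)
    alternation = "|".join(re.escape(term) for term in terms)
    # Require a non-word character (or the string edge) on both sides
    pattern = re.compile(rf"(?<!\w)(?:{alternation})(?!\w)")
    # Every hit gets the same fixed score (1.0 here, an illustrative choice)
    return [Match(entity_type, m.start(), m.end(), 1.0) for m in pattern.finditer(text)]


text = "Livraison prévue à Valence le 3 mars"
for match in deny_list_matches(text, ["Valence", "Lyon"]):
    print(match, text[match.start:match.end])
```

Because the match is purely lexical, the words around "Valence" have no effect on whether it is found.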
Thanks for your reply. I already have that in my code, but this one location is still not detected. I think the problem is deeper: something about that exact document makes it problematic. One difference: in my deny list the locations are written with only the first letter capitalized, while in the document the word is written in all capital letters. I'm not sure whether this matters.
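Casing can matter if the deny-list match is case-sensitive, because "Valence" never matches "VALENCE" character for character. The snippet, again plain Python rather than Presidio's code, compares a case-sensitive pattern, a case-insensitive one (re.IGNORECASE), and the workaround of adding upper-case copies of each term. build_deny_list_pattern is a hypothetical helper and the OCR text is made up. Recent Presidio versions appear to expose the regex flags of PatternRecognizer through a global_regex_flags parameter; check the default in your installed version before relying on it.

```python
import re


def build_deny_list_pattern(terms, ignore_case):
    """Compile a whole-word pattern for the given terms (hypothetical helper)."""
    # Longest terms first so multi-word names win over their prefixes
    alternation = "|".join(re.escape(t) for t in sorted(terms, key=len, reverse=True))
    flags = re.IGNORECASE if ignore_case else 0
    return re.compile(rf"(?<!\w)(?:{alternation})(?!\w)", flags)


deny_list = ["Valence", "Lyon"]
ocr_text = "FACTURE VALENCE 26000"  # made-up OCR output in all capitals

# Case-sensitive: "Valence" does not match "VALENCE"
strict = build_deny_list_pattern(deny_list, ignore_case=False)
# Case-insensitive: any casing of the listed terms matches
relaxed = build_deny_list_pattern(deny_list, ignore_case=True)
# Workaround that does not depend on flags: list the upper-case form too
expanded = build_deny_list_pattern(deny_list + [t.upper() for t in deny_list], ignore_case=False)

print(strict.findall(ocr_text))    # []
print(relaxed.findall(ocr_text))   # ['VALENCE']
print(expanded.findall(ocr_text))  # ['VALENCE']
```

Listing the upper-case variants is the more defensive option, since it works however the recognizer compiles its pattern.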