Music Symbol Not Recognised

Open • greentrancer opened this issue 4 years ago • 6 comments

Environment

  • Tesseract Version: 4.1.1
  • Platform: Windows 64-bit

Current Behavior:

The music symbol is detected as the letter 'O'.

[Screenshot: SubtitleEdit_2020-04-30_02-51-35]

Expected Behavior:

To recognise the symbol? It used to work before...

greentrancer avatar Apr 30 '20 01:04 greentrancer

I am also having the same problem as mentioned in another issue here: https://github.com/SubtitleEdit/subtitleedit/issues/6076

ScribbleGhost avatar Jul 18 '22 11:07 ScribbleGhost

"It used to work before..."

@greentrancer, which version worked? Which Tesseract models are used to recognize musical symbols?

@greentrancer, @ScribbleGhost, could you please add an example image and the tesseract command line which reproduces the issue?

stweil avatar Jul 18 '22 14:07 stweil

@stweil Are you saying that Tesseract should be able to recognize musical notes?

For me, neither Tesseract 3.02 nor Tesseract 5.1.0 has been able to recognize music notes. I am not familiar with the command line for Tesseract. I have only used SubtitleEdit. There is no way to display logging for Tesseract in SubtitleEdit, so I am not sure what is happening in the background.

Here is an example image: Image3

ScribbleGhost avatar Jul 18 '22 15:07 ScribbleGhost

No, I asked which version worked because I was surprised to hear that it "used to work before". Personally I don't know a Tesseract model which could recognize musical notes. Maybe SubtitleEdit has its own algorithm to detect them.

stweil avatar Jul 18 '22 15:07 stweil

That makes sense to me. I think I will stick to SubtitleEdit's nOCR as I have no idea how to train Tesseract and can't be bothered to be honest. nOCR takes a little while to train, but the interface is easy.

ScribbleGhost avatar Jul 22 '22 08:07 ScribbleGhost

I had a similar problem with 5.1.0 working very poorly on music notes in SubtitleEdit. I switched the "Engine Mode" setting to "Original Tesseract only (can detect italic)" and it's recognizing almost everything now. Tons better.

alane avatar Aug 02 '22 02:08 alane
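
For anyone who wants to reproduce this outside SubtitleEdit, as requested earlier in the thread, here is a minimal Python sketch that builds a Tesseract command line. It assumes that SubtitleEdit's "Original Tesseract only" setting corresponds to Tesseract's legacy engine (`--oem 0`), which the thread does not confirm. The function name, the image file name, and the page segmentation mode are placeholders chosen for illustration. Note that the legacy engine only works with traineddata files that include a legacy model.

```python
import shlex

# Engine modes as listed by `tesseract --help-oem`.
OEM_LEGACY_ONLY = 0  # assumed equivalent of SubtitleEdit's "Original Tesseract only"
OEM_LSTM_ONLY = 1


def build_tesseract_command(image_path, lang="eng", oem=OEM_LEGACY_ONLY, psm=6):
    """Return the argument list for one Tesseract run that prints text to stdout.

    build_tesseract_command is a hypothetical helper, not part of Tesseract
    or SubtitleEdit.
    """
    return [
        "tesseract",
        image_path,
        "stdout",          # special output base: write recognized text to stdout
        "-l", lang,
        "--oem", str(oem),
        "--psm", str(psm),  # 6 = assume a single uniform block of text
    ]


if __name__ == "__main__":
    # "subtitle_line.png" is a placeholder file name.
    command = build_tesseract_command("subtitle_line.png")
    # Print the command so it can be pasted into a bug report.
    print(shlex.join(command))
```

Running the same image once with `oem=OEM_LEGACY_ONLY` and once with `oem=OEM_LSTM_ONLY` would show whether the engine mode alone changes how the music note is recognized. The list returned by the helper can be passed directly to `subprocess.run` if Tesseract is installed.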