[Feature suggestion] Consider using manga-ocr

Open WilsonNet opened this issue 2 years ago • 4 comments

Nice work on jidoujisho, really useful. Recently I've been experimenting with a new OCR tool called manga-ocr, and in my experience it has a precision rate of more than 99%. It's really out of this world.

Example (the page selected was totally random):

https://user-images.githubusercontent.com/25280488/153520966-be319a80-228a-4a7e-a783-807635771abd.mp4

Feb 11 '22 00:02 WilsonNet

I'm looking into this. Since manga-ocr uses Python, I will have to hook it into Chaquopy if I ever make use of it.

Feb 13 '22 11:02 lrorpilla

Hi, manga-ocr author here. I'm glad to see the interest in this project, and would love to see it integrated with jidoujisho!

One alternative to Chaquopy would be to export the model to ONNX format and run it on Android using ONNX Runtime.

https://huggingface.co/docs/transformers/serialization
https://onnxruntime.ai/docs/tutorials/mobile/

The hard part here is the ONNX export. I think it should be possible, but it might be tricky. I was meaning to try it myself at some point, but I'm not sure when I will find time for that. There's also some Python logic which would need to be ported, but it's rather lightweight, so it shouldn't be a problem.

Just wanted to mention this. I don't actually know if it's a better option than Chaquopy (it probably depends on how well Chaquopy can deal with the PyTorch and Hugging Face dependencies).

Feb 14 '22 20:02 kha-white

Hey, @kha-white. I was thinking of contacting you about this by e-mail, then I remembered that you had already reached out some time ago.

I cut the Viewer from my 2.0 release because I wanted to try integrating your work and using it on Android, but I had trouble running it with Chaquopy (there are some issues in their repo related to fugashi and pyclippers; the latter was the one that ultimately errored out for me).

To be honest, I'm not at all literate in machine learning software, nor have I ever gotten hands-on with it (though I'd like to use this opportunity to do so). If you can spare the time, I'd like to work together to make this a possibility.

Sep 07 '22 09:09 lrorpilla

Would really love to see this in the app!

Would it be hard to implement the screen cut-and-paste feature like in manga-ocr, supposing that the OCR API is already working?

Sep 14 '22 15:09 PainterHalver

> The hard part here is the onnx export

Export of VisionEncoderDecoder models to ONNX has now been merged into Hugging Face Transformers.

https://github.com/huggingface/transformers/pull/19254

I ran the following command and the files were produced without error:

python -m transformers.onnx --model=models--kha-white--manga-ocr-base/ --feature=vision2seq-lm onnx/ --atol 1e-3

I haven't tried doing inference yet, but it seems feasible.

Dec 04 '22 02:12 mathewthe2

Yeah, I've tried that too. The export seems fine, but inference is not as straightforward, since you need to do a beam search or something similar to generate the output tokens. Definitely doable, but it requires some work.
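
For readers unfamiliar with the decoding step mentioned here, the following is a minimal sketch of beam search over a generic next-token function. It is not the manga-ocr or transformers implementation. The names `step_fn` and `toy_step`, the token ids, and the four-entry vocabulary are all hypothetical; in a real port, `step_fn` would presumably wrap a call to the exported decoder together with the encoder output. Length normalization and other refinements are left out.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical signature: given the token ids generated so far, return one
# log-probability per vocabulary entry for the next token.
StepFn = Callable[[Sequence[int]], Sequence[float]]


def beam_search(step_fn: StepFn, start_id: int, end_id: int,
                beam_width: int = 4, max_len: int = 32) -> List[int]:
    """Return the highest-scoring token sequence found by beam search."""
    # Each hypothesis is (cumulative log-probability, token ids).
    beams: List[Tuple[float, List[int]]] = [(0.0, [start_id])]
    finished: List[Tuple[float, List[int]]] = []

    for _ in range(max_len):
        candidates: List[Tuple[float, List[int]]] = []
        for score, tokens in beams:
            log_probs = step_fn(tokens)
            for token_id, log_p in enumerate(log_probs):
                candidates.append((score + log_p, tokens + [token_id]))

        # Keep only the best `beam_width` extensions.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for score, tokens in candidates[:beam_width]:
            if tokens[-1] == end_id:
                finished.append((score, tokens))  # hypothesis is complete
            else:
                beams.append((score, tokens))     # keep extending it
        if not beams:
            break

    # Fall back to unfinished hypotheses if nothing reached end_id.
    pool = finished or beams
    return max(pool, key=lambda c: c[0])[1]


def toy_step(tokens: Sequence[int]) -> List[float]:
    # Toy stand-in for a decoder (an assumption for illustration only).
    # Vocabulary: 0 = start, 1 = end, 2 = "a", 3 = "b".
    # It strongly prefers the sequence start -> "a" -> "b" -> end.
    preferred_next = {0: 2, 2: 3, 3: 1}[tokens[-1]]
    probs = [0.01] * 4
    probs[preferred_next] = 0.97
    return [math.log(p) for p in probs]


result = beam_search(toy_step, start_id=0, end_id=1, beam_width=2)
print(result)  # [0, 2, 3, 1]
```

Setting `beam_width=1` reduces this to greedy decoding, which may be the simplest "something similar" to try first.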

Dec 04 '22 12:12 kha-white

I was wondering, would this or this be of any use? I have a lot of spare time at the moment to study up on this, but ML isn't my field at all, so I could use some guidance on what I might need to get this to work.

Jan 09 '23 17:01 lrorpilla

Closing as per #75.

Apr 02 '23 11:04 lrorpilla