[Feature suggestion] Consider using manga-ocr
Nice work on jidoujisho, really useful. Recently I've been experimenting with a new OCR called manga-ocr, and in my experience it has a precision of more than 99%. It's really out of this world.
Example (the page selected was totally random):
https://user-images.githubusercontent.com/25280488/153520966-be319a80-228a-4a7e-a783-807635771abd.mp4
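For anyone who wants to try it, manga-ocr is driven from Python in just a few lines; a minimal sketch based on its README (the image path is just a placeholder):

from manga_ocr import MangaOcr

mocr = MangaOcr()           # downloads the model from Hugging Face on first run
text = mocr("example.jpg")  # also accepts a PIL.Image instead of a path
print(text)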
I'm looking into this. Since manga-ocr uses Python, I'll have to hook it in via Chaquopy if I ever make use of it.
Hi, manga-ocr author here. I'm glad to see the interest in this project, and would love to see it integrated with jidoujisho!
One alternative to Chaquopy would be to export the model to the ONNX format and run it on Android using onnxruntime.
https://huggingface.co/docs/transformers/serialization
https://onnxruntime.ai/docs/tutorials/mobile/
The hard part here is the ONNX export. I think it should be possible, but it might be tricky. I've been meaning to try it myself at some point, but I'm not sure when I'll find time for that. There's also some Python logic which would need to be ported, but it's rather lightweight, so it shouldn't be a problem.
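For a sense of how lightweight that logic is, here's roughly what driving the model through Hugging Face Transformers looks like; a minimal sketch, where the grayscale round-trip and the space stripping only approximate manga-ocr's actual pre- and post-processing:

from PIL import Image
from transformers import AutoTokenizer, ViTFeatureExtractor, VisionEncoderDecoderModel

name = "kha-white/manga-ocr-base"
feature_extractor = ViTFeatureExtractor.from_pretrained(name)
tokenizer = AutoTokenizer.from_pretrained(name)
model = VisionEncoderDecoderModel.from_pretrained(name)

# manga-ocr works on grayscale crops; the ViT encoder still expects 3 channels
img = Image.open("crop.png").convert("L").convert("RGB")
pixel_values = feature_extractor(img, return_tensors="pt").pixel_values
token_ids = model.generate(pixel_values)[0]
# the Japanese BERT tokenizer decodes with spaces between tokens; strip them
text = tokenizer.decode(token_ids, skip_special_tokens=True).replace(" ", "")
print(text)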
Just wanted to mention this; I don't actually know if it's a better option than Chaquopy (it probably depends on how well Chaquopy can deal with the PyTorch and Hugging Face dependencies).
Hey, @kha-white. I was thinking of contacting you about this by e-mail, then I remembered that you had already reached out some time ago.
I cut the Viewer from my 2.0 release because I wanted to try integrating your work and using it on Android, but I had trouble running it with Chaquopy. There are some issues in their repo related to fugashi and pyclipper; the latter is the one that ultimately errored out for me.
To be honest, I'm not at all literate in machine learning and have never gotten hands-on with it (though I'd like to use this opportunity to do so). If you can spare the time, I'd like to work together to make this a possibility.
Would really love to see this in the app!
Would it be hard to implement a screen cut-and-paste feature like in manga-ocr, supposing the OCR API is already working?
"The hard part here is the onnx export"
Export of VisionEncoderDecoder models to ONNX has now been merged into Hugging Face Transformers.
https://github.com/huggingface/transformers/pull/19254
I ran the following command and the files were produced without error:
python -m transformers.onnx --model=models--kha-white--manga-ocr-base/ --feature=vision2seq-lm onnx/ --atol 1e-3
I haven't tried doing inference yet, but it seems feasible.
Yeah, I've tried that too. The export seems fine, but inference is not as straightforward, since you need to do a beam search or something similar. Definitely doable, but it requires some work.
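To sketch what that work involves: once the encoder and decoder live in separate ONNX graphs, even a plain greedy loop (effectively a beam search with beam width 1) is enough to decode text. A rough sketch with onnxruntime in Python; the file names and graph input names here are assumptions based on how Transformers splits encoder-decoder exports, so check them against the actual exported files:

import numpy as np
import onnxruntime as ort

encoder = ort.InferenceSession("onnx/encoder_model.onnx")
decoder = ort.InferenceSession("onnx/decoder_model.onnx")

def greedy_decode(pixel_values, start_id, eos_id, max_len=128):
    # The encoder runs once per image; its hidden states are reused at every step.
    encoder_hidden_states = encoder.run(None, {"pixel_values": pixel_values})[0]
    token_ids = [start_id]
    for _ in range(max_len):
        logits = decoder.run(None, {
            "input_ids": np.array([token_ids], dtype=np.int64),
            "encoder_hidden_states": encoder_hidden_states,
        })[0]
        next_id = int(logits[0, -1].argmax())  # greedy: most likely next token
        token_ids.append(next_id)
        if next_id == eos_id:
            break
    return token_ids

A real beam search keeps the top few partial sequences per step instead of just the argmax, but the session plumbing stays the same, and the same loop should port over to onnxruntime's Java/Kotlin bindings on Android.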
I was wondering, would this or this be of any use? I have a lot of spare time at the moment to study up on this, but ML isn't my field at all, so I could use some guidance on what I might need to get this to work.
Closing as per #75.