BallonsTranslator
GOOGLE OCR: current situation
07.04.25 UPD: A temporary fix has been applied.
30.03.25 UPD: A temporary working version has been made. Unfortunately, we cannot get the processing time below 10 s. If you are a Python developer, please help me: there is a new fix that works faster, but I do not have enough experience or time to port it to BT. https://github.com/dimdenGD/chrome-lens-ocr/issues/29#issuecomment-2763105553
27.03.25 UPD: It broke, maybe for some users, maybe for everyone. As soon as I find a solution, I'll post an update.
30.01.25 UPD: I know that some people cannot use Lens OCR. Fixed in https://github.com/dmMaze/BallonsTranslator/pull/757
I (@bropines) wrote a plugin for Google OCR.
Lately it may not work, producing "text not found" or 303 errors. If you get the "text not found" error, try going to google.com and passing an image to the search box. If you are redirected to a page like this (screenshot), then the problem is the EU region. Use a proxy in a CIS country, or in any other country outside the EU.
ScreenShot
If your page is significantly different from my example, then your region received an update and Google changed the endpoints. This has happened three times. I will find a solution if the update turns out to be permanent.
- If you have a 303 error, update the program. For instructions, see https://github.com/dmMaze/BallonsTranslator/issues/685#issuecomment-2580171436
I'm writing a plugin that connects to Google's paid API. Unfortunately, in many regions it's not even possible to register for it. The plugin is for those who have somehow obtained access and a Google Vision API key.
If you have a 303 error, increase delay to 1.5.
Even with a delay of 30, I still sometimes get a 303 error...
Here is my library. Prepare a batch of images (about 40) and run the folder through the library. If everything goes well, I will add updated cookies. A 303 error literally means that Google is redirecting you somewhere. There are two possible reasons:
- Google is moving you to a new interface due to an update.
- You live in the EU and Google forcibly asks you for “COOKIES”.
You can check with this command:
lens_scan <folder> full_text_default --debug=debug
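To tell the two causes apart, it can help to look at where the redirect points. Below is a minimal sketch, not the plugin's actual code. The helper name `classify_redirect` and its labels are hypothetical, and the check for a host starting with `consent.` is an assumption about how the EU cookie wall is served.

```python
from typing import Optional
from urllib.parse import urlparse

# Assumption: the EU cookie wall lives on a host that starts with "consent."
CONSENT_HOST_PREFIX = "consent."
REDIRECT_CODES = (301, 302, 303, 307, 308)


def classify_redirect(status_code: int, location: Optional[str]) -> str:
    """Give a rough label to an HTTP response (hypothetical helper)."""
    if status_code not in REDIRECT_CODES:
        return "not_a_redirect"
    if not location:
        return "redirect_without_location"
    # A relative Location has no hostname, so fall back to an empty string.
    host = urlparse(location).hostname or ""
    if host.startswith(CONSENT_HOST_PREFIX):
        return "cookie_consent"
    # Anything else may be a new interface or endpoint; inspect it manually.
    return "other_redirect"
```

Pass it the status code and the `Location` header of the response that failed. A `cookie_consent` result suggests the EU case, while `other_redirect` is worth a manual look.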
I just want to point out a simple way to work around the 303 error: rerun OCR on the same picture (and again if you get a 303 the second time). I tried it manually and it works.
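The manual rerun could be automated with a small retry wrapper. This is only a sketch under assumptions: `RedirectError` and `ocr_with_retry` are hypothetical names, not part of BallonsTranslator or of the Lens library, and the real OCR call would need to raise such an error on a 303.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RedirectError(Exception):
    """Hypothetical error for an OCR call that was answered with a 303."""


def ocr_with_retry(
    run_ocr: Callable[[], T],
    attempts: int = 3,
    delay: float = 1.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call run_ocr until it succeeds or the attempts run out."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return run_ocr()
        except RedirectError:
            if attempt == attempts - 1:
                raise  # out of attempts, let the caller see the error
            sleep(delay)  # wait before rerunning the same picture
    raise AssertionError("unreachable")
```

Usage would look like `ocr_with_retry(lambda: recognize(image))`, where `recognize` stands in for whatever function performs the OCR request.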
Do the test as I described above. If it goes smoothly, I'll rewrite the plugin to make it work correctly. I don't yet understand what causes this error.
Sorry, but it's already late here (KZ). I'll test it tomorrow.
We could have spoken Russian
On 30 test pages from 30 comics, it produced results for all 30 pages. Whether it recognized everything or skipped some balloons, I can't verify; I don't have that much time. But at least every one of the 30 pages has a text result. Log: https://pastebin.com/aZ7xMbmm
Off-topic:
- First, I'm going to clobber you for uploading the log like that. Long output like this is better uploaded to pastebin, or at least put in a code block (re-upload it, because scrolling is inconvenient and we're not the only ones in this thread). As a last resort, upload it to https://gist.github.com/ and replace it with a link. Save it as a file named debug.bash.
- Second, I roughly understand why this happens. I'll make a fix tomorrow after university. We'll see, provided Google doesn't pull something again.
Off-topic. Two questions.
- Google OCR often (1-5 times per page) recognizes part of the text as another language. For example, if the page is in English, one or two words in a sentence are sometimes recognized in another script, such as Russian or Greek. Naturally this destroys the meaning, and I have to fix it by hand. For example, OCR returns: On va se changer...Ca c'est pour τοι... The last word here is written in Greek letters, but it should be toi. This happens all the time: right in the middle of a sentence, a stray word appears. It looks similar in shape, but it comes from another language and uses different characters. Is it possible to explicitly set the translation language during recognition so that there are no such bugs?
- Buddy, here is a feature I often need: recognition very often loses the diacritics on characters. All those á È ê Ç and so on, depending on the source language. The problem is that without these marks the meaning often becomes completely different, so you spend a lot of time editing the source text, replacing a with á, E with È and so on in some cases. I really miss a small quick-input table for specific characters in the right-click menu of the text input field. The idea: the right-click menu gets a new item, "add symbol", containing a couple of dozen predefined characters. When you need to enter a non-standard character, you right-click, pick it, and the character appears in the text at the cursor position. I hope I explained that clearly. Perhaps you would be interested in adding this feature?
Is it possible to explicitly set the translation language during recognition so that there are no such bugs?
No, it isn't. To be more precise: the text is detected and recognized by Google, and the Google Lens interface simply has no language selection option. As a workaround, I can add stubs. Here is what I mean: you select the bubble and explicitly specify the language in the plugin parameters. My plugin then automatically sends, with each picture, a stub in the desired language placed somewhere above the text block. Yes, it's a hack, but I can try it. There is also a theory that the language is chosen based on the country the request comes from, but I don't understand how to build that into the cookies.
The second solution: á can be added to [...]. In theory, that will also help.
Neither solution will help, because Google recognizes the bubble's language correctly; it just replaces a single word (or even a few characters). A stub is unlikely to help. Auto-replace is not an option either, since sometimes one letter gets replaced and sometimes another, not to mention that the original letters may carry diacritics... Anyway, fine, it's better to fix things by hand; at least that gives some control.
Send me the pages with the anomalies on Telegram. I want to see what Google is sending back. Also send a screenshot of a page with the anomaly, on Telegram as well. @bropines
I fixed the 303 error (in theory). Once dmMaze wakes up, he will merge the change. Right now I'm sweating over a fix for the "wrong" language.
The 303 problem appears to be solved; it no longer shows up in my tests. The problem with partial recognition in another language remains. If you still need them, I can give you test pages, but I see this problem on almost any page of French comics (I mainly translate from French to English). It's strange that no one else has reported it. Do you still want me to send test pages on Telegram?
Yes, please do. I'll take a look at what's going on there.
22.12.24
I'm developing a plugin that connects to Google's paid API. Unfortunately, in many regions, it’s not even possible to register for it. The plugin is intended for those who have somehow managed to get access and an API key for Google Vision.
Based on my tests, it seems the paid version of Cloud Vision is significantly worse compared to what Google Lens provides. My assumption is that the paid version is trained on much more specialized data, such as documents. While it handles English fairly well, it struggles with other languages. For example:
- In Japanese, it often ignores the ー character.
- In Chinese, it occasionally skips characters altogether.
- Korean, however, is recognized reliably.
I initially thought the issues might be caused by specific data included in the API requests. However, the problem remains unresolved because the official documentation at Google Cloud Vision API Reference has been down for several days. This has prevented me from experimenting further, although my current implementation follows Google’s official guidelines.
When it comes to detecting text and text blocks (such as speech bubbles), Google Vision performs on par with CTD, and in some minor cases, even exceeds it. My plan is to integrate Google Vision as a text detection option, but only experimentally. The primary issue is the cost: both detection and recognition are charged under the same pricing model. This means the free monthly limit of 1,000 units will be exhausted quickly. Unfortunately, I’m not yet sure if the plugin can be designed to handle detection and recognition in a single request while outputting the identified text blocks. @dmMaze – is this possible, or would this require modifications to the plugin?
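On the single-request question, here is a rough sketch of what the Cloud Vision side seems to allow; whether the BT plugin interface can consume it is a separate matter. As far as I understand the REST API, one `DOCUMENT_TEXT_DETECTION` feature on one image returns both the block bounding boxes and the recognized symbols in `fullTextAnnotation`, so detection and recognition would cost one unit instead of two. The helper names `build_annotate_request` and `extract_blocks` are hypothetical, and the network call itself is left out.

```python
import base64
from typing import Any, Dict, List, Optional


def build_annotate_request(
    image_bytes: bytes, language_hints: Optional[List[str]] = None
) -> Dict[str, Any]:
    """Build a JSON body for POST https://vision.googleapis.com/v1/images:annotate."""
    request: Dict[str, Any] = {
        "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
        # One feature on one image: boxes and text arrive in the same response.
        "features": [{"type": "DOCUMENT_TEXT_DETECTION"}],
    }
    if language_hints:
        request["imageContext"] = {"languageHints": language_hints}
    return {"requests": [request]}


def extract_blocks(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Collect a bounding box and the word list for every detected block."""
    blocks = []
    annotation = response["responses"][0].get("fullTextAnnotation", {})
    for page in annotation.get("pages", []):
        for block in page.get("blocks", []):
            words = []
            for paragraph in block.get("paragraphs", []):
                for word in paragraph.get("words", []):
                    symbols = word.get("symbols", [])
                    words.append("".join(s.get("text", "") for s in symbols))
            vertices = block.get("boundingBox", {}).get("vertices", [])
            # The API omits coordinates equal to 0, so default them.
            xs = [v.get("x", 0) for v in vertices]
            ys = [v.get("y", 0) for v in vertices]
            box = (min(xs), min(ys), max(xs), max(ys)) if vertices else None
            blocks.append({"box": box, "words": words})
    return blocks
```

Words are returned as a list on purpose: how to join them (with spaces, or without for Japanese and Chinese) is left to the caller.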
If your page is significantly different from my example, there may have been a regional update where Google changed the endpoints. This has happened three times before. I’ll look for a solution if the update becomes permanent.
From what I’ve seen on forums, Google seems to have rolled back the recent changes—possibly due to user complaints or because they broke something. After the update, Google browsers suddenly stopped utilizing Lens features entirely. I don’t know how long the current method will remain functional, but I’ll provide updates here as soon as something changes.
If you have encountered this problem (https://github.com/dmMaze/BallonsTranslator/issues/712), upgrade to the latest version of BT.
- For users of dmMaze's Google Drive build: if it did not update automatically at startup, open a console in the program folder and run
.\PortableGit\bin\git.exe pull
- For users of https://github.com/bropines/Ballon-translator-portable: you don't need to do this. In theory, it always checks for updates.
"I used the "Google Docs" OCR method before you integrated the "Google Lens" approach last year. My workflow was to upload the speech bubbles to Google Drive, open them with Google Docs, and then I'd get selectable text. I used a Python script from someone who originally made it for converting soft subtitles into hard subtitles, but it worked perfectly for automating this process. I've been using this API for over two years without any issues."
https://tanaikech.github.io/2017/05/02/ocr-using-google-drive-api/
- Can you automate the above process so that it uses my own Google Drive account?
- I usually merge the bubbles into one page, ordered top to bottom, and upload that; it is faster than uploading each bubble.
- Sometimes I delete the folder containing the speech bubbles when it gets too full and create a new one.
I'll see if I can
Sounds great. Here is a demo of how I use that Python code (attached: python.txt). I often create subtitles with it. For a long movie with over 2000 lines of hard subtitles, it takes just a few minutes to create soft subtitles.
https://github.com/user-attachments/assets/15d2ac1c-96e6-4013-b6dc-dce09847a107
That subtitle-generating software also allows me to merge multiple images into one large image — up to 50 lines (or a custom number). This helps reduce resource usage when uploading to Google Drive
https://github.com/user-attachments/assets/c3b518c6-0f9a-4692-b703-34dd7264373f
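For reference, merging bubbles top to bottom before upload could be sketched like this. It is not the code from the subtitle tool above. `stack_layout` and `merge_bubbles` are hypothetical names, the 20 px gap is an arbitrary choice, and Pillow is assumed to be installed for the actual pasting.

```python
from typing import List, Sequence, Tuple


def stack_layout(
    sizes: Sequence[Tuple[int, int]], gap: int = 20
) -> Tuple[Tuple[int, int], List[int]]:
    """Return ((canvas_width, canvas_height), y_offsets) for a vertical stack."""
    if not sizes:
        raise ValueError("need at least one image")
    width = max(w for w, _ in sizes)
    offsets = []
    y = 0
    for _, height in sizes:
        offsets.append(y)
        y += height + gap
    return (width, y - gap), offsets  # no gap after the last image


def merge_bubbles(paths: Sequence[str], out_path: str, gap: int = 20) -> List[int]:
    """Paste the images top to bottom on a white canvas and save the result."""
    from PIL import Image  # Pillow, imported lazily

    images = [Image.open(p).convert("RGB") for p in paths]
    canvas_size, offsets = stack_layout([im.size for im in images], gap)
    canvas = Image.new("RGB", canvas_size, "white")
    for im, y in zip(images, offsets):
        canvas.paste(im, (0, y))
    canvas.save(out_path)
    return offsets  # lets the caller map OCR lines back to bubbles
```

Returning the offsets is a deliberate choice: after OCR of the merged page, they make it possible to assign each recognized line to the bubble it came from.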