TagUI
Request: add OCR recognition for Chinese - seems possible, need to explore more and refine steps
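Hi, I looked through the RPA projects I wrote earlier. Some of them need to OCR Chinese text many times, especially with steps like click text using ocr.
The result of read on Chinese text looks like this:
For Chinese recognition I often use Baidu's AI service. It has a large free quota, and the paid recognition is extremely accurate. It could be considered for wrapping into TagUI later.
TagUI uses OpenCV, and I am not sure whether OpenCV supports Chinese recognition.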
Hi @kangyiwen I don't think the OCR supports Chinese directly. For the read step, you can read the XPath to a variable and echo the Chinese characters. To display Chinese characters in the command prompt, you can run chcp 65001 in the command prompt.
This is the syntax for read step:
read XPath to a
echo `a`
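Thanks, chcp 65001 works very well.
The places where I need OCR involve recognising images, not content in Chrome. For example, when a PDF contains images rather than text, OCR is needed.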
TagUI uses OpenCV through the SikuliX engine. There are definitely options for Tesseract to support Chinese OCR.
I think the step is to download the Chinese definition from a specific Tesseract GitHub repo and put it in the specific folder for OCR. But I have not tried that myself to validate that it works. Ruth, could you check how that can be done in the current SikuliX used by TagUI? The Tesseract version in that SikuliX is v3.05.
PS 1 - Tesseract is the OCR engine for doing OCR related actions. OpenCV is the computer vision engine to "see" the screen and do things like image matching and search.
PS 2 - cross-linking to a user query on the Python version of TagUI - https://github.com/tebelorg/RPA-Python/issues/336
According to the SikuliX forum, to add the Chinese library you can download it from Tesseract and place it in the AppData\Roaming\Sikulix\SikulixTesseract\tessdata folder.
The Tesseract data files for 3.05 can be found here: (chi_sim.traineddata) https://github.com/tesseract-ocr/tessdoc/blob/main/tess3/Data-Files.md
I followed the steps here and it was able to recognise the Chinese text on the screen. https://answers.launchpad.net/sikuli/+faq/2709
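As a rough illustration of the file placement described above, here is a minimal Python sketch that copies an already downloaded chi_sim.traineddata into the SikuliX tessdata folder. The helper names tessdata_dir and install_traineddata are hypothetical, and the folder layout is taken only from the path quoted in this thread. It assumes Windows, where the APPDATA environment variable points at AppData\Roaming.

```python
import os
import shutil
from pathlib import Path


def tessdata_dir(appdata=None):
    """Return the SikuliX tessdata folder quoted in this thread.

    `appdata` defaults to the Windows APPDATA environment variable.
    Both helper names here are illustrative, not part of TagUI or SikuliX.
    """
    base = Path(appdata or os.environ["APPDATA"])
    return base / "Sikulix" / "SikulixTesseract" / "tessdata"


def install_traineddata(downloaded_file, appdata=None):
    """Copy a downloaded .traineddata file into the tessdata folder."""
    target_dir = tessdata_dir(appdata)
    # Create the folder if SikuliX has not created it yet.
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / Path(downloaded_file).name
    shutil.copyfile(downloaded_file, target)
    return target
```

Calling install_traineddata with the path of the downloaded file (for example a chi_sim.traineddata in the Downloads folder) returns the destination path, which can be checked against the folder named in the SikuliX FAQ.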
@ruthtxh After putting chi_sim.traineddata into \Sikulix\SikulixTesseract\tessdata, where should the commands below be run?
I tried them in both TagUI and CMD, and neither worked.
import org.sikuli.script.TextRecognizer as TR
Settings.OcrReadText = True
Settings.OcrLanguage = "chi_sim"
TR.reset()
Hi @kangyiwen
SikuliX script can be written inside a vision block:
vision begin
import org.sikuli.script.TextRecognizer as TR
Settings.OcrReadText = True
Settings.OcrLanguage = "chi_sim"
TR.reset()
vision finish
https://tagui.readthedocs.io/en/latest/reference.html?highlight=vision#vision
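Thanks, I now understand how to run SikuliX script. I really had not understood what vision was for before.
A new problem has come up, though: the recognised text is still garbled.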
run cmd /c chcp 65001
// run Sikuli code to set the OCR language to Chinese
vision begin
import org.sikuli.script.TextRecognizer as TR
Settings.OcrReadText = True
Settings.OcrLanguage = "chi_sim"
TR.reset()
vision finish
http://baidu.com
read (2426,529)-(2785,638) to txt
echo `txt`
dump `txt` to 000222.txt
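I ran it twice, once with the vision block and once without, and the saved txt results are different. This shows that setting the OCR language to Chinese does take effect (see the image below). As for the problem when writing to the TXT file, my guess is that the encoding is going wrong.
@ruthtxh Which encoding did you run it under when it worked? How should chcp be set?
Thanks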
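To illustrate the encoding guess above, here is a minimal Python sketch of how the same bytes read back differently when the writer and the reader disagree about the encoding. It only demonstrates the general effect with UTF-8 and GBK (code page 936). Whether this is what actually happens inside the read and dump steps is not confirmed anywhere in this thread.

```python
def decode_with(raw, encoding):
    """Decode bytes with the given encoding, replacing undecodable bytes."""
    return raw.decode(encoding, errors="replace")


text = "百度一下"
raw = text.encode("utf-8")           # bytes as a UTF-8 writer would store them

as_utf8 = decode_with(raw, "utf-8")  # reader agrees with the writer
as_gbk = decode_with(raw, "gbk")     # reader assumes GBK instead

print(as_utf8 == text)
print(as_gbk == text)
```

If the saved file is garbled, opening it with a few candidate encodings in this way is one quick check of whether the OCR text itself is wrong or only its encoding.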
I don't have a Windows PC to test with, but the issue may be with this line in your .tag file:
run cmd /c chcp 65001
run cmd starts a new command line process to run the chcp command, so it might not affect the actual cmd process that you are using to run TagUI. Running chcp 65001 in the command prompt before you run tagui may help.
Other than this, we will probably need to compare Ruth's PC environment with yours to see why the results differ.
I ran run cmd /c chcp 65001 first and then started the .tag flow, and the result is the same.
Hi @kangyiwen as mentioned on the call, I briefly tested with both SikuliX code (vision step) and TagUI code (read step). Only the SikuliX code gives proper Chinese output.
http://baidu.com
vision begin
import org.sikuli.script.TextRecognizer as TR
Settings.OcrReadText = True
Settings.OcrLanguage = "chi_sim"
TR.reset()
ocr_text = Region(960, 460, 350, 60).text()
tagui_text = open('tagui.sikuli/tagui_sikuli.txt','w')
tagui_text.write(ocr_text); tagui_text.close()
vision finish
vision_result = fetch_sikuli_text()
echo `vision_result`
The SikuliX region parameters are Region(x, y, w, h). http://doc.sikuli.org/region.html
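The read step earlier in this thread takes two corner points, (x1,y1)-(x2,y2), while SikuliX's Region takes a top-left point plus width and height. A minimal Python sketch of converting between the two follows. It assumes the read coordinates are the top-left and bottom-right corners, and the function names are hypothetical helpers rather than part of TagUI or SikuliX.

```python
def read_coords_to_region(x1, y1, x2, y2):
    """Convert (x1,y1)-(x2,y2) corner coordinates to (x, y, w, h)."""
    if x2 <= x1 or y2 <= y1:
        raise ValueError("second corner must be below and right of the first")
    return (x1, y1, x2 - x1, y2 - y1)


def region_to_read_coords(x, y, w, h):
    """Convert (x, y, w, h) back to corner coordinates (x1, y1, x2, y2)."""
    return (x, y, x + w, y + h)


# The read step used earlier: read (2426,529)-(2785,638) to txt
print(read_coords_to_region(2426, 529, 2785, 638))
# The Region used in the vision block: Region(960, 460, 350, 60)
print(region_to_read_coords(960, 460, 350, 60))
```

Under that assumption, the earlier read step covers the same area as Region(2426, 529, 359, 109), and the vision block's region corresponds to read (960,460)-(1310,520).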
Adding on, @ruthtxh, this is an interesting observation. Maybe more clues can be found by investigating the logs in the tagui\src\tagui.sikuli folder. By right, the TagUI read step should be doing a similar sequence of SikuliX commands. From the information here, I cannot tell why the TagUI read step would fail while doing it fully in a vision code block works.
In addition, you may want to consider whether the steps to do OCR for foreign languages can be added to the usage guide (perhaps under the advanced concepts section). But only if the instructions can be written in a general way, so that users of other languages can follow along and enable OCR for their own languages.
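@ruthtxh Thanks, this code runs now.
After many attempts, I found that the error rate of this Chinese recognition is very high. Once I am more familiar with it, I will try wrapping an OCR step that uses Baidu AI.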
That's a good idea. For using Baidu API, one way is to use the api step to consume the REST API service directly - https://tagui.readthedocs.io/en/latest/reference.html#api
The second way is to consume it through Python, if the service has a Python package or implementation - https://tagui.readthedocs.io/en/latest/advanced.html#writing-python-within-flows
Lastly, the SikuliX that TagUI comes with is an older version that works better across Windows, Mac and Linux. There is an updated version of SikuliX that uses the next-generation Tesseract version for OCR, which is based on AI/ML. I believe that will definitely be better than the current TagUI OCR, but I have not yet explored the steps to enable it on the latest SikuliX.
FYI @ruthtxh, if you could, please put a note to explore this, because at some point it will make sense to upgrade SikuliX in TagUI. An immediate downside is that Mac users will have to install some dependencies manually. There may be more downsides, so I have not recommended an upgrade so far.
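For the Python route mentioned above, REST OCR services commonly expect the image as base64 text in a form-encoded request body. Here is a minimal sketch of preparing such a body. The field name image and the form encoding are assumptions to verify against the OCR service's own documentation, and no endpoint, authentication or response handling is shown. The helper name build_ocr_form_body is hypothetical.

```python
import base64
from urllib.parse import urlencode


def build_ocr_form_body(image_bytes, image_field="image"):
    """Build a form-encoded body carrying the image as base64 text.

    `image_field` is an assumed field name; check the OCR service's
    documentation for the name it actually expects.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    # urlencode percent-escapes '+', '/' and '=' from the base64 text.
    return urlencode({image_field: encoded}).encode("ascii")
```

The resulting bytes could then be sent with a Content-Type of application/x-www-form-urlencoded by whichever HTTP client the flow uses.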
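I suggest first trying the new SikuliX version on its own to see how well its OCR handles English and Chinese. If the improvement is clear, then update the SikuliX inside TagUI.
On reflection, OCR still needs to be built into TagUI's core so that using ocr can work, for example: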
click 百度一下 using ocr
hover 报名 using ocr
A standalone Baidu API wrapper cannot provide this.
Thanks for your thoughts. Indeed, implementing this as part of TagUI steps will require upgrading SikuliX. Consuming Baidu's API would not be doable directly from the click and hover steps. FYI @ruthtxh
Let's wait until you have finished applying it. We don't have time to work on this right now... hahaha.
The logs for both the vision step and the read step report success, so I will have to dig more to find clues. I have also added upgrading SikuliX to the list of suggestions. Thanks @kensoh