
[Feature]: Is it possible to add an OCR service based on LLM?

Open · BaseBlank opened this issue 7 months ago · 2 comments

Description

Currently, LLM-based OCR (such as Gemini 2.5 Pro or Mistral AI OCR) is far more accurate than traditional, dedicated deep-learning OCR models. Could you add interfaces for several of the major LLM-based OCR services?

Application Scenario

LLM-based OCR gives higher recognition accuracy.

References

No response

BaseBlank · May 11 '25 03:05

There are some plugins that can serve your purpose: https://github.com/pot-app/pot-app-plugin-list/blob/main/README.md#%E6%A8%A1%E6%9D%BF (the Templates section). I have at least tried the Qwen-VL plugin.

kiron111 · May 26 '25 13:05

Thank you for the pointer. I'd prefer to use Mistral OCR, though. Also, this Qwen plugin doesn't connect via an API; it relies on browser cookies instead, and I'm not very fond of that non-standard approach.

BaseBlank · May 26 '25 13:05
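To illustrate the cookie-versus-API distinction raised in this thread, here is a minimal Python sketch of what an API-key-based LLM OCR call could look like, using only the standard library. It assumes that Mistral's OCR endpoint is `POST https://api.mistral.ai/v1/ocr` with the model name `mistral-ocr-latest`, that it accepts an image as a base64 data URL, and that it returns recognized text under `pages[].markdown`. These details should be checked against Mistral's current API reference. The helper names (`build_ocr_payload`, `build_ocr_request`, `extract_text`, `recognize`) are hypothetical and are not part of pot or of any existing plugin.

```python
import base64
import json
import urllib.request

# Assumption: endpoint and model name as published by Mistral; verify before use.
MISTRAL_OCR_URL = "https://api.mistral.ai/v1/ocr"
MISTRAL_OCR_MODEL = "mistral-ocr-latest"


def build_ocr_payload(image_bytes: bytes, mime: str = "image/png") -> dict:
    """Wrap a screenshot as a base64 data URL inside an OCR request body."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": MISTRAL_OCR_MODEL,
        "document": {
            "type": "image_url",
            "image_url": f"data:{mime};base64,{encoded}",
        },
    }


def build_ocr_request(image_bytes: bytes, api_key: str) -> urllib.request.Request:
    """Authenticate with an API key in the Authorization header, not browser cookies."""
    body = json.dumps(build_ocr_payload(image_bytes)).encode("utf-8")
    return urllib.request.Request(
        MISTRAL_OCR_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_text(response_json: dict) -> str:
    """Join per-page Markdown text (assumed response shape: pages[].markdown)."""
    return "\n\n".join(
        page.get("markdown", "") for page in response_json.get("pages", [])
    )


def recognize(image_bytes: bytes, api_key: str, timeout: float = 30.0) -> str:
    """Send one image to the OCR endpoint and return the recognized text."""
    request = build_ocr_request(image_bytes, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return extract_text(json.load(resp))
```

A call might look like `recognize(open("shot.png", "rb").read(), os.environ["MISTRAL_API_KEY"])`, where the environment-variable name is only an example. Keeping the key in a header, rather than in a copied browser session, is what makes this approach "standard" in the sense discussed above. The key can be revoked or rotated independently of any logged-in web session.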