pot-desktop [Feature]: Is it possible to add an OCR service based on LLM?

Description

Currently, LLM-based OCR (such as Gemini 2.5 Pro or Mistral OCR) is far more accurate than traditional deep-learning OCR models. Could you add interfaces for several of the major LLM OCR services?

Application Scenario

The accuracy of LLM OCR is higher.

References

No response

May 11 '25 03:05 BaseBlank

There are some plugins that can serve your purpose: https://github.com/pot-app/pot-app-plugin-list/blob/main/README.md#%E6%A8%A1%E6%9D%BF At least, I have tried the qwen-VL plugin.

May 26 '25 13:05 kiron111

Thank you for your guidance. I’d prefer to use Mistral OCR. And this Qwen plugin doesn't connect via an API; instead, it uses browser cookies. I'm not very fond of this non-standard approach.
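
To illustrate what I mean by connecting via an API rather than browser cookies, here is a rough sketch of a key-authenticated OCR request. The endpoint URL, the model name, the `document` payload shape, and the `pages[].markdown` response field are my assumptions about Mistral's OCR API and should be checked against the official docs; `build_ocr_request` and `recognize` are hypothetical helper names, not part of pot-desktop or any plugin.

```python
import base64
import json
import urllib.request

# Assumed endpoint and model name; verify against the provider's current docs.
OCR_URL = "https://api.mistral.ai/v1/ocr"
OCR_MODEL = "mistral-ocr-latest"


def build_ocr_request(image_bytes: bytes, api_key: str,
                      mime: str = "image/png") -> urllib.request.Request:
    """Build (but do not send) a key-authenticated OCR request for one image."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    data_url = f"data:{mime};base64,{encoded}"
    # Assumed payload shape: the image is passed inline as a data URL.
    payload = {
        "model": OCR_MODEL,
        "document": {"type": "image_url", "image_url": data_url},
    }
    return urllib.request.Request(
        OCR_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def recognize(image_bytes: bytes, api_key: str) -> str:
    """Send the request and join the text of all returned pages."""
    with urllib.request.urlopen(build_ocr_request(image_bytes, api_key)) as resp:
        body = json.load(resp)
    # Assumed response shape: {"pages": [{"markdown": "..."}, ...]}
    return "\n".join(page.get("markdown", "") for page in body.get("pages", []))
```

The point is only that authentication is a single API key in a header, so nothing depends on a logged-in browser session.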

May 26 '25 13:05 BaseBlank