[ QUESTION ] How does the summarize option work?
Right now I see the app only trims long text (from the beginning) or squeezes it (removes any empty space).
Is there no mini AI model doing this? Or will there be a mini AI model in the future, like Qwen 0.6B, Llama 1B, or Gemini 3, running offline?
@bi4key There are smaller AI models that can generate summaries, but their accuracy hasn't been very reliable in testing. I haven't had much time to research this further, as my current focus is on the core transcription features. I plan to explore summarization capabilities in the future. So for now it only trims long text and removes stop words.
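For readers curious what "trim long text and remove stop words" amounts to, here is a minimal illustrative sketch in Python. This is not NotelyVoice's actual implementation; the stop-word list, the 200-character limit, and the function name are all assumptions for illustration.

```python
import re

# Assumed, illustrative stop-word list (not the app's real one).
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "were", "and", "or",
              "of", "to", "in", "on", "it", "that", "this"}

def naive_summarize(text: str, max_chars: int = 200) -> str:
    """A 'summary' made by squeezing whitespace, dropping stop words,
    and trimming to a maximum length."""
    words = re.split(r"\s+", text.strip())
    # Drop common stop words (comparison ignores case and punctuation).
    kept = [w for w in words if w.lower().strip(".,!?") not in STOP_WORDS]
    squeezed = " ".join(kept)
    return squeezed[:max_chars]

print(naive_summarize("The meeting is on Monday and the agenda is in the shared folder."))
# → "meeting Monday agenda shared folder."
```

As the maintainer notes, this is purely mechanical: it shortens text but cannot paraphrase or pick out the most important sentences the way an LLM-based summarizer would.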
Ok, thx for your work!
Here's some inspiration for future text recognition and summarization:
https://github.com/docling-project/docling
https://huggingface.co/ibm-granite/granite-docling-258M
https://huggingface.co/unsloth/gemma-3-270m-it-qat-GGUF
Thanks for the links @bi4key
I would suggest that, once an offline-runnable LLM is found, the button let the user select from various pre-defined prompts configurable in the options, so that we can summarize, correct typos, fix grammar and syntax, remove redundancies, make the text more formal, etc.
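The preset idea above could be as simple as a mapping from preset names to prompt templates that get filled with the note text before being sent to whatever local model is eventually chosen. A minimal sketch, assuming hypothetical preset names and a `{text}` placeholder convention (none of this exists in the app yet):

```python
# Hypothetical user-configurable prompt presets; names and templates
# are invented for illustration.
PROMPT_PRESETS = {
    "Summarize": "Summarize the following note in 3 bullet points:\n\n{text}",
    "Fix typos": "Correct spelling and typos only, keep the wording:\n\n{text}",
    "Formalize": "Rewrite the following text in a formal tone:\n\n{text}",
}

def build_prompt(preset_name: str, note_text: str) -> str:
    """Fill the chosen preset template with the note text."""
    template = PROMPT_PRESETS[preset_name]
    return template.format(text=note_text)

prompt = build_prompt("Fix typos", "Here some inspiration for future")
# 'prompt' would then be passed to the local LLM runtime, once one is picked.
```

Keeping the presets as plain editable templates means users can add their own transformations (e.g. "translate to English") without any code changes.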
I am a bit behind in my monitoring of the current LLM landscape, but I have heard there are incredible sub-1 GB LLMs nowadays.
The linked models appear to be more useful for data extraction from various documents, similar to what Docling does but without having to run the whole program stack (this can be useful in situations where you either want more flexibility than the program provides, or you can only run LLM models and cannot run the Docling stack, so this is very niche IMHO). More general mini/nano LLM models exist.
And here is a nice mega thread:
https://www.reddit.com/r/LocalLLaMA/s/nBLTtkVPvK
https://huggingface.co/blog/ocr-open-models
@bi4key Very nice links, thank you very much!
However, IMHO a text-only LLM would be sufficient for NotelyVoice, and likely smaller. Or do you think OCR LLMs could provide additional value?