semantra
Watch folder
Nice job, the app works nicely! Some questions and suggestions (maybe I missed some answers in the docs):
It would be wonderful to have a "watch folder" where documents can be dropped and are then automatically included in the next search results. It would be useful to have the document embeddings stored permanently, especially those from OpenAI, to avoid duplicate charges. It would also be nice to keep the models and document embeddings in a user-specified directory.
Thanks for the feedback! I like the watch folder idea, which relates to some other ideas around being able to add documents to be processed from the interface.
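To make the watch folder idea concrete, here is a minimal polling sketch. It is not part of Semantra; the function name find_new_documents and the idea of keeping a set of already-seen paths are assumptions for illustration only. A real implementation would also need to hand the new files to the embedding step.

```python
import os


def find_new_documents(folder, seen):
    """Return files in `folder` that are not in `seen` yet, and mark them as seen.

    Hypothetical helper: `seen` is a set of paths handled on earlier polls.
    """
    new_paths = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        # Skip subdirectories and anything already picked up on a previous poll.
        if os.path.isfile(path) and path not in seen:
            seen.add(path)
            new_paths.append(path)
    return new_paths
```

Calling this on a timer (for example every few seconds) and passing each returned path to the indexing step would approximate a watch folder without any extra dependencies.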
Re: your following points, Semantra mostly has features that cover this already. The document embeddings are permanently stored, even as the file is processing in case it's interrupted. You won't be charged twice with OpenAI.
You can see which directory they're stored in by running semantra --show-semantra-dir. You can change this to a directory of your choosing by running semantra --semantra-dir <path> (but keep in mind that unless you port over the cached files from the current Semantra processing directory, it will then potentially re-embed documents you've given it in the past).
As far as where models end up, that's controlled by https://github.com/huggingface/transformers. You can customize this with the TRANSFORMERS_CACHE environment variable https://stackoverflow.com/a/63314437
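As a rough sketch of the environment variable approach, the helper below builds an environment with TRANSFORMERS_CACHE pointing at a directory of your choosing. The helper name env_with_model_cache and the example paths are assumptions, not anything Semantra provides.

```python
import os


def env_with_model_cache(cache_dir, base_env=None):
    """Return a copy of the environment with TRANSFORMERS_CACHE set.

    Hypothetical helper: `cache_dir` is wherever you want model files to live.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["TRANSFORMERS_CACHE"] = cache_dir
    return env


# Possible usage (not executed here; the paths are placeholders):
#   import subprocess
#   subprocess.run(
#       ["semantra", "--semantra-dir", "/data/semantra"],
#       env=env_with_model_cache("/data/models"),
#   )
```

The variable has to be in place before the transformers library is loaded, which is why it is passed to the child process rather than set afterwards.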
Thanks for your great job, Semantra is extremely helpful for me! @freedmand
I have a few suggestions on PDF annotations:
- Allow adjusting the color of search results, as my default highlighting color is yellow. The current color makes it hard for me to tell search results and my own highlights apart.
- I found that when one more annotation was added, the embeddings seemed to be processed again. Could this be disabled, since the actual content of the PDF doesn't change at all?
- It would be great to add a filter to refine the search results exclusively to PDF annotations.
As someone who would like to use Semantra with their Zotero library, the watch folder and the annotation suggestions would indeed be useful.
For me, I do hope Semantra’s features can be integrated into DEVONthink.
Interesting angle on annotations. I hadn't thought about that as much, but it may be possible to have a pre-processing option to strip annotations off and only compute embeddings again if the underlying content has changed.
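One way this could work, sketched under assumptions: key the cache on a hash of the extracted text rather than on the file bytes, so that adding an annotation (which changes the file but not its text) does not trigger re-embedding. The names content_key and needs_embedding are hypothetical, and the sketch assumes some earlier step has already produced annotation-free text; it is not how Semantra currently decides what to re-embed.

```python
import hashlib


def content_key(text):
    """Hash the document text after collapsing whitespace.

    Assumes `text` was extracted with annotations already stripped.
    """
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def needs_embedding(text, cache):
    """Return True only the first time a given text content is seen.

    `cache` is a set of content keys for documents already embedded.
    """
    key = content_key(text)
    if key in cache:
        return False
    cache.add(key)
    return True
```

With this scheme, two versions of a PDF that differ only in annotations map to the same key, so the second one is skipped.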