Watch folder

Open mirozahorak opened this issue 1 year ago • 5 comments

Nice job, the app works nicely! Some questions and suggestions (maybe I missed the answers in the docs):

  • It would be wonderful to have a "watch folder": documents dropped into it would be processed automatically and included in subsequent search results.
  • It would be useful to have the document embeddings stored permanently, especially those from OpenAI, to avoid being charged twice for the same document.
  • It would be nice to keep the models and document embeddings in a user-specified directory.

mirozahorak avatar Apr 25 '23 16:04 mirozahorak

Thanks for the feedback! I like the watch folder idea, which relates to some other ideas around being able to add documents to be processed from the interface.
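To make the watch folder idea concrete, here is a minimal polling sketch using only the Python standard library. It is not part of Semantra; the function names, the `suffixes` filter, and the `handle` callback are all hypothetical and only illustrate how newly added files could be detected and handed off for processing.

```python
import time
from pathlib import Path


def scan_new_files(folder, seen, suffixes=(".pdf", ".txt")):
    """Return files in `folder` that are not yet in `seen`, and record them.

    `seen` is a set of paths owned by the caller; `suffixes` is a
    hypothetical filter for the document types worth processing.
    """
    new_files = []
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and path.suffix.lower() in suffixes and path not in seen:
            seen.add(path)
            new_files.append(path)
    return new_files


def watch(folder, handle, interval=5.0):
    """Poll `folder` forever and call `handle(path)` once per new file.

    `handle` is a placeholder for whatever would process a document,
    for example computing and caching its embeddings.
    """
    seen = set()
    while True:
        for path in scan_new_files(folder, seen):
            handle(path)
        time.sleep(interval)
```

Calling `watch("docs", print)` would print each new `.pdf` or `.txt` file that appears in `docs`. Polling is the simplest approach; an event-based file watcher would react faster but needs a third-party dependency.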

Re: your following points, Semantra mostly has features that cover this already. The document embeddings are permanently stored, even as the file is processing in case it's interrupted. You won't be charged twice with OpenAI.

You can see which directory they're stored in by running semantra --show-semantra-dir. You can change this to a directory of your choosing by running semantra --semantra-dir <path> (but keep in mind that unless you port over the cached files from the current Semantra processing directory, it will then potentially re-embed documents you've given it in the past).

As far as where models end up, that's controlled by https://github.com/huggingface/transformers. You can customize this with the TRANSFORMERS_CACHE environment variable https://stackoverflow.com/a/63314437
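As a rough illustration of the environment variable approach, the sketch below sets `TRANSFORMERS_CACHE` from Python. The helper name `set_model_cache_dir` is made up for this example. The variable has to be set before `transformers` is imported, and newer releases of the library may prefer `HF_HOME`, so treat this as an assumption to check against the version you have installed.

```python
import os
from pathlib import Path


def set_model_cache_dir(directory):
    """Point the Hugging Face model cache at `directory` (hypothetical helper).

    Creates the directory if needed and returns its resolved path.
    Must run before `transformers` is imported to take effect.
    """
    cache_dir = Path(directory).expanduser().resolve()
    cache_dir.mkdir(parents=True, exist_ok=True)
    os.environ["TRANSFORMERS_CACHE"] = str(cache_dir)
    return cache_dir
```

Exporting the variable in the shell before launching Semantra should have the same effect without any code changes.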

freedmand avatar Apr 25 '23 18:04 freedmand

Thanks for your great job, Semantra is extremely helpful for me! @freedmand

I have a few suggestions on PDF annotations:

  • Allow adjusting the highlight color of search results. My default highlighting color is yellow, so the current color makes search results hard to distinguish visually from my own highlights.
  • I found that when one more annotation is added, the embeddings seem to be processed again. Could this be avoided, since the actual content of the PDF doesn't change at all?
  • It would be great to add a filter that restricts the search results exclusively to PDF annotations.

TomBener avatar Apr 27 '23 13:04 TomBener

As someone who would like to use Semantra with their Zotero library, I would indeed find the watch folder and the annotation handling useful.

andreifoldes avatar May 16 '23 15:05 andreifoldes

For me, I do hope Semantra’s features can be integrated into DEVONthink.

TomBener avatar May 17 '23 00:05 TomBener

Interesting angle on annotations. I hadn't thought about that as much, but it may be possible to have a pre-processing option that strips annotations off and only computes embeddings again if the underlying content has actually changed.
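One way to picture that pre-processing step is to fingerprint the extracted text rather than the raw file bytes, so that adding an annotation leaves the fingerprint untouched. The sketch below assumes the document text has already been extracted without annotations by some PDF library (not shown). The names `content_fingerprint`, `needs_reembedding`, and the `fingerprints` dictionary are hypothetical and do not describe how Semantra caches embeddings.

```python
import hashlib


def content_fingerprint(text):
    """Hash the document text after collapsing whitespace differences."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def needs_reembedding(doc_id, text, fingerprints):
    """Return True if `text` differs from what was last seen for `doc_id`.

    `fingerprints` maps a document id to the fingerprint recorded on the
    previous run; it is updated in place when the content has changed.
    """
    fingerprint = content_fingerprint(text)
    if fingerprints.get(doc_id) == fingerprint:
        return False  # same content, cached embeddings can be reused
    fingerprints[doc_id] = fingerprint
    return True
```

With this scheme a file whose bytes changed only because of a new highlight would map to the same fingerprint, provided the extraction step really does ignore annotations.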

freedmand avatar May 17 '23 13:05 freedmand