real time indexing

Open karlicoss opened this issue 6 years ago • 6 comments

E.g. something inotify-based. That would make the implementation quite a bit more complex than it is at the moment. Also, due to the periodic nature of many exports, it won't be realtime unless the underlying exports are realtime. Still, it could at least detect changes to source files, etc. It would also work well in conjunction with Grasp.
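As a rough illustration of "detecting source file changes", here is a minimal Python sketch that uses mtime/size polling as a portable stand-in for inotify. The function names `snapshot` and `changed_paths` are hypothetical and not part of Promnesia; a real inotify backend would replace the polling.

```python
from pathlib import Path
from typing import Dict, Iterable, Set, Tuple

# (mtime in ns, size in bytes): a cheap per-file change signature
Signature = Tuple[int, int]


def snapshot(roots: Iterable[Path]) -> Dict[Path, Signature]:
    """Record a signature for every regular file under the given roots (hypothetical helper)."""
    state: Dict[Path, Signature] = {}
    for root in map(Path, roots):
        candidates = [root] if root.is_file() else root.rglob("*")
        for path in candidates:
            try:
                if not path.is_file():
                    continue
                st = path.stat()
            except FileNotFoundError:
                # file vanished between listing and stat: treat it as absent
                continue
            state[path] = (st.st_mtime_ns, st.st_size)
    return state


def changed_paths(before: Dict[Path, Signature], after: Dict[Path, Signature]) -> Set[Path]:
    """Files that were added, modified, or removed between two snapshots."""
    added_or_modified = {p for p, sig in after.items() if before.get(p) != sig}
    removed = set(before) - set(after)
    return added_or_modified | removed
```

A watcher could take a snapshot every few seconds and re-index only the sources whose files show up in `changed_paths()`.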

karlicoss, Dec 27 '19 22:12

Might need to be careful about closing libmagic https://github.com/karlicoss/promnesia/pull/124#issuecomment-653890755

karlicoss, Jul 05 '20 13:07

Relevant: I've recently implemented 'almost realtime' indexing: https://github.com/karlicoss/promnesia/blob/c442081c3ac46859462ccd10f65dc89e39a2f44d/src/promnesia/dump.py#L18

For example, you can have a separate config file containing only your text notes (which should be indexed very quickly). Then, if you run

    PROMNESIA_INDEX_POLICY=update promnesia index --config /path/to/small/config

it will merge the results into the main database.

That means you can run it very often (e.g. every five minutes), or potentially combine it with entr to achieve 'realtime' indexing.
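For the "run it every five minutes" option, a minimal Python sketch could wrap the exact command and environment variable from the comment above. The helper names (`index_command`, `index_env`, `run_periodically`) are hypothetical, and the `runner` parameter exists only so the loop can be tested without Promnesia installed.

```python
import os
import subprocess
import time
from pathlib import Path
from typing import Callable, Dict, List, Mapping, Optional


def index_command(config: Path) -> List[str]:
    """The invocation from the comment, without the env var prefix."""
    return ["promnesia", "index", "--config", str(config)]


def index_env(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy of the environment with the 'update' index policy switched on."""
    env = dict(os.environ if base is None else base)
    env["PROMNESIA_INDEX_POLICY"] = "update"
    return env


def run_periodically(
    config: Path,
    interval_s: float = 300.0,  # five minutes
    iterations: Optional[int] = None,  # None: loop forever
    runner: Callable[..., object] = subprocess.run,
) -> int:
    """Re-run the incremental index every interval_s seconds; returns the run count."""
    done = 0
    while iterations is None or done < iterations:
        runner(index_command(config), env=index_env(), check=False)
        done += 1
        if iterations is None or done < iterations:
            time.sleep(interval_s)
    return done
```

Calling `run_periodically(Path("/path/to/small/config"))` loops until interrupted. A cron entry running the same shell command every five minutes would achieve the same result without a long-lived process.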

karlicoss, Nov 10 '20 05:11

The last comment here needs to make it into the main docs.

Even better would be a new option like promnesia index --update, so that the above preserves existing items in the server's database:

promnesia index --update --config <small-config> --secrets <secret-file>

But what about de-duplication? Are there any issues with updates?

ankostis, Feb 12 '21 10:02

Yep, good idea to pass it as a command-line argument! It was somewhat experimental at first, so I made it an env variable, but it seems to work pretty well (apart from one minor race condition I might need to fix first). Maybe it even makes sense to make --update mode the default? I guess the worst that could happen is that some stale entries remain in the database; if the user notices them, they can do a full reindex manually.
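One way to reconcile the existing env variable with the proposed `--update` flag is sketched below in Python. This is a hypothetical resolution order, not Promnesia's actual CLI: the command-line flag wins, then `PROMNESIA_INDEX_POLICY` (assuming any value other than `update` means a full reindex), and otherwise update mode is the default, as suggested above. It requires Python 3.9+ for `argparse.BooleanOptionalAction`.

```python
import argparse
import os
from typing import Mapping, Sequence


def parse_update_mode(argv: Sequence[str], environ: Mapping[str, str] = os.environ) -> bool:
    """Return True for incremental update mode, False for a full reindex (hypothetical)."""
    parser = argparse.ArgumentParser(prog="promnesia-index-sketch")
    parser.add_argument("--config")
    # Generates both --update and --no-update; None means "not given on the command line"
    parser.add_argument("--update", action=argparse.BooleanOptionalAction, default=None)
    args = parser.parse_args(argv)
    if args.update is not None:
        return args.update
    policy = environ.get("PROMNESIA_INDEX_POLICY")
    if policy is not None:
        # assumption: any other policy value means "overwrite everything"
        return policy == "update"
    return True  # proposed default: incremental update
```

Keeping the env variable as a fallback would let existing setups keep working while the flag becomes the documented interface.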

Regarding deduplication -- not sure what you mean? This is how it works at the moment: https://github.com/karlicoss/promnesia/blob/e3b21cb080fa9965802bfd2e931ef4263e3a34e9/src/promnesia/dump.py#L61-L79

So it first clears all the entries corresponding to the data source and then inserts the new ones. Hopefully that shouldn't result in duplication!
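The clear-then-insert scheme can be illustrated with a toy SQLite table in Python. The schema and the `replace_source` helper are hypothetical and much simpler than Promnesia's real `dump.py`; the point is only that deleting a source's rows before re-inserting them makes repeated runs idempotent and leaves other sources untouched.

```python
import sqlite3
from typing import Iterable, Tuple

# Hypothetical minimal schema for illustration; not Promnesia's actual table layout.
SCHEMA = """
CREATE TABLE IF NOT EXISTS visits (
    source TEXT NOT NULL,
    url    TEXT NOT NULL,
    dt     TEXT NOT NULL
)
"""


def init_db(conn: sqlite3.Connection) -> None:
    conn.execute(SCHEMA)


def replace_source(conn: sqlite3.Connection, source: str, visits: Iterable[Tuple[str, str]]) -> None:
    """Clear every row for `source`, then insert its fresh (url, dt) visits.

    `with conn` runs both statements in one transaction: committed on success,
    rolled back on error, so other connections never see the source half-cleared.
    """
    with conn:
        conn.execute("DELETE FROM visits WHERE source = ?", (source,))
        conn.executemany(
            "INSERT INTO visits (source, url, dt) VALUES (?, ?, ?)",
            ((source, url, dt) for url, dt in visits),
        )
```

Indexing the same source twice leaves the same rows, and visits that disappeared from a source are dropped the next time that source is reindexed.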

karlicoss, Feb 12 '21 18:02

Hmm, it seems this was closed automatically by GitHub -- we don't really have realtime indexing yet, so I'll reopen it.

karlicoss, Jan 25 '23 22:01

Perhaps actual 'realtime' indexing would need proper HPI support. E.g. an HPI module could expose a generator or something similar that Promnesia can poll (presumably in a loop over all Promnesia sources). Not sure how easy it'll be to make that asynchronous enough, though, and it's also going to be tricky to 'expire' stale Visits, but it could work well for incremental/synthetic sources (which are typically the most computationally expensive).
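A synchronous Python sketch of the polling idea, under loudly stated assumptions: each source is a hypothetical callable that yields only visits strictly newer than a given timestamp, and visits are reduced to `(timestamp, url)` pairs. Neither HPI nor Promnesia exposes this interface. The poller keeps a per-source cursor, so each pass returns only new items. It does not address the two open problems mentioned above: it is not asynchronous (a slow source blocks the rest), and it never expires stale visits.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterator, List, Tuple

# Hypothetical minimal visit shape: (timestamp, url)
Visit = Tuple[int, str]
# Hypothetical source protocol: yield only visits strictly newer than `since`
PollFn = Callable[[int], Iterator[Visit]]


@dataclass
class IncrementalPoller:
    sources: Dict[str, PollFn]
    # Per-source high-water mark: the newest timestamp seen so far
    cursors: Dict[str, int] = field(default_factory=dict)

    def poll_once(self) -> List[Tuple[str, Visit]]:
        """One synchronous pass over all sources, returning only new visits."""
        new: List[Tuple[str, Visit]] = []
        for name, poll in self.sources.items():
            since = self.cursors.get(name, -1)  # -1: nothing seen yet
            latest = since
            for visit in poll(since):
                new.append((name, visit))
                latest = max(latest, visit[0])
            self.cursors[name] = latest
        return new
```

The returned visits could then be merged into the database, e.g. with the per-source update policy discussed earlier in this thread.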

karlicoss, Jan 31 '23 03:01