Add support to monitor folders for changes

Open bozicschucky opened this issue 8 years ago • 39 comments

This program for some reason doesn't index/cache the files in the directory, and it has to update the database all over again after every reboot. Can you give it some temp log file so that it indexes files in the database permanently? That's a feature that Everything search on Windows has.

I'm using Ubuntu 16.04.

bozicschucky • Oct 19 '16

I'm not sure if I understand you correctly, but FSearch does index all entries in a database. At launch this database is loaded again.

However, FSearch doesn't automatically detect changes made to the file system and update its index then. This is on the roadmap (it's called inotify support), but it'll never work as smoothly as Everything on Windows, because the Linux kernel isn't particularly good at reporting filesystem changes.

cboxdoerfer • Oct 19 '16

In the next build, please add the ability for the application to update the database automatically, so that it adds new files by itself. Right now, when I download even a small document, FSearch can't index it until I manually update the database myself.

But this is a good project. I think I'm gonna go back to studying C and contribute. Regards, bro.

bozicschucky • Oct 20 '16

Yes, I know this is a really important feature to me too, but inotify support (the technology to automatically update the database) is planned for version 0.3, which will be released in a couple of months, depending on how long it takes me to release 0.2.

Edit: Of course help is always welcome :)

cboxdoerfer • Oct 20 '16

I'm here on Ubuntu 16.10, and there is a piece of software called "gamin". Its description says: "File and directory monitoring system. Gamin is a file and directory monitoring system which allows applications to detect when a file or a directory has been added, removed or modified by somebody else." I don't know if you can use this, but I think it's promising and easy too, since you wouldn't have to implement everything from scratch. I also think there are other alternatives.

robert1826 • Oct 26 '16

@robert1826, thx, I'll have a look at that. But first impression isn't that good, because Gamin seems to be pretty much dead - there have just been 5 commits in the past 8 years. However, chances are that's because Gamin is feature complete and rock solid. Only way to find out is by trying it.

cboxdoerfer • Oct 26 '16

@cboxdoerfer ok, but the point is that the Linux kernel actually supports notifying applications about file system changes, and that technology is called inotify. Maybe gamin is the best choice here, but I'm sure there are other alternatives that use inotify. I'll also keep searching and let you know if I find one.

robert1826 • Oct 26 '16

I know that the kernel is capable of file system notifications; I've used inotify extensively already. But inotify has lots of limitations, and that's the problem. Most solutions (GFileMonitor, FS Event (libuv), ...) are just nicer frontends for inotify.

Like I said in my other post: The Linux kernel is really the limiting factor here, and there's currently nothing that I can do to bring the smooth and fast experience Everything offers on Windows. inotify is just much slower, it requires much more memory, it's less reliable and harder to use. If you are interested you can read about some of that in the inotify documentation: http://man7.org/linux/man-pages/man7/inotify.7.html#NOTES

But slow and memory-hungry notifications are better than none, so of course I'm going to add them one way or another. Just don't expect FSearch to be able to monitor the whole file system (/), because that's going to be really slow, and the kernel most likely won't even allow it, since it would hit the limit of available inotify watches per user.
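
Because inotify watches are not recursive, monitoring a tree takes roughly one watch per directory, so the directory count is a rough lower bound on the watches needed. A minimal Python sketch, using only the standard library, that compares that count against the per-user limit the kernel exposes in /proc/sys/fs/inotify/max_user_watches; the helper names are illustrative and not part of FSearch.

```python
import os


def watches_needed(root):
    """Count directories under root, including root itself.

    inotify has no recursive mode, so a tree monitor needs roughly one
    watch per directory. Symlinked directories are not followed.
    """
    count = 0
    for _dirpath, _dirnames, _filenames in os.walk(root):
        count += 1
    return count


def max_user_watches(path="/proc/sys/fs/inotify/max_user_watches"):
    """Return the per-user inotify watch limit, or None if it can't be read."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None
```

Comparing `watches_needed(os.path.expanduser("~"))` with `max_user_watches()` gives a quick sense of how close a whole-tree monitor of a home directory would come to the limit; the limit itself varies by kernel and distribution.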

cboxdoerfer • Oct 26 '16

@cboxdoerfer A bit off topic, but how about an accelerator/shortcut to update the database, like F5 or Ctrl+R? Maybe also extend the message in the background that says "Press Ctrl+F and start typing" to something like: "Press Ctrl+F and start typing or Ctrl+R to update database"

spsf64 • Nov 13 '16

@spsf64, yes, that's a good idea. Since Ctrl+R is already used (enable regex mode), I'll probably use Ctrl+Shift+R instead.

In the future I'll add the ability to configure shortcuts for all actions anyway; then users can choose whatever key combinations they like.

cboxdoerfer • Nov 13 '16

@spsf64, done. https://github.com/cboxdoerfer/fsearch/commit/f2f7a7c1b1dbfa61c97f4ff27f839bbf510f3118

cboxdoerfer • Nov 13 '16

@cboxdoerfer Wow, this one was fast! Just built the new package (using Arch Linux / AUR) and it works perfectly. Thank you!

spsf64 • Nov 13 '16

@spsf64, no problem ;)

cboxdoerfer • Nov 13 '16

Hi @cboxdoerfer, I was wondering about a way to implement an incremental database update, at least for now, until someone figures out how to make a 'proper' folder monitor. The idea: we keep the time at which the previous database was built, and while crawling we compare it with the modification time of each target folder. If the folder is older than our database, we skip it; otherwise we recursively crawl that directory or resume with whatever approach you're currently using. Hope this idea helps, or at least inspires someone else to help. Thanks again for this awesome piece of software!
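
The idea above can be sketched as follows, with one caveat made explicit: a directory's mtime only changes when its direct entries are added, removed or renamed, so an unchanged mtime lets you reuse that directory's cached listing but not skip its whole subtree. This is a minimal Python sketch; the cache layout (a dict from directory path to (name, is_dir) pairs) and the function name are hypothetical stand-ins, not FSearch's database.

```python
import os


def incremental_scan(root, db_time_ns, cache):
    """Refresh `cache` (dir path -> list of (name, is_dir)); return re-listed dirs.

    A directory is re-listed only if it is new to the cache or its mtime is
    not older than db_time_ns. Deeper changes don't touch a parent's mtime,
    so every subdirectory is still visited (one stat each), but unchanged
    directories reuse their cached listing.
    """
    relisted = []
    visited = set()
    stack = [root]
    while stack:
        path = stack.pop()
        try:
            mtime_ns = os.stat(path).st_mtime_ns
        except OSError:
            continue  # vanished since the parent was listed
        visited.add(path)
        if path not in cache or mtime_ns >= db_time_ns:
            try:
                with os.scandir(path) as it:
                    cache[path] = [(e.name, e.is_dir(follow_symlinks=False)) for e in it]
            except OSError:
                cache[path] = []  # e.g. permission denied
            relisted.append(path)
        for name, is_dir in cache[path]:
            if is_dir:
                stack.append(os.path.join(path, name))
    # Forget directories that are no longer part of the tree.
    for stale in set(cache) - visited:
        del cache[stale]
    return relisted
```

The comparison uses `>=` so that a change landing in the same timestamp tick as the database build is re-read rather than missed.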

robert1826 • Dec 27 '16

Isn't it at least possible to build a script that just updates the database, so that I can run it regularly via cron (like the one in angrysearch)?

robert1826 • Feb 08 '17

I think the idea of the script to use with cron is very cool, indeed!

I am currently using it with angrysearch, so the database is automatically updated every 6 hours. I think it may be a good compromise.

kupiqu • Mar 17 '17

I've also come across fswatch, which allows recursive monitoring of directories. Just wanted to let you know.

robert1826 • Mar 23 '17

Dear author, please give yourself time to make inotify the number 1 priority for this project. Without inotify, this app is totally useless, as I can already query the mlocate database via the CLI, and mlocate is already kept up to date by automated cron jobs. Thanks.

mailinglists35 • Dec 15 '18

Yeah this really doesn't work like Everything if you have to spend several minutes updating the database before each search. :/

endolith • Mar 10 '19

> The Linux kernel is really the limiting factor here, and there's currently nothing that I can do to bring the smooth and fast experience Everything offers on Windows. inotify is just much slower, it requires much more memory, it's less reliable and harder to use.

Not sure if this would help, but have you seen these recent changes to fanotify? https://github.com/torvalds/linux/commit/235328d1fa4251c6dcb32351219bb553a58838d2

shao113 • Apr 09 '19

This looks promising! And now @cboxdoerfer is tagged too :P

phil294 • Apr 22 '19

Would eBPF for monitoring/tracing file system changes be worth considering? Example projects that seem to use it to monitor file system changes:

danielkrajnik • Oct 16 '22

@danielkrajnik, thanks, I hadn't heard of that before. I'll have a look at it.

cboxdoerfer • Oct 17 '22

Thanks, I hope that it could be faster than fanotify and substitute for what the USN Journal provides on NTFS. Here is another interesting project in this area: https://github.com/kanurag94/filemonitor

danielkrajnik • Oct 17 '22

@cboxdoerfer Have you looked into the eBPF capabilities yet? This does sound promising.

dlong500 • Jan 23 '23

@dlong500 yes, I experimented a bit with it. It's incredibly powerful and flexible, but it's also more complex to implement and at least in my demo had a performance overhead compared to fanotify and inotify (but this might be fixable).

So for the next 0.3 release I decided to use fanotify as the default backend (which works really well in my testing) and inotify as a fallback. An eBPF backend, if it turns out to be an improvement compared to the others, can then be added later. This way I'm not unnecessarily delaying the release of 0.3 any further.
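
The "fanotify by default, inotify as fallback" design can be sketched as a probe at startup: try `fanotify_init`, and if the kernel is too old or the process lacks the required privileges, open an inotify instance instead. This Python sketch calls the real glibc functions through ctypes; the choice of `FAN_REPORT_DFID_NAME` (directory-entry events with names, Linux 5.9+) is an assumption about what a file indexer would need, not something the thread says FSearch uses.

```python
import ctypes
import os

# Flag values from <linux/fanotify.h>.
FAN_CLOEXEC = 0x00000001
FAN_NONBLOCK = 0x00000002
FAN_CLASS_NOTIF = 0x00000000
FAN_REPORT_DIR_FID = 0x00000400
FAN_REPORT_NAME = 0x00000800
FAN_REPORT_DFID_NAME = FAN_REPORT_DIR_FID | FAN_REPORT_NAME

_libc = ctypes.CDLL(None, use_errno=True)


def open_monitor_backend():
    """Return (backend_name, fd): fanotify if it can be initialised, else inotify."""
    fanotify_init = getattr(_libc, "fanotify_init", None)
    if fanotify_init is not None:
        flags = FAN_CLASS_NOTIF | FAN_REPORT_DFID_NAME | FAN_CLOEXEC | FAN_NONBLOCK
        fd = fanotify_init(flags, os.O_RDONLY)
        if fd >= 0:
            return "fanotify", fd
        # Typically EPERM (missing privileges), EINVAL (kernel too old for
        # these flags) or ENOSYS: fall through to inotify.
    fd = _libc.inotify_init1(os.O_CLOEXEC | os.O_NONBLOCK)
    if fd < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return "inotify", fd


if __name__ == "__main__":
    backend, fd = open_monitor_backend()
    print("using", backend)
    os.close(fd)
```

A successful `fanotify_init` is not the whole story: whether the marks a backend needs (`fanotify_mark`) are permitted also depends on privileges, so a real implementation would likely probe that step as well before committing to a backend.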

cboxdoerfer • Jan 23 '23

Here's a short update on the progress of adding monitoring support:

This might not sound or look like much, but it took quite some work to get here. So here's the first video demonstrating how FSearch updates the search results as files are being removed from the terminal:

Screencast from 2023-02-28 18-42-59.webm

In order for that to work the database was rewritten completely in the last couple of weeks and now I'm step by step porting the code from the monitor prototypes to FSearch.

I hope to get the first alpha versions with full monitoring support out by the end of next month. However, note that those will likely not be usable as a daily driver, e.g. some features might still be missing or the database format on disk will likely change a couple of times.

cboxdoerfer • Feb 28 '23

I'm currently adding the file move/rename handling and ran into the following question: What's supposed to happen with the selection when a file gets renamed? Should the file (with the new name) keep the selection state it previously had or should it automatically become un-selected?

cboxdoerfer • Mar 01 '23

Thanks for asking, I'd keep the previous selection (common operation for me would be renaming a file and then copying/moving it to somewhere else).

danielkrajnik • Mar 01 '23

Thanks, I'll keep the selection for now, then. If this turns out to be controversial, I can still add a config option for it.

cboxdoerfer • Mar 01 '23

So it turned out that remembering file selection for moved/renamed files is a bit more difficult than anticipated and I've put it on hold for the moment.

The problem is that it is quite difficult to detect true move or rename events with inotify. The general idea of inotify is that whenever you rename or move a file, inotify creates two events for you: IN_MOVED_FROM and IN_MOVED_TO.

The first minor problem is that there can be other events in between those two. The fix for that is quite simple: remember all IN_MOVED_FROM events until their matching IN_MOVED_TO event happens.
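
inotify ties the two halves of a move together with the `cookie` field of `struct inotify_event`, so "remember until matched" amounts to a small table keyed by cookie. A minimal Python sketch; the class name and the ("rename", ...) / ("create", ...) result tuples are illustrative assumptions, not FSearch code.

```python
from dataclasses import dataclass, field


@dataclass
class MovePairer:
    """Buffers IN_MOVED_FROM events until the IN_MOVED_TO with the same cookie arrives.

    Keying by cookie rather than by arrival order means unrelated events that
    arrive between the two halves of a move don't break the pairing.
    """
    pending: dict[int, str] = field(default_factory=dict)  # cookie -> old path

    def moved_from(self, cookie: int, old_path: str) -> None:
        self.pending[cookie] = old_path

    def moved_to(self, cookie: int, new_path: str) -> tuple:
        old_path = self.pending.pop(cookie, None)
        if old_path is None:
            # No matching IN_MOVED_FROM: the file came from an unwatched directory.
            return ("create", new_path)
        return ("rename", old_path, new_path)
```

Any IN_MOVED_FROM whose partner never arrives stays in `pending` indefinitely, which is exactly the unpaired-event problem described next.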

But the big problem is that inotify doesn't always create proper pairs. Sometimes you only get an IN_MOVED_FROM and never the corresponding IN_MOVED_TO event, and vice versa. This happens when files move between unwatched and watched directories. E.g. when you're monitoring /home/user/Downloads and you move one of its files to the unmonitored trash directory, you'll only get an IN_MOVED_FROM event and never an IN_MOVED_TO event.
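
Both behaviours can be observed directly. This Linux-only Python sketch calls the real `inotify_init1` and `inotify_add_watch` functions from glibc through ctypes, decodes `struct inotify_event` (wd, mask, cookie, len, name) by hand, and watches one temporary directory for IN_MOVED_FROM/IN_MOVED_TO; the helper name `read_move_events` is made up for illustration.

```python
import ctypes
import os
import struct
import tempfile

IN_MOVED_FROM = 0x00000040  # values from <sys/inotify.h>
IN_MOVED_TO = 0x00000080
# struct inotify_event header: int wd; uint32_t mask, cookie, len; then the name.
_EVENT_HEADER = struct.Struct("iIII")

_libc = ctypes.CDLL(None, use_errno=True)


def read_move_events(watch_dir, action):
    """Watch watch_dir (non-recursively), run action(), return [(kind, cookie, name)].

    inotify queues events while the rename() syscall runs, so a single
    non-blocking read afterwards sees them.
    """
    fd = _libc.inotify_init1(os.O_NONBLOCK | os.O_CLOEXEC)
    if fd < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    try:
        wd = _libc.inotify_add_watch(fd, os.fsencode(watch_dir), IN_MOVED_FROM | IN_MOVED_TO)
        if wd < 0:
            err = ctypes.get_errno()
            raise OSError(err, os.strerror(err))
        action()
        try:
            buf = os.read(fd, 64 * 1024)
        except BlockingIOError:
            return []  # nothing was queued
    finally:
        os.close(fd)

    events, offset = [], 0
    while offset < len(buf):
        _wd, mask, cookie, length = _EVENT_HEADER.unpack_from(buf, offset)
        offset += _EVENT_HEADER.size
        name = os.fsdecode(buf[offset:offset + length].rstrip(b"\0"))
        offset += length
        kind = "IN_MOVED_FROM" if mask & IN_MOVED_FROM else "IN_MOVED_TO"
        events.append((kind, cookie, name))
    return events


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as watched, tempfile.TemporaryDirectory() as trash:
        src = os.path.join(watched, "a.txt")
        open(src, "w").close()
        # Rename inside the watched directory: a FROM/TO pair sharing one cookie.
        print(read_move_events(watched, lambda: os.rename(src, os.path.join(watched, "b.txt"))))
        # Move to an unwatched directory: only IN_MOVED_FROM is reported.
        moved = os.path.join(watched, "b.txt")
        print(read_move_events(watched, lambda: os.rename(moved, os.path.join(trash, "b.txt"))))
```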

As far as I know, there are two ways to fix this:

  1. Assume that if there's still no matching IN_MOVED_TO event after some time, there will never be one, and interpret the earlier IN_MOVED_FROM event as a move out of our monitored directory, i.e. simply remove the file from our index. The longer you wait, the more reliable this approach gets, but the longer your index and search results stay inconsistent with the file system. There's probably some good middle ground, but it remains guesswork.

  2. The most reliable fix I can think of is to simply treat every IN_MOVED_FROM event immediately as a delete event and every IN_MOVED_TO event as a create event. This just works, since there's no guesswork about how long to wait for the next event, but it comes at the cost of being a bit more resource intensive, and it's not possible to remember the selection for files that were actually moved/renamed.
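
The two options differ mainly in what happens to an IN_MOVED_FROM that is still waiting for its partner. A Python sketch of both, assuming a pending table keyed by the inotify cookie and a caller-supplied clock value `now`; the timeout (folded into each deadline) is exactly the guesswork of option 1, and all names here are hypothetical.

```python
def expire_pending(pending, now):
    """Option 1: give up on moves whose partner didn't arrive in time.

    `pending` maps cookie -> (old_path, deadline), where deadline is the time
    of the IN_MOVED_FROM plus a chosen timeout. Expired entries are removed
    and reported as deletions (the file presumably left the watched tree).
    """
    expired = [cookie for cookie, (_path, deadline) in pending.items() if deadline <= now]
    return [("delete", pending.pop(cookie)[0]) for cookie in expired]


def translate_immediately(event_kind, path):
    """Option 2: no pairing at all; each half of a move is its own index update."""
    if event_kind == "IN_MOVED_FROM":
        return ("delete", path)
    if event_kind == "IN_MOVED_TO":
        return ("create", path)
    raise ValueError(f"not a move event: {event_kind}")
```

Under option 2 a rename shows up as a delete followed by a create, which is why the index entry, and with it the selection state, cannot carry over to the new name.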

So currently I'm favoring and using the second approach, simply because it's reliable and simplifies the code. But I'll revisit the first approach again. If anyone knows of an alternative solution, let me know.

cboxdoerfer • Mar 14 '23