intellij-community icon indicating copy to clipboard operation
intellij-community copied to clipboard

[fsNotifier] IDEA-126491 handle hard links

Open mdelah opened this issue 2 years ago • 18 comments

As mentioned on IDEA-126491, fsnotifier has a problem with hard links. On watching a hard link to an already watched file:

  • inotify_add_watch returns the existing watch ID (as documented)
  • fsnotifier uses this ID as a key in a hash table
  • There's a collision
  • The watcher fails and the IDE eventually turns it off

This can be an annoyance with certain build tooling that relies on hard links. And this is my attempt to fix it, by storing a linked list of watch_nodes in each hash bucket.

mdelah avatar Sep 28 '22 04:09 mdelah

@trespasserw when will you able to review this ?

Elentary avatar Nov 04 '22 16:11 Elentary

+1 Can you prioritize reviewing this pr? @trespasserw This is really impacting many people's productivity.

AlinJiang avatar Nov 04 '22 16:11 AlinJiang

Blocking me, too. I can't reproduce it but it happened when somehow referring to dependencies in an optimized nix store, which uses hard-links.

pschyska avatar Dec 20 '22 10:12 pschyska

This is an issue that happens every day, would be nice if it could be merged

Klemms avatar Dec 21 '22 10:12 Klemms

@ybedrov Could you assign another reviewer ?

Elentary avatar Dec 21 '22 13:12 Elentary

Soon. Sorry.

trespasserw avatar Dec 21 '22 13:12 trespasserw

Hoping this gets merged in soon after the devs enjoy the holidays and fixes the issue, this is a pain for remote development.

ADataGMan avatar Dec 23 '22 03:12 ADataGMan

To give more context for future reviewing: there was an alternate fix mentioned on the support ticket that basically amounted to discarding the additional filename if the underlying file description was already watched. This PR differs from that one in the following scenario:

  1. We have files foo.txt and bar.txt that happen to be hard linked to the same file
  2. fsnotifier is running
  3. The underlying file is modified (that is to say, both foo.txt and bar.txt now have different content than before)
  4. Depending on the patch: a. (linked patch) fsnotifier will generate an event for foo.txt OR bar.txt (we don't know which one, presumably it's random) b. (this PR) fsnotifier will generate an event for foo.txt AND bar.txt

I don't know for sure which behaviour is the right one, because I don't know how the IDE processes the event exactly, but the latter feels right. I can say that I've been running this patch for a couple months personally and it's solved the issue for me and I haven't noticed any other negative effects.

Happy holidays!

mdelah avatar Dec 23 '22 06:12 mdelah

I got the idea, but what confuses me is the mention of hard links. The patch deals with "synonyms" on a watch_node (i.e. a directory) level - but last time I checked, it was impossible to create a directory hard link.

Could you please describe your use case, so that I can add a test for it?

trespasserw avatar Jan 12 '23 13:01 trespasserw

@trespasserw thanks, that makes things considerably clearer. I did not realise the watcher is not supposed to recurse into files.

I had a dig through my IDE logs back from when I was running into this last year, and I found this example (paths abbreviated):

2022-07-07 08:55:52,957 [ 310826]   WARN - pl.local.NativeFileWatcherImpl - table error: collision at 25197 (new /home/<snip>/share/timezone/Africa/Asmara, existing /home/<snip>/share/timezone/Africa/Addis_Ababa) 
2022-07-07 08:55:53,002 [ 310871]   WARN - pl.local.NativeFileWatcherImpl - Watcher terminated with exit code 3 
2022-07-07 08:55:53,002 [ 310871]   WARN - api.vfs.impl.local.FileWatcher - File watcher failed repeatedly and has been disabled 

Now on the filesystem, I can see that they are indeed files that are hard linked to each other:

$ cd /home/<snip>/share/timezone/Africa/
$ ls -li Addis_Ababa
26479409 -rw-r--r-- 11 miles miles 191 Apr  1  2022 Addis_Ababa
$ ls -li Asmara      
26479409 -rw-r--r-- 11 miles miles 191 Apr  1  2022 Asmara

So the puzzle would seem to be why fsnotifier was trying to watch them in the first place. It's been a while so I don't remember now exactly what I was doing at the time, but I will see if I can come up with a reproducible test case.

mdelah avatar Jan 14 '23 03:01 mdelah

Okay, so the subtlety here is that while the tool declines to watch listed files when scanning a directory, it does allow watch nodes to be created from files when they are given as "roots". And that's exactly what was going on:

watching /home/<snip>/share/timezone/Africa/Addis_Ababa: 468554
registering root: /home/<snip>/share/timezone/Africa/Algiers
watching /home/<snip>/share/timezone/Africa/Algiers: 468555
registering root: /home/<snip>/share/timezone/Africa/Asmara
watching /home/<snip>/share/timezone/Africa/Asmara: 468554
table error: collision at 468554 (new /home/<snip>/share/timezone/Africa/Asmara, existing /home/<snip>/share/timezone/Africa/Addis_Ababa)

Now, these files appear nested under $GOROOT/share so I imagine that's why they were being watched as dependencies, but why GoLand decided to add them as individual roots rather than just watch the whole directory I have no idea. Moreover, somewhere in between the version that I was running last year (2022.1.4) and current (2022.3) this behaviour appears to have changed, because I can no longer reproduce this error. Since it no longer affects me I'm happy enough to decline this PR, though given the number of other people who seem to have run into it perhaps there may be other circumstances where hard linked files end up getting added as roots. Thoughts?

EDIT: I spoke too soon! I am, in fact, still hitting this error in the latest version (2022.3.2):

2023-02-21 15:01:41,720 [  44927]   WARN - #c.i.o.v.i.l.NativeFileWatcherImpl - table error: collision at 555202 (new /home/<snip>/share/terminfo/2/2621-wl, existing /home/<snip>/share/terminfo/2/2621)
2023-02-21 15:01:42,053 [  45260]   WARN - #c.i.o.v.i.l.NativeFileWatcherImpl - Watcher terminated with exit code 3

mdelah avatar Jan 14 '23 06:01 mdelah

Can we please get this fixed? It was reported 9 whole years ago, this PR was opened over a year ago, and at least for me, it affects half of my projects.

In case you're not convinced hard links are the issue:

2023-10-14 21:40:29,022 [  17993]   WARN - #c.i.o.v.i.l.NativeFileWatcherImpl - table error: collision at 16915 (new /usr/lib64/libreoffice/program/__pycache__/officehelper.cpython-311.pyc, existing /usr/lib64/libreoffice/program/__pycache__/officehelper.cpython-311.opt-1.pyc)
2023-10-14 21:40:29,514 [  18485]   WARN - #c.i.o.v.i.l.NativeFileWatcherImpl - Watcher terminated with exit code 3
2023-10-14 21:40:29,515 [  18486]   INFO - #c.i.o.v.i.l.NativeFileWatcherImpl - Starting file watcher: /local/bin/idea-IU-232.10072.27/bin/fsnotifier
2023-10-14 21:40:29,603 [  18574]   INFO - #c.i.j.n.n.r.NpmRegistryServiceImpl - Found 35 dist-tags, 191 versions for @types/lodash in 762ms
2023-10-14 21:40:29,843 [  18814]   INFO - #c.i.j.n.n.r.NpmRegistryServiceImpl - Found 35 dist-tags, 125 versions for @types/react-dom in 238ms
2023-10-14 21:40:29,857 [  18828]   WARN - #c.i.o.v.i.l.NativeFileWatcherImpl - table error: collision at 16915 (new /usr/lib64/libreoffice/program/__pycache__/officehelper.cpython-311.pyc, existing /usr/lib64/libreoffice/program/__pycache__/officehelper.cpython-311.opt-1.pyc)
2023-10-14 21:40:29,880 [  18851]   WARN - #c.i.o.v.i.l.NativeFileWatcherImpl - Watcher terminated with exit code 3
2023-10-14 21:40:29,880 [  18851]   INFO - #c.i.o.v.i.l.NativeFileWatcherImpl - Starting file watcher: /local/bin/idea-IU-232.10072.27/bin/fsnotifier
2023-10-14 21:40:30,192 [  19163]   WARN - #c.i.o.v.i.l.NativeFileWatcherImpl - table error: collision at 16915 (new /usr/lib64/libreoffice/program/__pycache__/officehelper.cpython-311.pyc, existing /usr/lib64/libreoffice/program/__pycache__/officehelper.cpython-311.opt-1.pyc)
2023-10-14 21:40:30,222 [  19193]   WARN - #c.i.o.v.i.l.NativeFileWatcherImpl - Watcher terminated with exit code 3
2023-10-14 21:40:30,222 [  19193]   INFO - #c.i.o.v.i.l.NativeFileWatcherImpl - Starting file watcher: /local/bin/idea-IU-232.10072.27/bin/fsnotifier
2023-10-14 21:40:30,652 [  19623]   WARN - #c.i.o.v.i.l.NativeFileWatcherImpl - table error: collision at 16915 (new /usr/lib64/libreoffice/program/__pycache__/officehelper.cpython-311.pyc, existing /usr/lib64/libreoffice/program/__pycache__/officehelper.cpython-311.opt-1.pyc)
2023-10-14 21:40:30,659 [  19630]   INFO - #c.i.o.a.i.PopupMenuPreloader - 14837 ms since showing to preload popup menu 'Navigate' at 'MainMenu(preload-bgt)' in 41 ms
2023-10-14 21:40:30,678 [  19649]   WARN - #c.i.o.v.i.l.NativeFileWatcherImpl - Watcher terminated with exit code 3
2023-10-14 21:40:30,678 [  19649]   INFO - #c.i.o.v.i.l.NativeFileWatcherImpl - Starting file watcher: /local/bin/idea-IU-232.10072.27/bin/fsnotifier
2023-10-14 21:40:30,714 [  19685]   INFO - #c.i.o.a.i.PopupMenuPreloader - 14895 ms since showing to preload popup menu 'Code' at 'MainMenu(preload-bgt)' in 99 ms
2023-10-14 21:40:30,797 [  19768]   INFO - #c.i.o.a.i.PopupMenuPreloader - 14979 ms since showing to preload popup menu 'Edit' at 'MainMenu(preload-bgt)' in 182 ms
2023-10-14 21:40:30,977 [  19948]   INFO - #c.i.o.a.i.PopupMenuPreloader - 15158 ms since showing to preload popup menu 'Refactor' at 'MainMenu(preload-bgt)' in 362 ms
2023-10-14 21:40:31,023 [  19994]   WARN - #c.i.o.v.i.l.NativeFileWatcherImpl - table error: collision at 16915 (new /usr/lib64/libreoffice/program/__pycache__/officehelper.cpython-311.pyc, existing /usr/lib64/libreoffice/program/__pycache__/officehelper.cpython-311.opt-1.pyc)
2023-10-14 21:40:31,048 [  20019]   WARN - #c.i.o.v.i.l.NativeFileWatcherImpl - Watcher terminated with exit code 3
2023-10-14 21:40:31,049 [  20020]   WARN - #c.i.o.v.i.l.FileWatcher - File watcher failed repeatedly and has been disabled

Looking at those files:

bspot@blackbook ~ $ stat /usr/lib64/libreoffice/program/__pycache__/officehelper.cpython-311.pyc
  File: /usr/lib64/libreoffice/program/__pycache__/officehelper.cpython-311.pyc
  Size: 3125            Blocks: 8          IO Block: 4096   regular file
Device: 259,3   Inode: 2431872     Links: 2
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2023-09-24 04:39:37.955106219 +0200
Modify: 2023-09-24 04:39:37.955106219 +0200
Change: 2023-09-24 04:40:01.070105313 +0200
 Birth: 2023-09-24 04:39:37.955106219 +0200
bspot@blackbook ~ $ stat /usr/lib64/libreoffice/program/__pycache__/officehelper.cpython-311.opt-1.pyc
  File: /usr/lib64/libreoffice/program/__pycache__/officehelper.cpython-311.opt-1.pyc
  Size: 3125            Blocks: 8          IO Block: 4096   regular file
Device: 259,3   Inode: 2431872     Links: 2
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2023-09-24 04:39:37.955106219 +0200
Modify: 2023-09-24 04:39:37.955106219 +0200
Change: 2023-09-24 04:40:01.070105313 +0200
 Birth: 2023-09-24 04:39:37.955106219 +0200

bspot avatar Oct 14 '23 20:10 bspot

@trespasserw Can you please take a look into this? It still prevents using IntelliJ properly with nix store optimisation:

2023-11-07 13:50:23,823 [   6407]   WARN - #c.i.o.v.i.l.NativeFileWatcherImpl - table error: collision at 4740 (new /nix/store/7jhz4g4d4is11zg0yg8jjv8zpiisdx70-idea-ultimate-2023.2.4/idea-ultimate/plugins/java/lib/maven-resolver-connector-basic.jar, existing /nix/store/7jhz4g4d4is11zg0yg8jjv8zpiisdx70-idea-ultimate-2023.2.4/idea-ultimate/plugins/DatabaseTools/lib/maven-resolver-connector-basic.jar)
2023-11-07 13:50:23,830 [   6414]   WARN - #c.i.o.v.i.l.NativeFileWatcherImpl - Watcher terminated with exit code 3
2023-11-07 13:50:23,830 [   6414]   INFO - #c.i.o.v.i.l.NativeFileWatcherImpl - Starting file watcher: /nix/store/7jhz4g4d4is11zg0yg8jjv8zpiisdx70-idea-ultimate-2023.2.4/idea-ultimate/bin/fsnotifier
2023-11-07 13:50:24,143 [   6727]   WARN - #c.i.o.v.i.l.NativeFileWatcherImpl - table error: collision at 4740 (new /nix/store/7jhz4g4d4is11zg0yg8jjv8zpiisdx70-idea-ultimate-2023.2.4/idea-ultimate/plugins/java/lib/maven-resolver-connector-basic.jar, existing /nix/store/7jhz4g4d4is11zg0yg8jjv8zpiisdx70-idea-ultimate-2023.2.4/idea-ultimate/plugins/DatabaseTools/lib/maven-resolver-connector-basic.jar)
2023-11-07 13:50:24,154 [   6738]   WARN - #c.i.o.v.i.l.NativeFileWatcherImpl - Watcher terminated with exit code 3
2023-11-07 13:50:24,154 [   6738]   INFO - #c.i.o.v.i.l.NativeFileWatcherImpl - Starting file watcher: /nix/store/7jhz4g4d4is11zg0yg8jjv8zpiisdx70-idea-ultimate-2023.2.4/idea-ultimate/bin/fsnotifier
2023-11-07 13:50:24,490 [   7074]   WARN - #c.i.o.v.i.l.NativeFileWatcherImpl - table error: collision at 4740 (new /nix/store/7jhz4g4d4is11zg0yg8jjv8zpiisdx70-idea-ultimate-2023.2.4/idea-ultimate/plugins/java/lib/maven-resolver-connector-basic.jar, existing /nix/store/7jhz4g4d4is11zg0yg8jjv8zpiisdx70-idea-ultimate-2023.2.4/idea-ultimate/plugins/DatabaseTools/lib/maven-resolver-connector-basic.jar)
2023-11-07 13:50:24,500 [   7084]   WARN - #c.i.o.v.i.l.NativeFileWatcherImpl - Watcher terminated with exit code 3
2023-11-07 13:50:24,501 [   7085]   INFO - #c.i.o.v.i.l.NativeFileWatcherImpl - Starting file watcher: /nix/store/7jhz4g4d4is11zg0yg8jjv8zpiisdx70-idea-ultimate-2023.2.4/idea-ultimate/bin/fsnotifier
2023-11-07 13:50:24,867 [   7451]   WARN - #c.i.o.v.i.l.NativeFileWatcherImpl - table error: collision at 4740 (new /nix/store/7jhz4g4d4is11zg0yg8jjv8zpiisdx70-idea-ultimate-2023.2.4/idea-ultimate/plugins/java/lib/maven-resolver-connector-basic.jar, existing /nix/store/7jhz4g4d4is11zg0yg8jjv8zpiisdx70-idea-ultimate-2023.2.4/idea-ultimate/plugins/DatabaseTools/lib/maven-resolver-connector-basic.jar)
2023-11-07 13:50:24,882 [   7466]   WARN - #c.i.o.v.i.l.NativeFileWatcherImpl - Watcher terminated with exit code 3
2023-11-07 13:50:24,883 [   7467]   WARN - #c.i.o.v.i.l.FileWatcher - File watcher failed repeatedly and has been disabled

I have to disable store optimization globally, which means on my actively used desktop system I'm losing hundreds of gigabytes of space. The error above is just some example - it will fail all the time when the file watchers watches multiple names that link to the same inode. It shouldn't be hard to fix.

pschyska avatar Nov 07 '23 13:11 pschyska

We're seeing this issue with https://devenv.sh, is there any chance this will get fixed?

domenkozar avatar Jun 08 '24 19:06 domenkozar