fileconveyor icon indicating copy to clipboard operation
fileconveyor copied to clipboard

FileConveyor's inotify FSMonitor breaks when it finds temporary files in it's path.

Open j0rd opened this issue 12 years ago • 19 comments

I'm having issues with fileconveyor getting hung up on vim temporary files. I know I could exclude these from my paths, but ideally the daemon would not crash when a file goes away between the time it finds it and attempts to do something with it.

Better error handling needs to exist, so that the daemon never dies. It this involves wrapping the entire while(1) in a try catch and spitting out errors then restarting, then so be it.

It should also provide useful notification (perhaps via email) that it encountered a problem. Because currently when the server gets hung....essentially your CDN will start to fall behind and your site could end up being totally fubar'd.

j0rd avatar Sep 23 '11 12:09 j0rd

I just ran into this again. It happens when I'm editing a with vim under the file conveyor path

Exception in thread Thread-1: Traceback (most recent call last): File "/usr/lib/python2.6/threading.py", line 532, in **bootstrap_inner self.run() File "/usr/lib/pymodules/python2.6/pyinotify.py", line 1356, in run self.loop() File "/usr/lib/pymodules/python2.6/pyinotify.py", line 1342, in loop self.process_events() File "/usr/lib/pymodules/python2.6/pyinotify.py", line 1134, in process_events watch_.proc_fun(revent) # user processings File "/usr/lib/pymodules/python2.6/pyinotify.py", line 811, in __call** return _ProcessEvent.__call__(self, event) File "/usr/lib/pymodules/python2.6/pyinotify.py", line 544, in **call** return meth(event) File "/home/www/domain.tld/fileconveyor/wimleers-fileconveyor-d1c55b8/code/fsmonitor_inotify.py", line 233, in process_IN_CREATE self.__update_pathscanner_db(event.pathname, FSMonitor.CREATED) File "/home/www/domain.tld/fileconveyor/wimleers-fileconveyor-d1c55b8/code/fsmonitor_inotify.py", line 205, in __update_pathscanner_db st = os.stat(pathname) OSError: [Errno 2] No such file or directory: '/home/www/domain.tld/htdocs/sites/all/themes/theme_zen/js/.functions.js.swx'

j0rd avatar Sep 23 '11 15:09 j0rd

It should most definitely not crash. It should simply detect the deletion and be done with it.

wimleers avatar Jan 20 '12 10:01 wimleers

To test, open fileconveyor under a path.

Inside that path, open up a file with vim. Save and close the file.

I stopped using fileconveyor because it was much too buggy when I was playing around with it for use on a live server. Perhaps this has been fixed by now, but in my past experience this process I've mentioned was causing a problem.

j0rd avatar Jan 20 '12 13:01 j0rd

I just tested it and I cannot get it to crash:

2012-01-20 07:36:42,865 - Arbitrator                - WARNING  - Synced: '/Users/wimleers/Desktop/watch/.bar.swp' (CREATED).
2012-01-20 07:36:43,271 - Arbitrator                - WARNING  - Synced: '/Users/wimleers/Desktop/watch/bar' (CREATED).
2012-01-20 07:36:48,290 - Arbitrator                - WARNING  - Synced: '/Users/wimleers/Desktop/watch/.bar.swp' (DELETED).
2012-01-20 07:36:49,307 - Arbitrator                - WARNING  - Synced: '/Users/wimleers/Desktop/watch/bar' (DELETED).
2012-01-20 07:37:03,937 - Arbitrator                - WARNING  - Synced: '/Users/wimleers/Desktop/watch/.baz.swp' (CREATED).
2012-01-20 07:37:07,363 - Arbitrator                - WARNING  - Synced: '/Users/wimleers/Desktop/watch/.baz.swp' (DELETED).
2012-01-20 07:37:07,369 - Arbitrator                - WARNING  - Synced: '/Users/wimleers/Desktop/watch/baz' (CREATED).

Either it has been fixed or I cannot reproduce it on OS X.

If you use pip, you can now easily install and uninstall File Conveyor and its dependencies, by doing this:

pip install -e git+https://github.com/wimleers/fileconveyor@master#egg=fileconveyor

Test with something like this: https://gist.github.com/1647418. Then you should be able to either reproduce the problem easily or (hopefully) not.

wimleers avatar Jan 20 '12 13:01 wimleers

Any news, @j0rd?

wimleers avatar Oct 14 '12 09:10 wimleers

No, I stopped using fileconveyor a long time ago, sorry.

j0rd avatar Oct 14 '12 11:10 j0rd

Np :)

wimleers avatar Oct 14 '12 16:10 wimleers

Just discovered yesterday then rsyncing into a folder 'watched' by fileconveyor causes the problem mentioned above: temporary files cause FileConveyor to crash.

ericssmith avatar Oct 31 '12 12:10 ericssmith

@ericssmith Is this with 0.3-dev? Temporary files should work just fine. Also see what I said a few comments higher, "I just tested it […]".

wimleers avatar Oct 31 '12 12:10 wimleers

Just to let you know, I got this issue with vim on an RHEL box too. Was some time ago (3 months ish) but still relevant I think.

chrisivens avatar Oct 31 '12 13:10 chrisivens

We are going to repeat the failing procedure and post the results here today.

ericssmith avatar Oct 31 '12 14:10 ericssmith

Unfortunately, we don't have time to recreate the error, but here is what the log looked like:

2012-10-30 15:25:01,953 - Arbitrator.FSMonitor - INFO - Detected 'CREATED' event for '/var/www/vhosts/mainsite/sites/default/files/.pumpkin_bread.jpg.M2B4iO' through inotify (for monitored path '/var/www/vhosts/mainsite/sites/default'). 2012-10-30 15:25:02,151 - Arbitrator - INFO - Discover queue -> pipeline queue: '/var/www/vhosts/mainsite/sites/default/files/.pumpkin_bread.jpg.M2B4iO'. 2012-10-30 15:25:02,161 - Arbitrator - INFO - Pipeline queue -> filter queue: '/var/www/vhosts/mainsite/sites/default/files/.pumpkin_bread.jpg.M2B4iO'. 2012-10-30 15:25:02,164 - Arbitrator - INFO - Filtering: dropped '/var/www/vhosts/mainsite/sites/default/files/.pumpkin_bread.jpg.M2B4iO' because it no longer exists.

This is where it died. The error itself is not in the log. Sorry that we won't be able to reproduce it. but will report it if we run into it again. However we are going to set the temp files for rsync to not be in the receiving directory, so that seems unlikely.

(and thanks for your work on this. It goes braindead occasionally, but otherwise works as advertised)

ericssmith avatar Oct 31 '12 17:10 ericssmith

Hmm … that's very odd. That means it actually died just after dropping a file (i.e. dropping it from the queue of files to deal with). Very odd.

Did you try setting File Conveyor's logging level to DEBUG? That should help.


I know that it's imperfect, and not as reliable as it should be, but with every edge case that I close, we're getting closer to stability :)

wimleers avatar Nov 01 '12 15:11 wimleers

To duplicate the bug. I'd suggest making a small script which creates and deletes random files then unleash it under a fileconveyor path.

If it dies, you've found my bug.

j0rd avatar Nov 01 '12 16:11 j0rd

I actually did just that but couldn't get it to crash at all.

wimleers avatar Nov 16 '12 12:11 wimleers

Hello there! I have reproduced this error, but I think it happens when there is a temporary file at one time, but then that temporary file is gone when it attempts to transfer the file.

2012-11-20 14:17:46,826 - Arbitrator - WARNING - Synced: '/var/www/vhosts/xxxxx/sites/default/files/IMG_9738-1.JPG' (DELETED). Exception in thread Thread-1: Traceback (most recent call last): File "/usr/lib64/python2.6/threading.py", line 532, in bootstrap_inner self.run() File "/usr/lib/python2.6/site-packages/pyinotify.py", line 1410, in run self.loop() File "/usr/lib/python2.6/site-packages/pyinotify.py", line 1396, in loop self.process_events() File "/usr/lib/python2.6/site-packages/pyinotify.py", line 1195, in process_events watch_.proc_fun(revent) # user processings File "/usr/lib/python2.6/site-packages/pyinotify.py", line 840, in __call return _ProcessEvent.call(self, event) File "/usr/lib/python2.6/site-packages/pyinotify.py", line 571, in call return meth(event) File "/usr/local/bin/fileconveyor/fsmonitor_inotify.py", line 253, in process_IN_MODIFY self.__update_pathscanner_db(event.pathname, FSMonitor.MODIFIED) File "/usr/local/bin/fileconveyor/fsmonitor_inotify.py", line 205, in __update_pathscanner_db st = os.stat(pathname) OSError: [Errno 2] No such file or directory: '/var/www/vhosts/xxxxx/sites/default/files/.2012-11-12 03.32.06.jpg.PnN0nx'

This happens very intermittently when a file is RSYNC-ed into the path that fileconveyor is monitoring, when that file is gone when it gets to transfering it or something like that.

I have RSYNCed many files into the path and it then picks them up and transfers them fine, it seems to be only this rareish condition that shuts down fileconveyor.

fringedgentian avatar Nov 20 '12 21:11 fringedgentian

Aha! Now that is interesting :) Maybe it's happening on a system with a relatively workload, meaning that the inotify events are somewhat lagging behind, in which case this could indeed happen. FSMonitorInotifyProcessEvent.__update_pathscanner_db() should be checking whether the file still exists (os.path.exists(input_file)), much like Arbitrator.__process_filter_queue() does. Could you make that simple change and check whether that fixes the problem?

wimleers avatar Nov 22 '12 22:11 wimleers

Hello, it has been a while but I wanted to update you. It turned out that even if I checked if the file exists with os.path.exists(pathname) right before the line os.stat(pathname), it would still happen that a temporary file could be deleted after it had gone past the check but before it got to os.stat, causing an error.

So what I ended up doing was ignoring the error, here is the code if you wanted to improve upon it. This is in fsmonitor_inotify.py:

    (path, filename) = os.path.split(pathname)
    if event_type == FSMonitor.DELETED:
        # Build tuple for deletion of row in PathScanner's DB.
        t = (path, filename)
        self.fsmonitor_ref.pathscanner_files_deleted.append(t)
    else:
        try:
            # Build tuple for PathScanner's DB of the form (path, filename,
            # mtime), with mtime = -1 when it's a directory.
            st = os.stat(pathname)
            is_dir = stat.S_ISDIR(st.st_mode)
            if not is_dir:
                mtime = st[stat.ST_MTIME]
                t = (path, filename, mtime)
            else:
                t = (path, filename, -1)
            # Update PathScanner's DB.
            if event_type == FSMonitor.CREATED:
                self.fsmonitor_ref.pathscanner_files_created.append(t)
            else:
                self.fsmonitor_ref.pathscanner_files_modified.append(t)
        except OSError, e:
            print "Caught a OSError, probably a file doesn't exist"
            print pathname

fringedgentian avatar Dec 19 '12 22:12 fringedgentian

Pull request submitted after testing and verifying this change has solved the issue.

https://github.com/wimleers/fileconveyor/pull/144

jstarcher avatar May 06 '13 18:05 jstarcher