PollingObserver drops files

Open esemwy opened this issue 3 years ago • 4 comments

I'm using the polling observer because the eventual implementation will need to watch an NFS mount, but I'm already dropping files when I synthetically generate data. I've stripped my code down to the bare minimum, but up to half the files still get ignored.

I'm including my stripped-down example and my test file generator. The generator creates files of lorem ipsum text, a couple of KB each. Suggested workarounds would be welcome.

I'm running 2.1.3 from PyPI.

filehandler.py

from watchdog.observers.polling import PollingObserverVFS as Observer
from watchdog.events import FileSystemEventHandler as Handler
import logging
import os
import sys
import time
import traceback

logger = logging.getLogger('filewatcher')

class BaseHandler(Observer, Handler):
    def __init__(self, directory):
        self._directory = directory
        super().__init__(stat=os.stat, listdir=os.listdir, polling_interval=1)
        self.schedule(self, directory, recursive=False)

        root = logging.getLogger()
        root.setLevel(logging.DEBUG)

        fmt = '%(name)s[%(process)s]: %(levelname)-5s | %(message)s'
        formatter = logging.Formatter(fmt=fmt)

        handler = logging.StreamHandler(sys.stderr)
        handler.setLevel(logging.DEBUG)
        handler.setFormatter(formatter)

        root.addHandler(handler)

    def start(self):
        for name in os.listdir(self._directory):
            path = os.path.join(self._directory, name)
            logger.debug(f'Handling {path} before start.')
            self.handle_file(path)
        super().start()

    def on_created(self, event):
        if event.is_directory or event.is_synthetic:
            logger.warning('Unexpected file %s', event.src_path)
            return
        logger.debug(f"{event.src_path} created")
        self.handle_file(event.src_path)

class HandleFiles(BaseHandler):
    def handle_file(self, source):
        logger.info('removing file %s', source)
        os.unlink(source)

def main():
    handler = HandleFiles('/tmp/newfiles')
    handler.start()
    try:
        while True:
            # Check in on the status of the directory being observed
            time.sleep(10)
    except:
        # Log whatever stopped the loop (including Ctrl-C) line by line
        for line in traceback.format_exc().split('\n'):
            logger.critical(line)
    finally:
        handler.stop()
        handler.join()

if __name__ == "__main__":
    main()

genfiles.py

#!/usr/bin/env python3
from uuid import uuid4 as UUID
from lorem import paragraph
from pathlib import Path
from itertools import islice
import argparse, os

def writable_dir(name):
    p = Path(name)
    if not p.exists():
        raise argparse.ArgumentTypeError("{0} is not a valid path".format(p))
    if not p.is_dir():
        raise argparse.ArgumentTypeError("{0} is not a directory".format(p))
    if not os.access(p, os.R_OK|os.W_OK):
        raise argparse.ArgumentTypeError("{0} permission denied".format(p))
    return str(p)

parser = argparse.ArgumentParser('gendata')
parser.add_argument('directory', type=writable_dir, help="Directory to drop files")
parser.add_argument('--number','-n', default=100, type=int, help="Number of random files")
args = parser.parse_args()

dir = Path(args.directory)
for _ in range(args.number):
    filename = dir / str(UUID())
    with filename.open(mode='w') as outfile:
        for p in islice(paragraph(10),10):
            print(p, end="\n\n", file=outfile)

esemwy avatar Jul 07 '21 18:07 esemwy

@esemwy Thank you for performing this test! I always check the open issues before using the library, so I tried to reproduce your errors on Windows with the simple PollingObserver. With the file generator set to 10,000 files, the script seems to leave no files in the folder; they all get deleted. What parameters did you use?

tonysepia avatar Feb 13 '22 11:02 tonysepia

Just tried with 125,000 files generated. None left behind!

tonysepia avatar Feb 13 '22 11:02 tonysepia

It’s been a couple of versions since I opened the issue. I will say, I was running on Linux, not Windows. Otherwise, everything was configured as you see above. Our solution was to rename files into place to avoid any possible race condition.
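
For anyone landing here later, a minimal sketch of what "rename files into place" can look like on the producer side. The function name and the separate staging directory are hypothetical, not taken from the code above, and the sketch assumes the staging directory sits on the same filesystem as the watched directory so that the rename does not have to copy data.

```python
import os
import tempfile
from pathlib import Path


def write_then_rename(staging_dir, watched_dir, name, text):
    """Write text to a temp file in staging_dir, then rename it into watched_dir.

    Hypothetical helper: staging_dir is assumed to be on the same
    filesystem as watched_dir, otherwise os.replace raises OSError.
    """
    staging_dir = Path(staging_dir)
    watched_dir = Path(watched_dir)

    # Create the temp file outside the watched directory so the watcher
    # never sees a partially written file.
    fd, tmp_name = tempfile.mkstemp(dir=staging_dir)
    try:
        with os.fdopen(fd, "w") as outfile:
            outfile.write(text)
            outfile.flush()
            os.fsync(outfile.fileno())
        final_path = watched_dir / name
        # The file appears under its final name only once it is complete.
        os.replace(tmp_name, final_path)
    except BaseException:
        # Do not leave a stray temp file behind on failure.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return final_path


if __name__ == "__main__":
    base = Path(tempfile.mkdtemp())
    staging = base / "staging"
    watched = base / "watched"
    staging.mkdir()
    watched.mkdir()
    print(write_then_rename(staging, watched, "example.txt", "lorem ipsum\n"))
```

In genfiles.py this would stand in for the `filename.open(mode='w')` block, so the watcher only ever sees complete files show up under their final names.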

esemwy avatar Feb 13 '22 13:02 esemwy

Thank you for clarifying!

tonysepia avatar Feb 13 '22 13:02 tonysepia