fdir icon indicating copy to clipboard operation
fdir copied to clipboard

symlinks can cause duplicate entries

Open 43081j opened this issue 9 months ago • 2 comments

Given the following:

import mock from "mock-fs";
import { fdir } from "./dist/index.js";

mock({
  "/sym/linked": {
    "file-1": "file contents",
  },
  "/some/dir": {
    dirSymlink: mock.symlink({
      path: "/sym/linked",
    }),
    fileSymlink: mock.symlink({
      path: "/sym/linked/file-1",
    }),
  },
});
const api = new fdir().withSymlinks().crawl("/");
const files = await api.sync();
console.log(files);

files will contain the same entry 3 times (/sym/linked/file-1).

if you change this to withPromise (async), it will contain the same entry 2 times.

so there are two problems here:

  1. inconsistency between async and sync
  2. duplicating symlink resolutions

inconsistent results (async vs sync)

This is because async roughly visits the FS breadth-first (i.e. it visits all the top-level files, then 2nd level, and so on).

because of that, it visits /sym/linked before visiting /some/dir/dirSymlink. that means this code is hit and /sym/linked isn't queued up again (because the callback is never called thanks to this early return).

when we switch to sync mode, we visit the FS depth-first.

this means we visit /some/dir/dirSymlink before we visit /sym/linked. of course, that means we haven't yet visited it, so we queue it up to be visited.

meanwhile, when we do eventually visit /sym/linked normally (i.e. not because of the symlink), we don't check visited at all and push the result.

because of all of this, we get 2 copies of /sym/linked

duplicating symlink resolutions

the race condition explained above causes one of the duplicates. but that code isn't even really meant to be dealing with dupes, it exists to avoid infinite loops.

so this points at a deeper problem we should fix in the walker most likely, rather than trying to fix it in the symlink resolution code

basically, if we visit symlinks before the things they link to, we push them twice right now.

somewhere, we should probably be checking visited before traversing directories in the walker (visited is only directories afaict)

43081j avatar Mar 24 '25 22:03 43081j

Thank you for the detailed bug report. Will take a look at this soon. Feel free to open a PR if you have a working fix for this.

thecodrr avatar Mar 25 '25 00:03 thecodrr

i don't quite have a fix for it yet but will try take a look this week unless you beat me to it

if anyone else wants to take a look too (or yourself), i think my suggestion is that when we encounter a directory, we always check if it is already in visited. this solves one half of the problem

the other half of the problem is that you may have a symlink to a file, and those don't get tracked in visited. so this can happen:

  • visit /foo/symlink-1 (which resolves to /bar/file-1)
  • visit /bar/file-1

we now have ['/bar/file-1', '/bar/file-1']. but in async, and if we visited them in reverse, we'd have one entry.

this suggests we need to de-dupe in pushFile, but maybe that will affect performance?

43081j avatar Mar 25 '25 09:03 43081j