sitediff fails with "Not a directory @ apply2files" if crawl only produces one page

Open fgerards opened this issue 2 years ago • 6 comments

Sitediff fails to compare two single-page URLs/sites: the before/after entries in the snapshots directory are files, not directories containing other entries, and sitediff should handle that case as well.

The error occurs on an Ubuntu Linux laptop and on a 2020 MacBook Air (M1) with macOS Ventura 13.1 when installing the latest version of sitediff via Homebrew.

fgerards avatar Dec 16 '22 09:12 fgerards

Any follow-up on this?

jgam avatar Mar 01 '23 05:03 jgam

Which version of Ruby are you using? Can you provide an example of what you're doing? On what line does the error happen?

kirk-brown-ew avatar Mar 02 '23 03:03 kirk-brown-ew

Using Ruby 3.1.3.

When running sitediff crawl, it finds only a single path, '/', and outputs the error above.

jgam avatar Mar 02 '23 03:03 jgam

For instance, this is what the output looks like:

Jimmyui-MacBook-Pro:~ jimmygam$ sitediff init https://mentree.club/
[success] Created /Users/jimmygam/sitediff/sitediff.yaml
Jimmyui-MacBook-Pro:~ jimmygam$ sitediff crawl
Reading config file: /Users/jimmygam/sitediff/sitediff.yaml
Visited https://mentree.club/, cached.
[error] Unknown parsing error for https://mentree.club/: Not a directory @ apply2files - sitediff/snapshot/before/timestamp  From page: {:referrer=>"/"}

1 page(s) found.
[done] Created /Users/jimmygam/sitediff/paths.txt.

jgam avatar Mar 02 '23 03:03 jgam

We've been able to reproduce this issue by:

  1. Creating a static web page with no links.
  2. Running sitediff crawl.
  3. Adding a link to the page, either to an anchor within the page or to another page created at the same level.
  4. Running sitediff crawl.

The first crawl creates the before site as a file. The second crawl wants to create before as a directory.
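The file-versus-directory conflict can be seen without sitediff at all. This is only a rough sketch in a throwaway temp directory: the `workdir` path and the `timestamp` file name are stand-ins taken from the error message, and the `touch` call is an assumption about the kind of operation that fails, not sitediff's actual code.

```shell
#!/bin/sh
# Hypothetical scratch directory; not a path sitediff itself uses.
workdir=$(mktemp -d)
mkdir -p "$workdir/snapshot"

# Simulate the first crawl: "before" ends up as a regular file.
printf 'cached page\n' > "$workdir/snapshot/before"

# Simulate the second crawl: creating an entry beneath "before" fails,
# because a regular file cannot act as a parent directory.
if touch "$workdir/snapshot/before/timestamp" 2>/dev/null; then
  result="created"
else
  result="not a directory"
fi
echo "$result"

# Clean up the scratch directory.
rm -rf "$workdir"
```

Any tool that treats `before` as a directory would hit the same operating-system error once a file with that name already exists.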

The workaround we see is to remove the existing before entry and re-run the crawl.
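A cautious way to do that removal might look like the sketch below. `clear_stale_entry` is a hypothetical helper name, and the path is the one from the error message earlier in this thread, so yours may differ. It only deletes the entry when it exists and is not a directory, so an existing snapshot directory is left alone.

```shell
#!/bin/sh
# Hypothetical helper: remove a snapshot entry only when it exists
# and is NOT a directory (i.e. it is left over from a one-page crawl).
clear_stale_entry() {
  entry=$1
  if [ -e "$entry" ] && [ ! -d "$entry" ]; then
    rm -f -- "$entry"
  fi
}

# Path taken from the error message in this thread; adjust it to your setup.
clear_stale_entry sitediff/snapshot/before

# Afterwards, re-run the crawl:
#   sitediff crawl
```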

kirk-brown-ew avatar Mar 04 '23 21:03 kirk-brown-ew

Adding on to @kirk-brown-ew's comment, I've been able to resolve this issue entirely by running the following before running the crawl command:

mkdir -p sitediff/snapshot/before ; mkdir -p sitediff/snapshot/after

Your path before the /snapshot directory may differ. I'm running this via the Docker image.
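If you want to script this, here is a minimal sketch that wraps the same `mkdir -p` calls. `ensure_snapshot_dirs` is a made-up helper name and its argument is whatever prefix sits in front of `/snapshot` in your setup. It also checks for a leftover regular file first, since `mkdir -p` cannot turn an existing file into a directory.

```shell
#!/bin/sh
# Hypothetical helper: make sure <base>/snapshot/before and
# <base>/snapshot/after exist as directories before crawling.
ensure_snapshot_dirs() {
  base=$1
  for side in before after; do
    target="$base/snapshot/$side"
    # A regular file with this name would make mkdir -p fail, so report it.
    if [ -e "$target" ] && [ ! -d "$target" ]; then
      echo "warning: $target exists but is not a directory" >&2
      return 1
    fi
    mkdir -p "$target" || return 1
  done
}

# Example usage (the "sitediff" prefix is an assumption; yours may differ):
#   ensure_snapshot_dirs sitediff && sitediff crawl
```

Returning early with a warning, rather than deleting the file, keeps the helper from removing a snapshot you may still want to look at.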

tjhaygood avatar Nov 09 '23 20:11 tjhaygood