browsertrix-crawler
unknown error
Trying a quick test with a simple website:
sudo docker run -v $PWD/crawls:/crawls/ -it webrecorder/browsertrix-crawler crawl --url http://info.cern.ch/ --generateWACZ --text --collection test
results in:
Storing state in memory
pages/pages.jsonl creation failed [Error: ENOENT: no such file or directory, mkdir '/crawls/collections/test/pages'] {
  errno: -2,
  code: 'ENOENT',
  syscall: 'mkdir',
  path: '/crawls/collections/test/pages'
}
pages/pages.jsonl append failed TypeError: Cannot read properties of null (reading 'writeFile')
    at Crawler.writePage (/app/crawler.js:836:26)
    at Crawler.crawlPage (/app/crawler.js:348:18)
    at processTicksAndRejections (node:internal/process/task_queues:96:5)
    at async /app/node_modules/puppeteer-cluster/dist/util.js:63:24
    at async Object.timeoutExecute (/app/node_modules/puppeteer-cluster/dist/util.js:54:20)
    at async Worker.handle (/app/node_modules/puppeteer-cluster/dist/Worker.js:48:22)
    at async Cluster.doWork (/app/node_modules/puppeteer-cluster/dist/Cluster.js:250:24)
== Start: 2022-06-29 09:54:50.180
== Now: 2022-06-29 09:54:56.902 (running for 6.7 seconds)
== Progress: 1 / 1 (100.00%), errors: 0 (0.00%)
== Remaining: 0.0 ms (@ 0.15 pages/second)
== Sys. load: 90.6% CPU / 35.7% memory
== Workers: 1
#0 IDLE
Waiting to ensure pending data is written to WARCs...
Generating WACZ
Crawl failed
[Error: ENOENT: no such file or directory, scandir '/crawls/collections/test/archive'] {
errno: -2,
code: 'ENOENT',
syscall: 'scandir',
path: '/crawls/collections/test/archive'
}
failed
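One possible explanation (an assumption on my part, not confirmed in this thread) is that the bind-mounted $PWD/crawls directory is missing or not writable by the user inside the container, which would be consistent with the ENOENT from mkdir under /crawls/collections/test. A minimal host-side check before re-running the crawl:

```shell
# Hypothetical workaround, assuming the ENOENT stems from a missing or
# unwritable host-side bind-mount target. Pre-create the output directory
# and make sure the current user can write to it:
mkdir -p "$PWD/crawls"
chmod u+rwx "$PWD/crawls"
ls -ld "$PWD/crawls"   # confirm the directory exists and is writable
```

If the directory already exists with the right permissions, this rules that cause out and the problem lies elsewhere.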
Hm, it seems like pywb has likely failed to launch. Can you try running with --logging pywb,stats?
Are you using a local build or one of the released images?
Closing as I can't reproduce this. Leave a comment if it is still happening.