Node crash on Crawlee running fs.stat on a request_queue lock file

Open Clearmist opened this issue 6 months ago • 4 comments

Which package is this bug report for? If unsure which one to select, leave blank

@crawlee/core

Issue description

While running, the crawler randomly crashes Node. I tried disabling request locking through the experimental requestLocking option, but the crash still happens. I doubt this is a permission issue: my user has write permission to the entire directory structure, and I've also tried running as administrator.

I'm okay if the root cause doesn't get fixed. At the very least, I'd like to know where I can put a try/catch so this error doesn't crash Node and the crawler can continue.

Judging by the trace below, Node calls stat on a request queue lock file, the call fails with EPERM, and the resulting unhandled promise rejection terminates the process.

node:internal/process/promises:289
            triggerUncaughtException(err, true /* fromPromise */);
            ^

[Error: EPERM: operation not permitted, stat 'C:\Users\{username}\Repositories\crawler-app\storage\request_queues\2fdd8a2d-a180-48a1-9f36-28d5a2793b36\y0jxi0Gs1ISlI1y.json.lock'] {
  errno: -4048,
  code: 'EPERM',
  syscall: 'stat',
  path: 'C:\\Users\\{username}\\Repositories\\crawler-app\\storage\\request_queues\\2fdd8a2d-a180-48a1-9f36-28d5a2793b36\\y0jxi0Gs1ISlI1y.json.lock'
}
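
As a stopgap for the try/catch question above, here is a minimal sketch of a process-level guard. It does not fix the root cause. The trace shows the error surfacing as an unhandled promise rejection, so it relies on Node's standard 'unhandledRejection' event. isLockFileStatError and installLockFileGuard are hypothetical helper names and not part of Crawlee. The code, syscall and path checks are assumptions taken from the fields printed in the error above.

```javascript
// Hypothetical helper (not a Crawlee API): true for errors shaped like the
// EPERM failure in the trace, i.e. a stat call failing on a *.lock file.
function isLockFileStatError(err) {
  return (
    err instanceof Error &&
    err.code === 'EPERM' &&
    err.syscall === 'stat' &&
    typeof err.path === 'string' &&
    err.path.endsWith('.lock')
  );
}

// Hypothetical helper: registers a process-wide 'unhandledRejection' handler
// that logs and swallows only the lock-file stat error. Returns a function
// that removes the handler again.
function installLockFileGuard(log = console.warn) {
  const handler = (reason) => {
    if (isLockFileStatError(reason)) {
      log(`Ignoring EPERM while stat-ing lock file: ${reason.path}`);
      return;
    }
    // Any other rejection is rethrown so it still crashes the process loudly.
    throw reason;
  };
  process.on('unhandledRejection', handler);
  return () => process.off('unhandledRejection', handler);
}
```

Calling installLockFileGuard() once before crawler.run() should keep the process alive when this particular rejection occurs. Whether the affected request is then retried or left in an inconsistent state is not something this sketch can guarantee.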

Steps to reproduce:

  1. Start a Cheerio crawler instance with a custom request queue name on a Windows machine.

Code sample

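import path from 'node:path';
import { randomUUID } from 'node:crypto';
import { app } from 'electron';
// Assumed import source: the original sample omitted these imports.
import { CheerioCrawler, Configuration, RequestQueue } from 'crawlee';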

const alias = randomUUID();

const address = 'https://{testing-address}';

const config = new Configuration({
  storageClientOptions: {
    localDataDirectory: path.join(app.getPath('userData'), 'crawlerStorage'),
  },
});

const requestQueue = await RequestQueue.open(alias);

await requestQueue.addRequest({ url: address });

const options = {
  experiments: {
    // Request locking is enabled by default since 3.10.0.
    // I've tried setting it to false and it still locks request json files.
    requestLocking: false,
  },
  requestQueue,
  ...
};

const crawler = new CheerioCrawler(options, config);

await crawler.run();

Package version

3.11.1

Node.js version

20.10.0

Operating system

Windows 10

Apify platform

  • [ ] Tick me if you encountered this issue on the Apify platform

I have tested this on the next release

3.11.2-beta.17

Other context

No response

Clearmist, Aug 07 '24 15:08