crawlee
fix: `EnqueueStrategy.All` erroring with links using unsupported protocols
This changes `EnqueueStrategy.All` to filter out URLs whose protocol is neither http nor https (`mailto:` links were causing the crawler to error).
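For illustration, the filtering idea can be sketched roughly as follows. This is a minimal standalone sketch, not the code in this PR or crawlee's internals: `filterHttpUrls` and `SUPPORTED_PROTOCOLS` are hypothetical names, and it assumes the links are already absolute URL strings.

```typescript
// Hypothetical helper for illustration only; not part of crawlee's API.
// Assumption: only these two protocols can be enqueued safely.
const SUPPORTED_PROTOCOLS = new Set(["http:", "https:"]);

function filterHttpUrls(urls: string[]): string[] {
    return urls.filter((url) => {
        try {
            // The WHATWG URL parser exposes the scheme with a trailing colon.
            return SUPPORTED_PROTOCOLS.has(new URL(url).protocol);
        } catch {
            // Strings that do not parse as absolute URLs are dropped as well.
            return false;
        }
    });
}

const links = [
    "https://example.com/docs",
    "http://example.com/",
    "mailto:someone@example.com",
    "tel:+123456789",
    "not a url",
];

// Only the http and https entries remain.
console.log(filterHttpUrls(links));
```

Dropping unparseable strings instead of throwing keeps one bad link from failing the whole batch, which is the failure mode shown in the stack trace.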
Let me know if there's a better fix or if you want me to change something.
Thanks!
Request failed and reached maximum retries. Error: Received one or more errors
at _ArrayValidator.handle (/path/to/project/node_modules/@sapphire/shapeshift/src/validators/ArrayValidator.ts:102:17)
at _ArrayValidator.parse (/path/to/project/node_modules/@sapphire/shapeshift/src/validators/BaseValidator.ts:103:2)
at RequestQueueClient.batchAddRequests (/path/to/project/node_modules/@crawlee/src/resource-clients/request-queue.ts:340:36)
at RequestQueue.addRequests (/path/to/project/node_modules/@crawlee/src/storages/request_provider.ts:238:46)
at RequestQueue.addRequests (/path/to/project/node_modules/@crawlee/src/storages/request_queue.ts:304:22)
at attemptToAddToQueueAndAddAnyUnprocessed (/path/to/project/node_modules/@crawlee/src/storages/request_provider.ts:302:42)
at RequestQueue.addRequestsBatched (/path/to/project/node_modules/@crawlee/src/storages/request_provider.ts:319:37)
at RequestQueue.addRequestsBatched (/path/to/project/node_modules/@crawlee/src/storages/request_queue.ts:309:22)
at enqueueLinks (/path/to/project/node_modules/@crawlee/src/enqueue_links/enqueue_links.ts:384:2)
at browserCrawlerEnqueueLinks (/path/to/project/node_modules/@crawlee/src/internals/browser-crawler.ts:777:21)
@stefansundin do you plan to finish this? I'd rather not merge such a change without any added tests.
Hi @B4nan. I started writing a test but I had some more important work come up that took priority.
I may be able to finish it next week.
If you prefer, we can close this PR and open an issue instead.