
sendRequest does not use custom proxies

Open tugkan opened this issue 8 months ago • 3 comments

Which package is this bug report for? If unsure which one to select, leave blank

@crawlee/basic (BasicCrawler)

Issue description

If you use custom proxies like this:

{
   "proxy": {
      "useApifyProxy": false,
      "proxyUrls": [ "https://..." ]
   }
}

The proxyInfo object is not exposed to the CrawlerContext, so sendRequest dispatches requests from the machine's own (naked) IP instead of through the configured proxy.

I am not quite sure if this is expected behavior or not. Just wanted to reach out.
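
For illustration, a minimal sketch of the behavior described above, assuming a BasicCrawler built from the same proxyUrls input (the proxy URL and target URL are placeholders):

import { BasicCrawler, ProxyConfiguration } from 'crawlee';

// Built from the proxyUrls input above; the URL here is a placeholder.
const proxyConfiguration = new ProxyConfiguration({
    proxyUrls: ['http://user:pass@proxy.example.com:8000'],
});

const crawler = new BasicCrawler({
    // BasicCrawlerOptions has no proxyConfiguration field, so the object
    // above never reaches the crawling context.
    async requestHandler({ request, sendRequest, log }) {
        // No proxyInfo is available here, and sendRequest is backed by
        // got-scraping without a proxyUrl, so the request leaves from the
        // machine's own IP.
        const { body } = await sendRequest();
        log.info(`Fetched ${request.url} (${body.length} bytes)`);
    },
});

await crawler.run(['https://example.com']);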

Code sample

https://github.com/apify/crawlee/blob/764f99203627b6a44d2ee90d623b8b0e6ecbffb5/packages/basic-crawler/src/internals/basic-crawler.ts#L1419

Package version

3.13.1

Node.js version

22

Operating system

No response

Apify platform

  • [ ] Tick me if you encountered this issue on the Apify platform

I have tested this on the next release

No response

Other context

No response

tugkan avatar Apr 07 '25 20:04 tugkan

There is no proxy support at the BasicCrawler level; this is expected behavior.

https://crawlee.dev/js/api/basic-crawler/interface/BasicCrawlerOptions

Expected, but I also get confused by this quite often. Maybe we should rework this.

B4nan avatar Apr 08 '25 06:04 B4nan
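
Until that changes, one possible workaround is to resolve a proxy URL yourself and pass it to sendRequest as an override. This is a sketch assuming sendRequest forwards got-scraping override options (including proxyUrl); the proxy URL is a placeholder:

import { BasicCrawler, ProxyConfiguration } from 'crawlee';

const proxyConfiguration = new ProxyConfiguration({
    proxyUrls: ['http://user:pass@proxy.example.com:8000'], // placeholder
});

const crawler = new BasicCrawler({
    async requestHandler({ session, sendRequest }) {
        // Pick a proxy URL manually, since BasicCrawler will not inject one.
        const proxyUrl = await proxyConfiguration.newUrl(session?.id);
        // Pass it as an override so got-scraping routes the request through the proxy.
        const { body } = await sendRequest({ proxyUrl });
        // ... process body
    },
});

await crawler.run(['https://example.com']);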

I guess there's no harm in pulling ProxyInfo all the way to BasicCrawler, is there? Even if there's some use case for BasicCrawler where proxies don't make any sense, the overhead of keeping a ProxyInfo for each run of the request handler should be negligible.

janbuchar avatar Apr 08 '25 08:04 janbuchar

Yeah, I would be for moving the proxyConfiguration option to the basic crawler level too. If we tell people to use sendRequest, it kinda makes sense to have native support for proxies at that level.

B4nan avatar Apr 08 '25 08:04 B4nan
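
If the proposal above landed, usage might look like the following. This is purely hypothetical: proxyConfiguration is not a BasicCrawlerOptions field and proxyInfo is not populated on the basic crawling context as of 3.13.1.

import { BasicCrawler, ProxyConfiguration } from 'crawlee';

const crawler = new BasicCrawler({
    // Hypothetical option, mirroring what the HTTP-based crawlers accept today.
    proxyConfiguration: new ProxyConfiguration({
        proxyUrls: ['http://user:pass@proxy.example.com:8000'], // placeholder
    }),
    async requestHandler({ proxyInfo, sendRequest }) {
        // proxyInfo would be populated per request, and sendRequest would
        // route through the selected proxy automatically.
        await sendRequest();
    },
});

await crawler.run(['https://example.com']);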