crawlee
Allow passing in a custom request adapter to HttpCrawler and BasicCrawler
Which package is the feature request for? If unsure which one to select, leave blank
@crawlee/http (HttpCrawler), @crawlee/basic-crawler (BasicCrawler)
Feature
Add an option to provide a custom HTTP adapter client to HttpCrawler and BasicCrawler and use it in the _requestAsBrowser and sendRequest functions.
Motivation
Currently HttpCrawler and BasicCrawler are hard-wired to use the gotScraping instance imported from the got-scraping lib. This makes it really inconvenient to extend the default browser-mimicking behaviour if, for example, you'd like to alter the TLS hooks provided in that lib, or if you'd like to use a different lib altogether, like axios. You can modify gotOptions in the preNavigationHook, but the got-scraping Got instance will still be used (and only after the hook), so you can't do something as convenient as:
import { gotScraping } from "got-scraping"
const newInstance = gotScraping.extend({
....
})
// pass your new instance to the crawler
...
And, obviously, you can't switch request libs to your liking.
Ideal solution or implementation, and any additional constraints
Add something like an httpAdapter property to BasicCrawlerOptions that is default-initialized to gotScraping, and assign it to a public, modifiable field in the constructor. Then use this field in _requestAsBrowser (the only place gotScraping is currently used to make requests in HttpCrawler) and in sendRequest (BasicCrawler). The object passed to httpAdapter will have to adhere to some common interface that will likely, but not necessarily, resemble Got's interface (only the part of it that is used by crawlee; it should be minimal for easy ad-hoc implementation).
You could also allow more customization by moving the httpAdapter property to HttpCrawlerOptions and adding sendRequestAdapter to BasicCrawlerOptions; HttpCrawler would then default-assign httpAdapter to sendRequestAdapter, but it would be possible to customize both. It would also make sense to allow the callers of sendRequest to override the adapter on every call.
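To make the proposal more concrete, here is a rough sketch of what it could look like. Everything in it is hypothetical: the AdapterRequest, AdapterResponse and HttpAdapter types, the SketchCrawler class and the stubbed defaultAdapter are illustrative names, not crawlee's or got-scraping's actual API. In a real implementation the default would wrap gotScraping; a stub is used here so the snippet runs on its own.

```typescript
// Hypothetical minimal interface an adapter would have to satisfy.
// A real one would cover only the part of Got that crawlee actually uses.
interface AdapterRequest {
    url: string;
    method?: string;
    headers?: Record<string, string>;
}

interface AdapterResponse {
    statusCode: number;
    body: string;
}

type HttpAdapter = (request: AdapterRequest) => Promise<AdapterResponse>;

// Stand-in for the default adapter (assumption: it would delegate to gotScraping).
const defaultAdapter: HttpAdapter = async (request) => ({
    statusCode: 200,
    body: `default:${request.url}`,
});

// Hypothetical options object, mirroring the idea of adding httpAdapter to BasicCrawlerOptions.
interface SketchCrawlerOptions {
    httpAdapter?: HttpAdapter;
}

class SketchCrawler {
    // Public and modifiable, as suggested in the proposal.
    public httpAdapter: HttpAdapter;

    constructor(options: SketchCrawlerOptions = {}) {
        // Fall back to the default adapter when none is supplied.
        this.httpAdapter = options.httpAdapter ?? defaultAdapter;
    }

    // A per-call override takes precedence over the crawler-level adapter.
    async sendRequest(
        request: AdapterRequest,
        adapterOverride?: HttpAdapter,
    ): Promise<AdapterResponse> {
        const adapter = adapterOverride ?? this.httpAdapter;
        return adapter(request);
    }
}

// Usage: plug in a custom adapter (this one could wrap axios or an extended Got instance).
const customAdapter: HttpAdapter = async (request) => ({
    statusCode: 200,
    body: `custom:${request.url}`,
});

const crawlerWithCustomAdapter = new SketchCrawler({ httpAdapter: customAdapter });
```

Keeping the adapter a single function type is one possible way to keep ad-hoc implementations small; an object with several methods would work just as well if more of Got's surface turns out to be needed.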
Alternative solutions or implementations
No response
Other context
No response