Enable switching HTTP client in `parseSitemap`
The `parseSitemap` helper function does quite a lot of crawling internally. Currently, it's hardcoded to use got-scraping for all HTTP requests that pull the sitemap files. We're planning to phase out got-scraping with Crawlee v4.
It would make sense for `parseSitemap` to accept an `httpClient` option, as the crawler instances do.
Motivation
Impit is a more customizable HTTP client than got-scraping.
Ideal solution or implementation, and any additional constraints
Fairly simple: add one parameter and call `HttpClient.stream` instead of got-scraping's `stream`.
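A minimal sketch of the idea, not the actual Crawlee code. `StreamingHttpClient` is a hypothetical, simplified stand-in for Crawlee's `HttpClient` interface (the real `stream` signature and return type differ), and `readSitemapText` is a hypothetical helper name. The point is only that the sitemap fetch goes through whatever client the caller passes in, rather than a hardcoded one.

```typescript
// Hypothetical, simplified stand-in for Crawlee's HttpClient.
// The real interface has a richer request/response shape.
interface StreamingHttpClient {
    stream(request: { url: string }): Promise<{ stream: AsyncIterable<Uint8Array> }>;
}

// Hypothetical helper: downloads one sitemap file through the injected client
// and returns its body as text. The client is the "one parameter" being added.
async function readSitemapText(url: string, httpClient: StreamingHttpClient): Promise<string> {
    const response = await httpClient.stream({ url });
    const decoder = new TextDecoder('utf-8');
    let text = '';
    for await (const chunk of response.stream) {
        // `stream: true` keeps multi-byte characters intact across chunk boundaries.
        text += decoder.decode(chunk, { stream: true });
    }
    // Flush any bytes still buffered in the decoder.
    text += decoder.decode();
    return text;
}
```

Because the helper depends only on the interface, any client that implements `stream` can be swapped in, including a fake one in tests.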
Alternative solutions or implementations
No response
Other context
No response
`parseSitemap` lives in `@crawlee/utils`, while `HttpClient` lives in `@crawlee/core`. `@crawlee/utils` likely shouldn't depend on the core package, so we'll likely have to extract `HttpClient` into a separate package (or into utils?).
Closed by #3306