
Chained Requests

Open Ehsan-U opened this issue 1 year ago • 3 comments

How can Crawlee be used when requests need to be sent in sequence, as in most ASP.NET applications? Scrapy handles these cases using inline requests without a callback.

e.g. here, a couple of sequenced requests need to be made to get the desired data (see the attached image).

Ehsan-U avatar Jul 21 '24 14:07 Ehsan-U

Hello, and thank you for your interest in Crawlee! I assume (correct me if I'm wrong) that you need to perform additional HTTP requests on each page you visit. Would the send_request helper work for you?

import json

from bs4 import BeautifulSoup

@crawler.router.default_handler
async def handler(context: BeautifulSoupCrawlingContext) -> None:
    # Send an additional request from within the handler;
    # its response is returned directly.
    response = await context.send_request(url="/foo", method="post")

    # Parse the body as JSON...
    response_json = json.loads(response.read())

    # ...or as HTML.
    response_soup = BeautifulSoup(response.read(), "html.parser")
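To connect this to the ASP.NET scenario from the question: the state that must be carried between chained requests is typically a set of hidden form fields (`__VIEWSTATE`, `__EVENTVALIDATION`, etc.) that the server expects to be posted back. A minimal, standard-library-only sketch of collecting them from the first response, so they can be sent in the follow-up `send_request` call (`HiddenFieldCollector` is a hypothetical helper, not part of Crawlee):

```python
from html.parser import HTMLParser

class HiddenFieldCollector(HTMLParser):
    """Collects hidden <input> fields such as __VIEWSTATE and
    __EVENTVALIDATION, which ASP.NET pages expect to be posted back."""

    def __init__(self) -> None:
        super().__init__()
        self.fields: dict[str, str] = {}

    def handle_starttag(self, tag: str, attrs: list) -> None:
        a = dict(attrs)
        if tag == "input" and a.get("type") == "hidden" and a.get("name"):
            # Attributes without a value parse as None; normalize to "".
            self.fields[a["name"]] = a.get("value") or ""

page = """
<form id="aspnetForm">
  <input type="hidden" name="__VIEWSTATE" value="dDw0..." />
  <input type="hidden" name="__EVENTVALIDATION" value="/wEW..." />
</form>
"""

collector = HiddenFieldCollector()
collector.feed(page)
# collector.fields now holds the state to include in the body of the
# chained context.send_request(..., method="post") call.
```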

janbuchar avatar Jul 22 '24 13:07 janbuchar

Thanks. BasicCrawlingContext doesn't give access to the response, so the proposed solution will only work with an extended class like BeautifulSoupCrawlingContext.

New to crawlee, correct me if I'm wrong.

Ehsan-U avatar Jul 22 '24 14:07 Ehsan-U

You can actually do this with BasicCrawlingContext as well: it also provides the send_request helper, and the response is returned from that call. Don't confuse this with context.http_response, which is the response for the original request URL, fetched before the request_handler was invoked.
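The distinction can be illustrated with plain stand-ins for the context object. The classes below are hypothetical test doubles that only mirror the shape of the contract described above (an already-fetched `http_response` versus a response returned by `send_request`); they are not Crawlee code:

```python
import asyncio
from dataclasses import dataclass

@dataclass
class FakeResponse:
    body: bytes

    def read(self) -> bytes:
        return self.body

class FakeContext:
    def __init__(self) -> None:
        # The response for the original request URL, fetched
        # before the handler runs:
        self.http_response = FakeResponse(b"<html>original page</html>")
        self.results: tuple[bytes, bytes] | None = None

    async def send_request(self, url: str, method: str = "get") -> FakeResponse:
        # In real Crawlee this performs an HTTP request; here we fake it.
        return FakeResponse(f"{method.upper()} {url}".encode())

async def handler(context: FakeContext) -> None:
    # context.http_response: the page the crawler already fetched for us.
    original = context.http_response.read()
    # context.send_request: a new, chained request; its response is the
    # return value, independent of http_response.
    extra = await context.send_request(url="/api/data", method="post")
    context.results = (original, extra.read())

ctx = FakeContext()
asyncio.run(handler(ctx))
```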

janbuchar avatar Jul 22 '24 15:07 janbuchar

@Ehsan-U is your question answered? If so, please close the issue :slightly_smiling_face:

janbuchar avatar Jul 23 '24 08:07 janbuchar