crawlee-python

feat: add `ImpitHttpClient` HTTP client using the `impit` library

Open • Mantisus opened this issue 8 months ago • 1 comment

Description

  • Add `ImpitHttpClient`, an HTTP client built on the `impit` library.

Issues

  • Relates: #1079

Testing

Added tests for `ImpitHttpClient`. `ImpitHttpClient` is also enabled in all existing tests that use an HTTP client.

Mantisus • Apr 14 '25 14:04

For now, I suggest adding `impit` as an additional dependency, since it still needs some tweaking before it is ready to replace `httpx`.
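One common way to ship a package as an additional rather than a required dependency is to check for it only when the client is actually used, and to fail with an installation hint otherwise. The sketch below assumes a pip extra named `impit` on the `crawlee` package; that extra name and the helper `optional_import` are hypothetical and not taken from this PR.

```python
import importlib
import importlib.util
from types import ModuleType


def optional_import(module_name: str, extra: str) -> ModuleType:
    """Import an optional top-level dependency, or raise an ImportError with an install hint.

    Hypothetical helper: the `crawlee[<extra>]` hint assumes such an extra exists.
    """
    # find_spec returns None when a top-level module is not installed.
    if importlib.util.find_spec(module_name) is None:
        raise ImportError(
            f"{module_name!r} is not installed. "
            f"Install it with: pip install 'crawlee[{extra}]'"
        )
    return importlib.import_module(module_name)
```

Usage would look like `impit = optional_import("impit", extra="impit")` inside the client's constructor, so users who stay on `httpx` never need `impit` installed.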

Awaiting a decision: https://github.com/apify/impit/issues/123

Mantisus • Apr 14 '25 15:04

The Python binding for `impit` now has all the basic functionality needed to integrate it into Crawlee.

The `_get_client` method is implemented based on `ImpitHttpClient`. However, this looks inefficient, especially when running without a proxy but with a `SessionPool` larger than 1, because a new client is created for every request. I think this should be improved on the `impit` side. @barjin, maybe you have some ideas.
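One possible Crawlee-side mitigation, sketched here as an assumption rather than what the PR does, is to cache clients keyed by proxy URL and session ID, so that each session reuses its own client instead of getting a fresh one per request. The class `ClientCache`, its `factory` argument (a stand-in for whatever constructs an `impit` client), and the `max_size` bound are all hypothetical.

```python
from __future__ import annotations

from collections import OrderedDict
from collections.abc import Callable
from typing import Generic, TypeVar

ClientT = TypeVar("ClientT")


class ClientCache(Generic[ClientT]):
    """Reuse one client per (proxy_url, session_id) pair instead of one per request.

    Hypothetical helper; `factory` stands in for the real client constructor.
    """

    def __init__(self, factory: Callable[[str | None], ClientT], max_size: int = 32) -> None:
        self._factory = factory
        self._max_size = max_size
        # Insertion/access order doubles as LRU order: oldest entries sit at the front.
        self._clients: OrderedDict[tuple[str | None, str | None], ClientT] = OrderedDict()

    def get(self, proxy_url: str | None, session_id: str | None) -> ClientT:
        key = (proxy_url, session_id)
        if key in self._clients:
            # Cache hit: mark as most recently used and reuse the existing client.
            self._clients.move_to_end(key)
            return self._clients[key]
        # Cache miss: build a client once for this proxy/session combination.
        client = self._factory(proxy_url)
        self._clients[key] = client
        if len(self._clients) > self._max_size:
            # Evict the least recently used client so retired sessions do not pile up.
            self._clients.popitem(last=False)
        return client
```

Keying on the session as well as the proxy keeps per-session state such as cookies separate, while still bounding the number of clients to roughly the pool size. A real implementation would also need to close evicted clients if the underlying client holds resources.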

I propose replacing `httpx` with `impit` as the main client in a separate PR.

Mantisus • Jul 07 '25 22:07

Not merging this PR until we resolve the test issue.

Mantisus • Jul 08 '25 12:07