python-scrapfly
Scrapfly Python SDK for headless browsers and proxy rotation
Scrapfly SDK
Installation
pip install scrapfly-sdk
You can also install extra dependencies:
- pip install "scrapfly-sdk[speedups]" for performance improvement
- pip install "scrapfly-sdk[concurrency]" for concurrency out of the box (asyncio / thread)
- pip install "scrapfly-sdk[scrapy]" for Scrapy integration
- pip install "scrapfly-sdk[all]" for everything
To use the built-in HTML parser (via the ScrapeApiResponse.selector
property), either parsel or scrapy must also be installed.
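Before relying on the selector property, it can help to confirm that one of the two parser packages is importable in the current environment. The following is a minimal sketch using only the standard library; the helper names `available_modules` and `has_selector_backend` are hypothetical and are not part of the Scrapfly SDK.

```python
from importlib.util import find_spec


def available_modules(names):
    """Return the subset of top-level module names that can be imported."""
    return [name for name in names if find_spec(name) is not None]


def has_selector_backend():
    """Report whether parsel or scrapy is importable.

    Assumption: either package being importable is enough for the
    built-in HTML parser, as the README states.
    """
    return bool(available_modules(["parsel", "scrapy"]))


if __name__ == "__main__":
    found = available_modules(["parsel", "scrapy"])
    if found:
        print("Selector backend(s) found:", ", ".join(found))
    else:
        print('No selector backend found; try: pip install "scrapfly-sdk[scrapy]" or pip install parsel')
```

Running the script prints which of the two packages were found, or a hint on what to install.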
For usage examples, see the /examples folder
in this repository.
Get Your API Key
You can create a free account on Scrapfly to get your API Key.
Migration
Migrate from 0.7.x to 0.8
asyncio-pool dependency has been dropped
scrapfly.concurrent_scrape is now an async generator. If the concurrency is None or not defined, the max concurrency allowed by your current subscription is used.
async for result in scrapfly.concurrent_scrape(concurrency=10, scrape_configs=[ScrapeConfig(...), ...]):
    print(result)
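To illustrate what "async generator with a concurrency limit" means in practice, here is a standalone sketch built only on asyncio. It is not the SDK's implementation: `concurrent_map`, `fake_scrape`, and `DEFAULT_LIMIT` are hypothetical names, and `DEFAULT_LIMIT` merely stands in for the subscription-based maximum that the SDK applies when concurrency is None.

```python
import asyncio

# Hypothetical stand-in for "max concurrency allowed by your subscription".
DEFAULT_LIMIT = 5


async def concurrent_map(worker, items, concurrency=None):
    """Async generator: run worker(item) for each item, at most
    `concurrency` at a time, yielding results as they complete."""
    limit = concurrency if concurrency is not None else DEFAULT_LIMIT
    semaphore = asyncio.Semaphore(limit)

    async def run(item):
        # The semaphore caps how many workers are in flight at once.
        async with semaphore:
            return await worker(item)

    tasks = [asyncio.ensure_future(run(item)) for item in items]
    try:
        for finished in asyncio.as_completed(tasks):
            yield await finished
    finally:
        # If the consumer stops early, cancel whatever is still pending.
        for task in tasks:
            task.cancel()


async def fake_scrape(url):
    """Placeholder for a real scrape call; only sleeps briefly."""
    await asyncio.sleep(0.01)
    return f"scraped {url}"


async def main():
    urls = [f"https://example.com/{i}" for i in range(4)]
    results = []
    async for result in concurrent_map(fake_scrape, urls, concurrency=2):
        results.append(result)
    return results


if __name__ == "__main__":
    for line in sorted(asyncio.run(main())):
        print(line)
```

Because results are yielded in completion order, the consumer can start processing early responses while slower ones are still in flight; the order is not guaranteed to match the input order.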
The brotli argument is deprecated and will be removed in the next minor release. In most cases it offers no size benefit over gzip and uses more CPU.
What's new
0.8.x
- Better error logging
- Async improvements for concurrent scraping with asyncio
- Scrapy media pipelines are now supported out of the box