Scrapfly SDK

Installation

pip install scrapfly-sdk

You can also install extra dependencies:

pip install "scrapfly-sdk[seepdup]" for performance improvements
pip install "scrapfly-sdk[concurrency]" for concurrency out of the box (asyncio / thread)
pip install "scrapfly-sdk[scrapy]" for Scrapy integration
pip install "scrapfly-sdk[all]" for everything

To use the built-in HTML parser (via the ScrapeApiResponse.selector property), either parsel or scrapy must also be installed.
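Before relying on the selector property, it can help to confirm that one of the two packages is importable in the current environment. The following is a minimal sketch using only the standard library; the helper names `available_selector_backends` and `selector_supported` are hypothetical and are not part of the SDK, and this is not how the SDK itself detects the dependency.

```python
from importlib.util import find_spec


def available_selector_backends(candidates=("parsel", "scrapy")):
    """Return the names from `candidates` that can be imported here.

    Hypothetical helper for illustration; not part of scrapfly-sdk.
    """
    # find_spec returns None when a top-level module cannot be located.
    return [name for name in candidates if find_spec(name) is not None]


def selector_supported(candidates=("parsel", "scrapy")):
    """True if at least one candidate backend is installed."""
    return bool(available_selector_backends(candidates))
```

Calling `selector_supported()` with no arguments checks for parsel and scrapy; if it returns False, installing either package (for example via the scrapy extra listed above) should satisfy the requirement.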

For usage reference and examples, see the /examples folder in this repository.

Get Your API Key

You can create a free account on Scrapfly to get your API Key.

Migration

Migrate from 0.7.x to 0.8

asyncio-pool dependency has been dropped

scrapfly.concurrent_scrape is now an async generator. If the concurrency is None or not defined, the max concurrency allowed by your current subscription is used.

    async for result in scrapfly.concurrent_scrape(concurrency=10, scrape_configs=[ScrapeConfig(...), ...]):
        print(result)
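Because an `async for` loop must run inside a coroutine, a synchronous script needs a small wrapper to consume an async generator. The sketch below shows that pattern with a stand-in generator, `fake_concurrent_scrape`, which is hypothetical and performs no network requests. It only mimics the shape of an async-generator API with a concurrency limit. Its fallback when `concurrency` is None (the number of URLs) is an assumption of this sketch, not the SDK's subscription-based behavior.

```python
import asyncio
from typing import AsyncIterator, List, Optional


async def fake_concurrent_scrape(
    urls: List[str], concurrency: Optional[int] = None
) -> AsyncIterator[str]:
    """Hypothetical stand-in for an async-generator scrape API (no network I/O)."""
    # Assumption for this sketch: with no limit given, allow every URL at once.
    limit = concurrency or len(urls) or 1
    semaphore = asyncio.Semaphore(limit)

    async def fetch(url: str) -> str:
        async with semaphore:
            await asyncio.sleep(0)  # placeholder for the real request
            return f"scraped:{url}"

    tasks = [asyncio.ensure_future(fetch(url)) for url in urls]
    # Yield each result as soon as its task finishes, not in input order.
    for finished in asyncio.as_completed(tasks):
        yield await finished


async def collect(urls: List[str], concurrency: Optional[int] = None) -> List[str]:
    """Consume the async generator and gather its results into a list."""
    results = []
    async for result in fake_concurrent_scrape(urls, concurrency=concurrency):
        results.append(result)
    return results


def run_collect(urls: List[str], concurrency: Optional[int] = None) -> List[str]:
    """Synchronous entry point: runs the coroutine on a fresh event loop."""
    return asyncio.run(collect(urls, concurrency))
```

With the real client, the same `collect`-style coroutine would iterate over `scrapfly.concurrent_scrape(...)` in place of the stand-in. Results arrive in completion order, so sort or key them if the input order matters.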

The brotli argument is deprecated and will be removed in the next minor release. In most cases it offers no size benefit over gzip and uses more CPU.

What's new

0.8.x

Better error logging
Improved concurrent scraping with asyncio
Scrapy media pipelines are now supported out of the box
