spider-py
                                
The spider project ported to Python.
Getting Started
- pip install spider_rs
import asyncio
from spider_rs import Website
async def main():
    website = Website("https://choosealicense.com")
    website.crawl()
    print(website.get_links())
asyncio.run(main())
View the examples to learn more.
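Once a crawl finishes, the collected links can be post-processed with the standard library. The sketch below assumes that get_links() returns a list of URL strings; the sample list and the helper name same_host_links are hypothetical and stand in for real crawl output so the snippet runs without network access.

```python
from urllib.parse import urlparse


def same_host_links(links, root_url):
    """Return the sorted, de-duplicated links that share root_url's host."""
    root_host = urlparse(root_url).netloc
    return sorted({link for link in links if urlparse(link).netloc == root_host})


# Hypothetical sample standing in for the result of website.get_links().
sample_links = [
    "https://choosealicense.com/licenses/",
    "https://choosealicense.com/about/",
    "https://github.com/github/choosealicense.com",
    "https://choosealicense.com/about/",
]

internal = same_host_links(sample_links, "https://choosealicense.com")
print(internal)
```

In a real script, sample_links would be replaced by the list returned from the crawl.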
Development
Install Python and maturin (pipx install maturin), then build the package locally:
- maturin develop
Benchmarks
View the benchmarks to see a breakdown between libs and platforms.
Test URL: https://espn.com
| library | pages | crawl time |
|---|---|---|
| spider(rust): crawl | 150,387 | 1m | 
| spider(nodejs): crawl | 150,387 | 153s | 
| spider(python): crawl | 150,387 | 186s | 
| scrapy(python): crawl | 49,598 | 1h | 
| crawlee(nodejs): crawl | 18,779 | 30m | 
The benchmarks above were run on an M1 Mac. Spider on Linux ARM machines performs about 2-10x faster.
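Because the libraries crawled different numbers of pages in different amounts of time, throughput in pages per second is an easier way to compare the rows. The sketch below takes the figures from the table above and converts 1m, 1h, and 30m to seconds as written, so those rows are rounded values rather than exact measurements.

```python
# (library, pages crawled, elapsed seconds) taken from the benchmark table.
# 1m, 1h and 30m are converted to seconds as written, so they are rounded.
results = [
    ("spider(rust)", 150_387, 60),
    ("spider(nodejs)", 150_387, 153),
    ("spider(python)", 150_387, 186),
    ("scrapy(python)", 49_598, 3_600),
    ("crawlee(nodejs)", 18_779, 1_800),
]


def pages_per_second(pages, seconds):
    """Average crawl throughput over the whole run."""
    return pages / seconds


throughput = {name: pages_per_second(pages, secs) for name, pages, secs in results}

for name, rate in throughput.items():
    print(f"{name:<16} {rate:>8.1f} pages/s")
```

On these figures the Python binding averages roughly 800 pages per second, against about 14 for scrapy and about 10 for crawlee.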
Issues
Please submit a GitHub issue for any problems you find.