Results: 87 comments by David Hicks

I'll take a look at those last few store finders that need a Playwright approach to do one or more of the following:

- Intercept API calls and extract API keys...
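A minimal sketch of the "extract API keys" step, assuming a key travels either as a query parameter or as a request header. The names in `KEY_NAMES` are hypothetical guesses, not a list taken from any particular store finder, and `extract_api_keys` is an illustrative helper rather than existing project code.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical names under which a store finder API might carry its key.
# Compared case-insensitively against query parameter names and header names.
KEY_NAMES = {"api_key", "apikey", "key", "x-api-key", "appkey"}


def extract_api_keys(url, headers=None):
    """Return {name: value} for anything in the query string or headers that looks like an API key."""
    found = {}
    # Query parameters, e.g. ...?api_key=abc123
    for name, values in parse_qs(urlparse(url).query).items():
        if name.lower() in KEY_NAMES and values:
            found[name] = values[0]
    # Request headers, e.g. X-Api-Key: abc123
    for name, value in (headers or {}).items():
        if name.lower() in KEY_NAMES:
            found[name] = value
    return found
```

With Playwright, each intercepted request could be fed in from a `page.on("request", ...)` listener, passing `request.url` and `request.headers` to `extract_api_keys`.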

This latest commit should fix the imports. The Python community generally discourages introspecting the file system to load modules automatically and import classes from them.
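A minimal sketch of the explicit alternative to directory scanning, assuming detectors are plain classes with a `name` attribute. `Where2GetItDetector`, `YextDetector`, `STOREFINDER_DETECTORS` and `detector_by_name` are hypothetical names for illustration; in a real package each class would be imported from its own module rather than defined inline.

```python
# Hypothetical detector classes. In a package these would be explicit imports,
# e.g. `from .where2getit import Where2GetItDetector`, so linters and type
# checkers can see every dependency.
class Where2GetItDetector:
    name = "where2getit"


class YextDetector:
    name = "yext"


# Explicit registry: adding a storefinder means adding one import and one entry,
# instead of walking the package directory at runtime.
STOREFINDER_DETECTORS = [Where2GetItDetector, YextDetector]


def detector_by_name(name):
    """Look up a registered detector class by its `name` attribute."""
    for cls in STOREFINDER_DETECTORS:
        if cls.name == name:
            return cls
    raise KeyError(name)
```

The trade-off is one extra line per new storefinder in exchange for imports that are static and easy to follow.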

I'll rewrite the detection and extraction methods to always use Playwright, shifting the focus to intercepting and reading requests before the browser sends them rather than parsing the HTML output...
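A sketch of reading requests before they leave the browser, using Playwright's `page.route`, whose handler runs for each matching request and must then continue, fulfill or abort it. `RequestRecorder` and `capture_requests` are hypothetical names, and keeping only `xhr`/`fetch` requests is an assumption about where store data usually comes from.

```python
class RequestRecorder:
    """Collects details of requests intercepted via page.route, before they are sent."""

    def __init__(self, resource_types=("xhr", "fetch")):
        # Only API-style requests are recorded; images, fonts, etc. are skipped.
        self.resource_types = set(resource_types)
        self.requests = []

    def handle(self, route, request):
        # Called by Playwright before the request goes out.
        if request.resource_type in self.resource_types:
            self.requests.append({
                "method": request.method,
                "url": request.url,
                "post_data": request.post_data,
            })
        # Let every request (recorded or not) proceed unchanged.
        route.continue_()


def capture_requests(url):
    # Imported lazily so RequestRecorder can be used without Playwright installed.
    from playwright.sync_api import sync_playwright

    recorder = RequestRecorder()
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.route("**/*", recorder.handle)  # route every URL through the recorder
        page.goto(url, wait_until="networkidle")
        browser.close()
    return recorder.requests
```

Recording at the route level means the request URL, method and body are available even when the page later fails to render anything useful.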

Example from Where2GetItSpider:

```
@spider office_depot
addr:city CEDAR RAPIDS
addr:country US
addr:postcode 52402
addr:state IA
addr:street_address 327 COLLINS ROAD NE
brand Office Depot
brand:wikidata [Q1337797](https://www.wikidata.org/wiki/Q1337797)
name Officedepot
nsi_id officedepot-8bfff1
phone...
```

`checkers_rallys_us` also needs this new Yext storefinder implemented:

```
curl 'https://locations.checkers.com/search?q=41.875562,-87.624421&qp=Chicago,%20Illinois,%20United%20States&l=en' \
  -H 'Accept: application/json'
```
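The same request built in Python, as a sketch. The meanings of `q` (a "lat,lon" search point), `qp` (a display label for the place) and `l` (language) are inferred from the curl example, not from Yext documentation, and `yext_search_request`/`fetch_json` are hypothetical helper names. No particular JSON response structure is assumed.

```python
import json
from urllib.parse import urlencode, quote
from urllib.request import Request, urlopen


def yext_search_request(base_url, lat, lon, place_label, lang="en"):
    # Parameter meanings are inferred from the curl example (assumption).
    params = {"q": f"{lat},{lon}", "qp": place_label, "l": lang}
    # quote (not the default quote_plus) encodes spaces as %20, and safe=","
    # keeps commas literal, matching the curl URL.
    query = urlencode(params, quote_via=quote, safe=",")
    return Request(f"{base_url}/search?{query}", headers={"Accept": "application/json"})


def fetch_json(request):
    # Performs the network call and parses the body as JSON.
    with urlopen(request) as response:
        return json.load(response)
```

Calling `yext_search_request("https://locations.checkers.com", 41.875562, -87.624421, "Chicago, Illinois, United States")` reproduces the URL and `Accept` header from the curl command.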

All 539 locations are returned by a single query to:

`https://www.maxi.rs/api/v1/?operationName=GetStoreSearch&variables={"pageSize":3000,"lang":"sr","query":"","currentPage":0,"options":"STORELOCATOR_MINIFIED"}&extensions={"persistedQuery":{"version":1,"sha256Hash":"9dc67fed7b358c14d80bbd04c6524ef76f4298a142ed7ab86732442271f4ad46"}}`

GraphQL is used under the hood, and the server appears to be Apollo GraphQL. They're using a persisted query that seems...
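A sketch of building that persisted-query GET URL from its parts. In Apollo's persisted-query convention, the client sends only the SHA-256 hash of the query text (here the `sha256Hash` copied from the URL above), and the server looks the query up by hash. `persisted_query_url` is a hypothetical helper, and the sketch assumes the server accepts the standard percent-encoding of the JSON values.

```python
import json
from urllib.parse import urlencode


def persisted_query_url(endpoint, operation_name, variables, sha256_hash):
    """Build a GET URL for a GraphQL persisted query (hash only, no query text)."""
    compact = {"separators": (",", ":")}  # no spaces, like the original URL
    params = {
        "operationName": operation_name,
        "variables": json.dumps(variables, **compact),
        "extensions": json.dumps(
            {"persistedQuery": {"version": 1, "sha256Hash": sha256_hash}}, **compact
        ),
    }
    return f"{endpoint}?{urlencode(params)}"


MAXI_URL = persisted_query_url(
    "https://www.maxi.rs/api/v1/",
    "GetStoreSearch",
    {"pageSize": 3000, "lang": "sr", "query": "", "currentPage": 0,
     "options": "STORELOCATOR_MINIFIED"},
    "9dc67fed7b358c14d80bbd04c6524ef76f4298a142ed7ab86732442271f4ad46",
)
```

The result is percent-encoded (`{` becomes `%7B`, and so on), so it differs textually from the pasted URL but decodes to the same parameters.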

The site has bot protection, with `Server: rhino-core-shield` in the response headers. I couldn't find much online confirming what this product is.
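A small sketch of flagging such responses during detection, assuming the `Server` header is a reliable marker. Only `rhino-core-shield` comes from an observed response; `BOT_PROTECTION_SERVERS` and `bot_protection_server` are hypothetical names.

```python
# Server header values seen on bot-protected store finders. Only
# "rhino-core-shield" has actually been observed; extend as new ones turn up.
BOT_PROTECTION_SERVERS = {"rhino-core-shield"}


def bot_protection_server(headers):
    """Return the Server header value if it matches a known bot-protection product, else None."""
    # HTTP header names are case-insensitive, so normalise them before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    server = lowered.get("server", "")
    return server if server.strip().lower() in BOT_PROTECTION_SERVERS else None
```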

> I found it difficult to start a group of related crawlers for a website, ie it has 3x urls to look at; so I wanted to crawl all of...

> ```
> # for store in stores
> process = CrawlerProcess(settings)
> ....
>
> # for url in urls, make a crawler
> crawler = process.create_crawler(StorefinderDetectorSpider)
> crawler.signals.connect(self.print_spider_code,...
> ```
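For running several related crawls of one site, one approach is to create one crawler per URL on a single `CrawlerProcess` and call `process.start()` exactly once, because the Twisted reactor cannot be restarted. This is a sketch, not tested against this project: `StorefinderDetectorSpider` and the `print_spider_code` callback come from the quoted snippet, while `schedule_crawls`, `run_detector` and `report` are hypothetical.

```python
def schedule_crawls(process, spider_cls, urls, on_closed, closed_signal):
    """Create one crawler per URL on a single CrawlerProcess; the caller starts it once."""
    crawlers = []
    for url in urls:
        crawler = process.create_crawler(spider_cls)
        # e.g. a callback that prints the generated spider code when each crawl ends
        crawler.signals.connect(on_closed, signal=closed_signal)
        # Keyword arguments become spider attributes, so each crawl gets its own start URL.
        process.crawl(crawler, start_urls=[url])
        crawlers.append(crawler)
    return crawlers


def run_detector(spider_cls, urls, settings=None):
    # Imported lazily so schedule_crawls can be exercised without Scrapy installed.
    from scrapy import signals
    from scrapy.crawler import CrawlerProcess

    def report(spider, reason):
        print(f"{spider.name} finished ({reason}), start_urls={spider.start_urls}")

    process = CrawlerProcess(settings)
    schedule_crawls(process, spider_cls, urls, report, signals.spider_closed)
    # One start() for all crawls: they run concurrently and this call blocks until all finish.
    process.start()
```

For the quoted case, something like `run_detector(StorefinderDetectorSpider, urls)` would crawl all of a site's URLs in one process.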

Just to check: is the only difference from the `autospidergen` branch the changes to `nsi.py` and `WIKIDATA.md`?