Cj Malone
> I am not sure that I understand the relevance of brand:wikidata. > Filtering out `brand:wikidata` is based on the assumption that they are more likely chain stores, where the...
So we now add `nsi_id` automatically to most spiders, and the coverage will increase as we get better at matching in `ApplyNSICategoriesPipeline`. We also apply the NSI tags (OSM categories) from...
I don't have a number off the top of my head, but we could look at the current values and make a better decision. It may be different per language. Is...
[Q29261993](https://www.wikidata.org/wiki/Q29261993), it's always `^Q\d+$`
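As a rough illustration of that pattern, here is a minimal sketch of checking a value against `^Q\d+$` in Python. The helper name `is_wikidata_qid` is hypothetical and not taken from any project discussed here.

```python
import re

# Pattern quoted in the comment above: "Q" followed by one or more digits.
QID_PATTERN = re.compile(r"^Q\d+$")


def is_wikidata_qid(value: str) -> bool:
    """Hypothetical helper: True if value looks like a Wikidata item ID."""
    return QID_PATTERN.match(value) is not None


print(is_wikidata_qid("Q29261993"))  # ID taken from the link above
print(is_wikidata_qid("https://www.wikidata.org/wiki/Q29261993"))  # full URL, not a bare ID
```

Note that in Python `$` also matches just before a trailing newline, so `re.fullmatch(r"Q\d+", value)` may be the safer choice if inputs are not stripped first.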
I imagine it's quite patchy, more popular ones will have more data. But I picked that one by chance, so hopefully it's a good reference. Also, when I created this...
Honestly, I want this to store the source. Magnet links. Yes, key/value would be great. That way I could add `reddit=https://www.reddit.com/r/westworld/` and `justwatch=https://www.justwatch.com/uk/tv-series/westworld`
Well, it means there isn't an issue with git or package managers when changing the config, so it improves ease of use. If you don't want to merge it...
@iandees I think there are 2 options right now: 1. We either include it with `requires_proxy` and hopefully solve it at the same time, although they aren't exactly the same...
[log](https://data.alltheplaces.xyz/ci/4036387043/currys/log.txt) [map](https://data.alltheplaces.xyz/map.html?show=https://data.alltheplaces.xyz/ci/4036387043/currys/output.geojson) I think the errors are just about [bad urls](https://www.currys.co.uk/page?cid=leeds-crown-point-2267) in the sitemap, not actual errors. I definitely recommend testing Docker before the next weekly. ```python {'atp/brand/currys': 293, 'atp/brand_wikidata/Q3246464':...
CI is now failing because it thinks more than 5 spiders have changed. Other than the Docker thing, I think this is good to go.