CommunityScrapers Adding initial support for a wikidata based scraper.

Open Tweeticoats opened this issue 3 years ago • 2 comments

Wikidata is the database behind Wikipedia and can be used to power infoboxes on Wikipedia pages. It can hold a lot of useful info such as height, weight, country, Twitter handles, etc. Fixes #103.

Currently this is a WIP. I might rewrite the performer processing as a Python script, since the entries are easier to process that way.

Aug 02 '21 02:08 Tweeticoats

The search doesn't seem to work for me. No results, even though the performer is on the site, e.g. https://www.wikidata.org/wiki/Q233092
Isn't it easier to use something like https://www.wikidata.org/wiki/Q for the URL and do a URL replace? Or is https://www.wikidata.org/wiki/Special:EntityData/Q accessible from somewhere on the site?
Values with a quantity or time type (weight, height, DOB, ...) seem to have a leading + that needs to be stripped.
The date needs to be post-processed.
Aliases can also be fetched, with something like:

      Aliases:
        selector: entities.*.aliases.en.#.value
        concat: ", "
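
For comparison, here is a rough Python sketch of what that selector does. It assumes the Special:EntityData JSON layout implied by the selector (entities, then an entity ID, then aliases, then a language code, then a list of objects with a value key). The function name and the sample data are made up for illustration and are not part of the scraper.

```python
def extract_aliases(data, lang="en", sep=", "):
    """Join every alias value for `lang` across all entities in the response."""
    values = []
    for entity in data.get("entities", {}).values():
        for alias in entity.get("aliases", {}).get(lang, []):
            values.append(alias["value"])
    return sep.join(values)


# Hypothetical, trimmed-down response used only to exercise the function.
sample = {
    "entities": {
        "Q1": {
            "aliases": {
                "en": [
                    {"language": "en", "value": "Alias One"},
                    {"language": "en", "value": "Alias Two"},
                ]
            }
        }
    }
}

print(extract_aliases(sample))
```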

Aug 17 '21 22:08 bnkai

https://www.wikidata.org/wiki/Q233092 does not show up in search because that entry does not have the occupation listed. The query searches for humans with an occupation of "pornographic actor" (Q488111), so someone needs to edit the entry and add that occupation for it to show up in the search results.
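
The exact query the scraper sends is not shown in this thread, so the following is only a guess at its shape. It builds a SPARQL string that restricts results to humans (instance of, P31, human, Q5) with occupation (P106) Q488111, then filters on the English label. The function name, the label filter, and the limit are assumptions for illustration.

```python
def build_search_query(name, limit=10):
    """Return a SPARQL query for humans whose occupation is Q488111."""
    # Escape backslashes and double quotes so the name is a valid SPARQL literal.
    escaped = name.replace("\\", "\\\\").replace('"', '\\"')
    return f"""
SELECT ?item ?itemLabel WHERE {{
  ?item wdt:P31 wd:Q5 ;
        wdt:P106 wd:Q488111 ;
        rdfs:label ?itemLabel .
  FILTER(LANG(?itemLabel) = "en")
  FILTER(CONTAINS(LCASE(?itemLabel), LCASE("{escaped}")))
}}
LIMIT {int(limit)}
""".strip()


print(build_search_query("Example Name"))
```

An entry without the P106 statement never matches the second triple, which is why it is missing from the results regardless of the name.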

Doing a URL replace is a good idea: replacing Q with Special:EntityData/Q gives you JSON to parse.
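
A minimal sketch of that replacement in Python, assuming page URLs of the form https://www.wikidata.org/wiki/Q followed by digits. The function name is hypothetical, and appending a .json suffix to force the JSON representation is my assumption rather than something stated above; pass an empty suffix to get the plain replacement.

```python
import re

# Matches an entity page URL and captures the Q identifier.
WIKI_URL = re.compile(r"^https?://www\.wikidata\.org/wiki/(Q\d+)$")


def entity_data_url(page_url, suffix=".json"):
    """Turn an entity page URL into its Special:EntityData URL."""
    match = WIKI_URL.match(page_url.strip())
    if match is None:
        raise ValueError(f"not a Wikidata entity page URL: {page_url!r}")
    return (
        "https://www.wikidata.org/wiki/Special:EntityData/"
        f"{match.group(1)}{suffix}"
    )


print(entity_data_url("https://www.wikidata.org/wiki/Q233092"))
```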

I'll take a look at stripping + and parsing dates.
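
If the processing does move to a Python script, the two fixes could look roughly like this. It assumes quantity values arrive as strings such as "+170" and time values as strings such as "+1985-03-12T00:00:00Z"; it also assumes imprecise dates may carry 00 for the month or day, in which case only the known part is kept. Both function names are hypothetical.

```python
import re

# Leading "+", a year of four or more digits, then month and day.
_TIME = re.compile(r"^\+?(\d{4,})-(\d{2})-(\d{2})T")


def strip_plus(value):
    """Remove a single leading '+' from a quantity string."""
    return value[1:] if value.startswith("+") else value


def to_date(time_value):
    """Convert a Wikidata-style time string to YYYY-MM-DD, or None."""
    match = _TIME.match(time_value)
    if match is None:
        return None
    year, month, day = match.groups()
    if month == "00":
        return year  # only the year is known
    if day == "00":
        return f"{year}-{month}"  # year and month are known
    return f"{year}-{month}-{day}"


print(strip_plus("+170"))
print(to_date("+1985-03-12T00:00:00Z"))
```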

Aug 27 '21 00:08 Tweeticoats
