CommunityScrapers

Adding initial support for a wikidata based scraper.

Open Tweeticoats opened this issue 3 years ago • 2 comments

Wikidata is the database behind Wikipedia and can be used to power the infoboxes on Wikipedia pages. It can hold a lot of useful info such as height, weight, country, Twitter handles, etc.

Fixes #103

This is currently a WIP. I might rewrite the performer processing as a Python script, since the entries are easier to process that way.

Tweeticoats avatar Aug 02 '21 02:08 Tweeticoats

  • The search doesn't seem to work for me. I get no results even though the performer is on the site, e.g. https://www.wikidata.org/wiki/Q233092

  • Isn't it easier to use something like https://www.wikidata.org/wiki/Q for the URL and do a URL replace? Or is https://www.wikidata.org/wiki/Special:EntityData/Q accessible from somewhere on the site?

  • Values with a quantity or time type (weight, height, DOB, ...) seem to have a leading + that needs to be stripped

  • Date needs to be postprocessed

  • Aliases can also be fetched, perhaps with something like:

      Aliases:
        selector: entities.*.aliases.en.#.value
        concat: ", "
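If the processing does move to a Python script, the post-processing points above could look roughly like the sketch below. The helper names (`strip_plus`, `to_date`, `join_aliases`) are made up for illustration and are not part of the scraper. It assumes quantity values look like `+170`, time values look like `+1985-03-02T00:00:00Z`, and that `entity` is a single entry from the `entities` object in the EntityData JSON.

```python
def strip_plus(value: str) -> str:
    """Drop a leading '+' from a quantity or time string, if present."""
    return value[1:] if value.startswith("+") else value


def to_date(time_value: str) -> str:
    """Reduce a time value such as '+1985-03-02T00:00:00Z' to 'YYYY-MM-DD'."""
    # Keep only the part before the 'T' separator, after removing the sign.
    return strip_plus(time_value).split("T", 1)[0]


def join_aliases(entity: dict, lang: str = "en") -> str:
    """Join all alias values for one language, like `concat: ", "` would."""
    aliases = entity.get("aliases", {}).get(lang, [])
    return ", ".join(alias["value"] for alias in aliases)
```

This does not handle dates that are only known to year or month precision, which may come back with `00` for the missing parts, so those would still need a separate check.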

bnkai avatar Aug 17 '21 22:08 bnkai

https://www.wikidata.org/wiki/Q233092 does not show up in search because the entry does not have the occupation listed. The query searches for humans with an occupation of "pornographic actor" (Q488111), so someone needs to edit the entry and add that occupation for it to show up in the search results.
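For reference, a search restricted like that can be sketched as below. This is not the scraper's actual query, just an illustration of the filter: P31/Q5 is "instance of human" and P106 is "occupation". The function names, the label matching with `CONTAINS`, and the `LIMIT` are assumptions made for the example.

```python
from urllib.parse import urlencode

# Public Wikidata SPARQL endpoint.
SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"


def build_query(name: str) -> str:
    """Build a SPARQL query for humans with occupation Q488111 matching a name."""
    # Escape backslashes and double quotes so the name is a valid SPARQL string.
    escaped = name.replace("\\", "\\\\").replace('"', '\\"')
    return f"""
SELECT ?item ?itemLabel WHERE {{
  ?item wdt:P31 wd:Q5 ;
        wdt:P106 wd:Q488111 ;
        rdfs:label ?itemLabel .
  FILTER(LANG(?itemLabel) = "en")
  FILTER(CONTAINS(LCASE(?itemLabel), LCASE("{escaped}")))
}}
LIMIT 20
"""


def build_search_url(name: str) -> str:
    """Return a GET URL that asks the endpoint for JSON results."""
    return SPARQL_ENDPOINT + "?" + urlencode({"query": build_query(name), "format": "json"})
```

An entry without the P106 statement never matches the second triple pattern, which is why Q233092 is missing from the results.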

Doing a URL replace is a good idea. Replacing Q with Special:EntityData/Q gives you JSON to parse.
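In Python terms the replace would be something like the sketch below. The function name is hypothetical, and appending `.json` is an assumption on my part to request JSON explicitly rather than relying on content negotiation.

```python
import re

# Matches an entity page URL and captures the Q-id, e.g. Q233092.
ENTITY_PAGE = re.compile(r"^https?://www\.wikidata\.org/wiki/(Q\d+)$")


def entity_data_url(page_url: str) -> str:
    """Turn an entity page URL into the matching Special:EntityData JSON URL."""
    match = ENTITY_PAGE.match(page_url.strip())
    if match is None:
        raise ValueError(f"not a Wikidata entity page URL: {page_url!r}")
    return f"https://www.wikidata.org/wiki/Special:EntityData/{match.group(1)}.json"
```

With this, `https://www.wikidata.org/wiki/Q233092` maps to `https://www.wikidata.org/wiki/Special:EntityData/Q233092.json`.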

I'll take a look at stripping + and parsing dates.

Tweeticoats avatar Aug 27 '21 00:08 Tweeticoats