
[Site Support Request] Wikipedia and Wikimedia

Open • paulolimac opened this issue 3 years ago • 6 comments

Is there any way to download from Wikipedia and Wikimedia domains? These commands failed:

$ gallery-dl https://commons.wikimedia.org/wiki/Category:1st_Horseman_of_the_Apocalypse
[gallery-dl][error] No suitable extractor found for 'https://commons.wikimedia.org/wiki/Category:1st_Horseman_of_the_Apocalypse'

$ gallery-dl https://en.wikipedia.org/wiki/Gustave_Dor%C3%A9
[gallery-dl][error] No suitable extractor found for 'https://en.wikipedia.org/wiki/Gustave_Dor%C3%A9'

paulolimac • Apr 08 '21 21:04

Not at the moment.

mikf • Apr 08 '21 22:04

OK then, thanks for the reply :)

paulolimac • Apr 08 '21 23:04

After looking into it a bit: Wikipedia (and MediaWiki websites in general) has an API that can be used to retrieve the images from an article (and surely from other pages too).

An example:

  1. https://en.wikipedia.org/w/api.php?action=parse&page=Pet_door&prop=images&format=json to retrieve all image names from an article
  2. https://en.wikipedia.org/w/api.php?action=query&titles=File:Gatera_de_ademuz.jpg&prop=imageinfo&iiprop=url to retrieve the full URL for an image name (since the exact path can change depending on the language version)
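The two API calls above can be sketched in Python, the language gallery-dl itself is written in. This is a minimal sketch under assumptions, not gallery-dl code: the helper names (`parse_images_url`, `imageinfo_url`, `image_names`, `image_urls`, `fetch_json`, `article_image_urls`) are hypothetical, `format=json` is added to the second call so it returns JSON, and the response fields it reads (`parse.images`, `query.pages[...].imageinfo[0].url`) reflect my understanding of the API's default JSON output, so check them against a live response. File names are resolved in batches of 50, the usual per-request limit on `titles` for non-bot clients.

```python
import json
import urllib.parse
import urllib.request

# Assumption: English Wikipedia; other MediaWiki sites expose api.php the same way.
API = "https://en.wikipedia.org/w/api.php"
BATCH = 50  # usual per-request limit on the number of titles for non-bot clients


def parse_images_url(page, api=API):
    """Step 1: URL that lists the file names used on an article."""
    params = {"action": "parse", "page": page, "prop": "images", "format": "json"}
    return api + "?" + urllib.parse.urlencode(params)


def imageinfo_url(filenames, api=API):
    """Step 2: URL that resolves file names to full URLs (titles joined with |)."""
    titles = "|".join("File:" + name for name in filenames)
    params = {"action": "query", "titles": titles, "prop": "imageinfo",
              "iiprop": "url", "format": "json"}
    return api + "?" + urllib.parse.urlencode(params)


def image_names(parse_response):
    """Extract the file names (without the File: prefix) from a parse response."""
    return parse_response["parse"]["images"]


def image_urls(query_response):
    """Extract one full URL per resolved file; entries without imageinfo are skipped."""
    pages = query_response["query"]["pages"]
    if isinstance(pages, dict):  # default JSON output keys pages by page ID
        pages = pages.values()
    return [page["imageinfo"][0]["url"] for page in pages if "imageinfo" in page]


def fetch_json(url):
    # Placeholder User-Agent: Wikimedia asks clients to send a descriptive one.
    request = urllib.request.Request(url, headers={"User-Agent": "wiki-image-sketch/0.1"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def article_image_urls(page, api=API):
    """Run both steps for one article (requires network access)."""
    names = image_names(fetch_json(parse_images_url(page, api)))
    urls = []
    for start in range(0, len(names), BATCH):
        batch = names[start:start + BATCH]
        urls.extend(image_urls(fetch_json(imageinfo_url(batch, api))))
    return urls
```

With network access, `article_image_urls("Pet_door")` should return the full file URLs for the example article. Replace the placeholder User-Agent with one that identifies your tool. For Commons category pages like the one in the original request, the file names would come from a `list=categorymembers` query with `cmtype=file` instead; that case is not covered here.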

I could try to implement an extractor if I ever find the time for it 0:)

Ailothaen • May 11 '21 19:05

I wonder whether there's any public documentation of the MediaWiki URL syntax outside the source code... I couldn't find any in a very quick search, and I don't feel like digging through the source.

At first I was thinking of "match everything after /wiki/ up to a question mark", because I knew MediaWiki supports sub-articles (subpages), which show up as /wiki/ORIGINAL_ARTICLE/SUB_ARTICLE (with the /SUB_ARTICLE part repeated for deeper levels). But then I started to worry that stopping at a question mark might exclude some articles.
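A minimal sketch of the "stop at the question mark" approach, in Python. Because an unencoded ? always begins the query string of a URL (and # the fragment), a title that really contains ? has to appear percent-encoded as %3F, so stopping at the first ? or # should not lose articles as long as the title is percent-decoded afterwards. The `WIKI_URL` pattern and `parse_wiki_url` helper are hypothetical, not gallery-dl's actual extractor pattern, and only cover the /wiki/ path form on *.wikipedia.org and *.wikimedia.org (not /w/index.php?title=... links, mobile hosts, or other MediaWiki sites).

```python
import re
from urllib.parse import unquote

# Hypothetical pattern (not gallery-dl's real one): <sub>.wikipedia.org or
# <sub>.wikimedia.org, then /wiki/ and a title running up to the first ? or #.
# Slashes stay in the title, so subpages like Foo/Bar/Baz and titles like AC/DC survive.
WIKI_URL = re.compile(
    r"(?:https?://)?([\w-]+)\.(wikipedia|wikimedia)\.org/wiki/([^?#]+)"
)


def parse_wiki_url(url):
    """Return (subdomain, site, decoded title), or None if the URL doesn't match."""
    match = WIKI_URL.match(url)
    if match is None:
        return None
    subdomain, site, title = match.groups()
    # Percent-decode the title: %C3%A9 -> é, %3F -> ?
    return subdomain, site, unquote(title)


print(parse_wiki_url("https://en.wikipedia.org/wiki/Gustave_Dor%C3%A9"))
# ('en', 'wikipedia', 'Gustave_Doré')
```

The same function handles the Commons category URL from the original request, returning `('commons', 'wikimedia', 'Category:1st_Horseman_of_the_Apocalypse')`.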

rautamiekka • May 11 '21 19:05

Random question for @mikf (it is only slightly related to this issue, but I don't see a better place to post it): is there any documentation on how to write an extractor? By that I mean how to use the Extractor class and which methods to use in which context.

Ailothaen • May 30 '21 18:05

I have done this in my own repository: download.py. It may give you some inspiration.

GrimPixel • Feb 05 '24 17:02