gallery-dl
[Site Support Request] Wikipedia and Wikimedia
Is there any way to download from Wikipedia and Wikimedia domains? My attempts were unsuccessful:
$ gallery-dl https://commons.wikimedia.org/wiki/Category:1st_Horseman_of_the_Apocalypse
[gallery-dl][error] No suitable extractor found for 'https://commons.wikimedia.org/wiki/Category:1st_Horseman_of_the_Apocalypse'
$ gallery-dl https://en.wikipedia.org/wiki/Gustave_Dor%C3%A9
[gallery-dl][error] No suitable extractor found for 'https://en.wikipedia.org/wiki/Gustave_Dor%C3%A9'
Not at the moment.
ok then. thanks for reply :)
After looking into it a bit: Wikipedia (and any MediaWiki website, in general) has an API that can be used to retrieve images from an article (and surely from other pages).
An example:
- https://en.wikipedia.org/w/api.php?action=parse&page=Pet_door&prop=images&format=json to retrieve all image names from an article
- https://en.wikipedia.org/w/api.php?action=query&titles=File:Gatera_de_ademuz.jpg&prop=imageinfo&iiprop=url to retrieve the full URL for an image name (since the exact path can change depending on the language version)
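To make the two calls above concrete, here is a rough Python sketch that chains them using only the standard library. The function names, the User-Agent string, and the assumed JSON layout (`parse.images` for the first call, `query.pages.*.imageinfo[].url` for the second) are my own reading of the API output, not anything taken from gallery-dl, so treat it as a starting point rather than a finished extractor.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: the English Wikipedia endpoint; other wikis expose their own api.php.
API = "https://en.wikipedia.org/w/api.php"


def api_url(params, api=API):
    """Build an API URL and always request JSON output."""
    return api + "?" + urlencode(dict(params, format="json"))


def image_names_url(page, api=API):
    """URL that lists the image names used on an article."""
    return api_url({"action": "parse", "page": page, "prop": "images"}, api)


def image_info_url(name, api=API):
    """URL that resolves one image name to its full file URL."""
    return api_url({"action": "query", "titles": "File:" + name,
                    "prop": "imageinfo", "iiprop": "url"}, api)


def parse_image_names(data):
    """Extract image names from a decoded action=parse response."""
    return data["parse"]["images"]


def parse_image_urls(data):
    """Extract file URLs from a decoded action=query response."""
    urls = []
    for page in data["query"]["pages"].values():
        # Pages without a file have no "imageinfo" key; skip them.
        for info in page.get("imageinfo", ()):
            urls.append(info["url"])
    return urls


def fetch_json(url):
    """Download and decode one API response (hypothetical User-Agent)."""
    request = Request(url, headers={"User-Agent": "mediawiki-sketch/0.1"})
    with urlopen(request) as response:
        return json.load(response)


def article_image_urls(page, api=API):
    """Yield the full URL of every image used on an article."""
    for name in parse_image_names(fetch_json(image_names_url(page, api))):
        yield from parse_image_urls(fetch_json(image_info_url(name, api)))
```

Calling `list(article_image_urls("Pet_door"))` would issue one request for the name list and then one per image; a real extractor would probably want to batch several titles into each query.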
I guess I could try to implement an extractor if I someday find the time for it 0:)
I wonder if there's any public documentation, outside the source code, on the MediaWiki URL syntax... I couldn't find any with a very quick search, and I don't feel like checking the source code.
At 1st I was thinking "Match until a question mark after /wiki/" cuz I knew MediaWiki supports sub-articles which show up as /wiki/ORIGINAL_ARTICLE/SUB_ARTICLE (repeating the /SUB_ARTICLE part), but then I started thinking maybe matching until a question mark would exclude some articles.
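One way around the "match until a question mark" worry is to let a URL parser split off the query string and then decode the path, instead of relying on a regex. The sketch below assumes the /wiki/ path prefix seen on Wikipedia and Wikimedia Commons, plus the index.php?title=... form; other MediaWiki installs may be configured differently, and `page_title` is just a name I made up.

```python
from urllib.parse import parse_qs, unquote, urlsplit

# Assumption: articles live under /wiki/, as on Wikipedia and Wikimedia Commons.
WIKI_PREFIX = "/wiki/"


def page_title(url):
    """Return the page title encoded in a MediaWiki URL, or None."""
    parts = urlsplit(url)
    if parts.path.startswith(WIKI_PREFIX):
        # Everything after /wiki/ is the title, including any /SUB_ARTICLE
        # parts. A literal "?" in a title arrives percent-encoded as %3F,
        # so it stays in the path and is only decoded here.
        return unquote(parts.path[len(WIKI_PREFIX):]) or None
    # Fallback for the index.php?title=... form; parse_qs already decodes.
    titles = parse_qs(parts.query).get("title")
    return titles[0] if titles else None
```

With this approach a title such as What%3F still comes out as "What?", while a real query string like ?action=history is dropped, and sub-articles keep their slashes.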
Random question for @mikf (it is slightly related to this issue, but I do not see any better place to post it): is there any documentation that specifies how to write an extractor? By that, I mean how to use the Extractor class and which methods are to be used depending on context.
I have done this in my own repository: download.py. You may find it useful as inspiration.