gallery-dl
Questions, Feedback and Suggestions #3
Continuation of the old issue as a central place for any sort of question or suggestion not deserving their own separate issue. There is also https://gitter.im/gallery-dl/main if that seems more appropriate.
Links to older issues: #11, #74
a simple snippet to turn gallery-dl into an api
from types import SimpleNamespace
from unittest.mock import patch, Mock
import os

import click
from flask.cli import FlaskGroup
from flask import (
    Flask,
    jsonify,
    request,
)

from gallery_dl import main, option
from gallery_dl.job import DataJob


def get_json():
    data = None
    parser = option.build_parser()
    args = parser.parse_args()
    args.urls = request.args.getlist('url')
    if not args.urls:
        return jsonify({'error': 'No url(s)'})
    args.list_data = True

    class CustomClass:
        data = []

        def run(self):
            dj = DataJob(*self.data_job_args, **self.data_job_kwargs)
            dj.run()
            self.data.append({
                'args': self.data_job_args,
                "kwargs": self.data_job_kwargs,
                'data': dj.data
            })

        def DataJob(self, *args, **kwargs):
            self.data_job_args = args
            self.data_job_kwargs = kwargs
            retval = SimpleNamespace()
            retval.run = self.run
            return retval

    c1 = CustomClass()
    with patch('gallery_dl.option.build_parser') as m_bp, \
            patch('gallery_dl.job.DataJob', side_effect=c1.DataJob) as m_jt:
        # m_option.return_value.parser_args.return_value = args
        m_bp.return_value.parse_args.return_value = args
        m_jt.__name__ = 'DataJob'
        main()
        data = c1.data
    return jsonify({'data': data, 'urls': args.urls})


def create_app(script_info=None):
    """create app."""
    app = Flask(__name__)
    app.add_url_rule(
        '/api/json', 'gallery_dl_json', get_json)
    return app


@click.group(cls=FlaskGroup, create_app=create_app)
def cli():
    """This is a script for application."""
    pass


if __name__ == '__main__':
    cli()
e: this could be simpler if DataJob were used directly to handle the urls, but i haven't checked whether anything has to be done before initializing a DataJob instance
this could be simpler if DataJob were used directly to handle the urls, but i haven't checked whether anything has to be done before initializing a DataJob instance.
You don't need to do anything before initializing any of the Job classes:
>>> from gallery_dl import job
>>> j = job.DataJob("https://imgur.com/0gybAXR")
>>> j.run()
[ ... ]
You can initialize anything logging related if you want logging output, or call config.load() and config.set(...) if you want to load a config file and set some custom options, but none of that is necessary.
@rachmadaniHaryono what does that code do?
simpler api (based on above suggestion)
#!/usr/bin/env python
from types import SimpleNamespace
from unittest.mock import patch, Mock
import os

import click
from flask.cli import FlaskGroup
from flask import (
    Flask,
    jsonify,
    request,
)

from gallery_dl import main, option
from gallery_dl.job import DataJob
from gallery_dl.exception import NoExtractorError


def get_json():
    data = []
    parser = option.build_parser()
    args = parser.parse_args()
    args.urls = request.args.getlist('url')
    if not args.urls:
        return jsonify({'error': 'No url(s)'})
    args.list_data = True
    for url in args.urls:
        url_res = None
        error = None
        try:
            job = DataJob(url)
            job.run()
            url_res = job.data
        except NoExtractorError as err:
            error = err
        data_item = [url, url_res, {'error': str(error) if error else None}]
        data.append(data_item)
    return jsonify({'data': data, 'urls': args.urls})


def create_app(script_info=None):
    """create app."""
    app = Flask(__name__)
    app.add_url_rule(
        '/api/json', 'gallery_dl_json', get_json)
    return app


@click.group(cls=FlaskGroup, create_app=create_app)
def cli():
    """This is a script for application."""
    pass


if __name__ == '__main__':
    cli()
gug for hydrus (port 5013)
@rachmadaniHaryono instructions on using this GUG and combining it with Hydrus? Any pre-configurations besides pip3 install gallery-dl?
- put that in a script (e.g. script.py)
- import gug into hydrus
- pip3 install flask gallery-dl (add --user if needed)
- run python3 script.py --port 5013
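For reference, a request to the script above could be assembled roughly like this. It is only a sketch: the /api/json route, the repeated url query parameter and port 5013 come from the script and the steps above, while localhost and the helper name build_query_url are assumptions made for illustration.

```python
from urllib.parse import urlencode


def build_query_url(urls, host="localhost", port=5013):
    """Build the GET URL for the /api/json route of the script above.

    'host' and the function name are illustrative assumptions; the route
    and the repeated 'url' parameter mirror request.args.getlist('url').
    """
    # doseq=True emits one 'url=...' pair per list element
    query = urlencode({"url": list(urls)}, doseq=True)
    return "http://{}:{}/api/json?{}".format(host, port, query)


if __name__ == "__main__":
    print(build_query_url(["https://imgur.com/0gybAXR"]))
```

Several gallery URLs can be passed at once; each one becomes its own url=... pair in the query string.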
@rachmadaniHaryono add that to the Wiki in https://github.com/CuddleBear92/Hydrus-Presets-and-Scripts if you can, sounded like a really good solution. Also, why port 5013, is that port specifically used for something?
Also, why port 5013, is that port specifically used for something
there is no real technical reason. i just use it because the default port is already used by my other program.
add that to the Wiki in CuddleBear92/Hydrus-Presets-and-Scripts if you can
i will consider it, because i'm not sure where to put that
another plan is to fork (or create a pr) for a server command, but i'm not sure if @mikf wants a pr for this
@rachmadaniHaryono https://github.com/CuddleBear92/Hydrus-Presets-and-Scripts/wiki Also I would like @mikf to have a look at this, since this is pretty useful. BTW, what is the speed overhead of using this over having a separate txt file like the one in https://github.com/Bionus/imgbrd-grabber/issues/1492 ?
BTW, what is the speed overhead of using this over having a separate txt file like the one in Bionus/imgbrd-grabber#1492 ?
this depends on hydrus vs imgbrd-grabber download speed. from my tests gallery-dl gives a direct link, so hydrus doesn't have to process the link anymore.
another plan is to fork (or create a pr) for a server command, but i'm not sure if @mikf wants a pr for this
I've already had something similar to this in mind (implementing a (local) server infrastructure to (remotely) send commands / queries: gallery-dl --server), so I would be quite in favor of adding functionality like this.
But I'm not so happy about adding flask as a dependency, even if optional. I just generally dislike adding dependencies if they aren't absolutely necessary. I was thinking of using stuff from the http.server module in Python's standard library if possible.
Also: the script you posted here should be simplified quite a bit further. For example there is no need to build a command-line option parser. I'll see if I can get something to work on my own.
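To make the http.server idea concrete, here is a minimal standard-library sketch of the same /api/json endpoint. It is not the code from any gallery-dl branch: make_handler, serve and the resolve callable are hypothetical names, and resolve stands in for whatever turns one URL into JSON-serializable data (for example a thin wrapper around DataJob).

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs


def make_handler(resolve):
    """Create a handler class answering GET /api/json?url=...

    'resolve' is a hypothetical callable: it takes one URL and returns
    JSON-serializable data for it.
    """

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            parsed = urlparse(self.path)
            if parsed.path != "/api/json":
                self.send_error(404)
                return
            # parse_qs collects repeated 'url' parameters into a list
            urls = parse_qs(parsed.query).get("url", [])
            if urls:
                payload = {"data": [[u, resolve(u)] for u in urls],
                           "urls": urls}
            else:
                payload = {"error": "No url(s)"}
            body = json.dumps(payload).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):
            pass  # keep the sketch quiet; drop this to get access logs

    return Handler


def serve(resolve, host="127.0.0.1", port=5013):
    """Serve until interrupted; the port matches the one used above."""
    HTTPServer((host, port), make_handler(resolve)).serve_forever()
```

Calling serve(my_resolver) with a resolver of your own would start the server on port 5013; no parser for command-line options is involved at any point.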
A few questions from me concerning Hydrus
- The whole thing is written in Python, even version 3 since the last update. Isn't there a better way of coupling it with another Python module than a HTTP server? As in "is it possible to add a native "hook" to make it call another Python function"?
- Is there any documentation for the request and response data formats Hydrus sends to and expects from GUG's? I've found this, but that doesn't really explain how Hydrus interacts with other things.
But I'm not so happy about adding flask as a dependency, even if optional. I just generally dislike adding dependencies if they aren't absolutely necessary. I was thinking of using stuff from the http.server module in Python's standard library if possible.
this still depends on how big this will be. will it just be an api, or will there be an html interface for it? an existing framework would make it easier, and plugins for the framework would let other developers create more features they want.
of course there are better frameworks than flask, e.g. sanic or django, but i doubt that using the standard library will be better than those.
Also: the script you posted here should be simplified quite a bit further. For example there is no need to build a command-line option parser.
that is a modified version of the flask cli example. flask can do that more simply, but it requires setting an environment variable, which adds another command
The whole thing is written in Python, even version 3 since the last update. Isn't there a better way of coupling it with another Python module than a HTTP server? As in "is it possible to add a native "hook" to make it call another Python function"?
the hydrus dev is planning to make an api for this in the next milestone. there is also another hydrus user who made an unofficial api, but he hasn't made one for downloads yet. so either wait for it or use the existing hydrus parser
Is there any documentation for the request and response data formats Hydrus sends to and expects from GUG's? I've found this, but that doesn't really explain how Hydrus interacts with other things.
hydrus expects either html or json and tries to extract data based on the parser the user made/imported. i made this one for html, but it may change in a future version: https://github.com/CuddleBear92/Hydrus-Presets-and-Scripts/blob/master/guide/create_parser_furaffinity.md
if someone wants to make one, they can try making an api similar to the 4chan api, copy the structure and use a modified parser from the existing 4chan api.
my best recommendation is to try the hydrus parser directly and see what options are there. ask on the hydrus discord channel if anything is unclear
can gallery-dl support weibo? i found this https://github.com/nondanee/weiboPicDownloader but it takes too long to scan and doesn't have the ability to skip downloaded files
@rachmadaniHaryono I opened a new branch for API server related stuff. The first commit there implements the same functionality as your script, but without external dependencies. Go take a look at it if you want.
And when I said your script "should be simplified ... further" I didn't mean it should use fewer lines of code, but fewer resources in terms of CPU and memory. Python might not be the right language to use when caring about things like that, but there is still no need to call functions that effectively do nothing - command-line argument parsing for example.
will it be only an api or will there be an html interface @mikf?
e: i will comment the code on the commit
I don't think there should be an HTML interface directly inside of gallery-dl. I would prefer it to have a separate front-end (HTML or whatever) communicating with the API back-end that's baked into gallery-dl itself. It is a more general approach and would allow for any programming language and framework to more easily interact with gallery-dl, not just Python.
- based on https://github.com/mikf/gallery-dl/commit/8662e72bdd80f0158c5d73cccc1d1777f5fbaf33
- album.title is now parsed as album tag
- source url and download url are minimum 2 character (fix host:port/api/json/1 error)
- description is not None or none
still on port 5013
e: related issue https://github.com/CuddleBear92/Hydrus-Presets-and-Scripts/issues/69
About the twitter extractor: the number of requests is limited depending on how many tweets the user has, right? if a user has 2k+ media, 99% of the time it can't download all of the media
@wankio The Twitter extractor gets the same tweets you would get by visiting a timeline in your browser and scrolling down until no more tweets get dynamically loaded. I don't know how many tweets you can access like that, but Twitter's public API has a similar restriction:
https://developer.twitter.com/en/docs/tweets/timelines/api-reference/get-statuses-user_timeline.html
This method can only return up to 3,200 of a user's most recent Tweets. Native retweets of other statuses by the user is included in this total, regardless of whether include_rts is set to false when requesting this resource.
You could try ripme. It uses the public API instead of a "hidden", browser-only API like gallery-dl. Maybe you can get more results with that.
but if i remember correctly, ripme rips all tweets/retweets, not just the user's own tweets
For some reason, logging in with OAuth and App Garden tokens or with the -u/-p options doesn't work with flickr, which makes images that require a login not downloadable. But otherwise amazing tool, thank you so much!
today when i was checking e-hentai/exhentai, it just got stuck forever. maybe my ISP is the problem, because i can't access e-hentai but exhentai is still ok. So i think OAuth should help, using cookies instead of id+password to bypass this
is there a way to download files directly into a specified folder instead of subfolders? for example, for the picture to be downloaded into F:\Downloaded\ i tried using gallery-dl -d "F:\Downloaded" https://imgur.com/a/xcEl2WW but instead the files get downloaded to F:\Downloaded\imgur\xcEl2WW - Inklings. is there an argument i could add to the command to fix that?
@Mattlau04
Short answer: set extractor.directory to an empty string: -o directory=""
Long answer: The path for downloaded files is built from three components:
- a static base-directory (that's what you set with -d/--dest)
- directory: a list of format strings; one for each path segment
- filename: another format string
You can configure all three of them to fit your needs in your config file, but specifying a format string on the command-line can be rather cumbersome and has therefore no extra command-line argument.
You can however use -o/--option to set any option value and removing the dynamic directory part should do what you want.
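The way the three components combine can be pictured with a small sketch. This is an illustration only, not gallery-dl's actual implementation: build_path and the metadata keys (category, id, num, extension) are made up for the example, and plain str.format stands in for gallery-dl's own format strings.

```python
import os


def build_path(base_directory, directory, filename, metadata):
    """Join base-directory, directory segments and filename into a path.

    Illustration only: 'directory' is an iterable of format strings
    (one per path segment) and 'filename' is a single format string.
    """
    segments = [fmt.format(**metadata) for fmt in directory]
    name = filename.format(**metadata)
    return os.path.join(base_directory, *segments, name)


# hypothetical metadata for one downloaded file
meta = {"category": "imgur", "id": "xcEl2WW", "num": 1, "extension": "jpg"}

# one sub-directory per format string
print(build_path("Downloaded", ["{category}", "{id}"], "{num}.{extension}", meta))

# an empty 'directory' yields no segments, so the file lands in the base directory
print(build_path("Downloaded", "", "{num}.{extension}", meta))
```

In this sketch an empty string and an empty list behave the same, because both produce zero path segments.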
thanks a lot for the help!
Huh sorry to ask so much stuff in so little time, but in a batch file, i have this command: gallery-dl -o directory="" -o filename="{id}_{tags}" -d "%~dp0\gallery-dl\images\hypnohub" https://hypnohub.net/post?tags=splatoon and it downloads the first 4 files fine but then it gives me OSError: [Errno 22] Invalid argument. here is the verbose output:
Output
[gallery-dl][debug] Version 1.8.2-dev [gallery-dl][debug] Python 3.6.7 - Windows-10-10.0.17134-SP0 [gallery-dl][debug] requests 2.20.1 - urllib3 1.24.1 [gallery-dl][debug] Starting DownloadJob for 'https://hypnohub.net/post?tags=splatoon' [hypnohub][debug] Using HypnohubTagExtractor for 'https://hypnohub.net/post?tags=splatoon' [urllib3.connectionpool][debug] Starting new HTTPS connection (1): hypnohub.net:443 [urllib3.connectionpool][debug] https://hypnohub.net:443 "GET /post.json?tags=splatoon&limit=50&page=1 HTTP/1.1" 200 None # F:\Auto upload full splatoon doujin colection\\gallery-...l_eyes splatoon symbol_in_eyes taka-michi topless towel wet # F:\Auto upload full splatoon doujin colection\\gallery-... nintendo splatoon tech_control tentacles tongue tongue_out # F:\Auto upload full splatoon doujin colection\\gallery-...ndo splatoon tech_control tentacles tongue tongue_out visor # F:\Auto upload full splatoon doujin colection\\gallery-... nintendo splatoon tech_control tentacles tongue tongue_out # F:\Auto upload full splatoon doujin colection\\gallery-... 
nintendo splatoon tech_control tentacles tongue tongue_out [urllib3.connectionpool][debug] https://hypnohub.net:443 "GET //data/image/b30b984c7e231cd2ad5d55aaa533cad6.jpg HTTP/1.1" 200 137174 F:\Auto upload full splatoon doujin colection\\gallery-...ch_control tentacles thighhighs tongue tongue_out underwear [hypnohub][error] Unable to download data: OSError: [Errno 22] Invalid argument: '\\\\?\\F:\\Auto upload full splatoon doujin colection\\gallery-dl\\images\\hypnohub\\77610_ahegao blush bottomless breasts breasts_outside callie_(splatoon) civibes cum cum_in_pussy dazed earrings elf_ears empty_eyes female_only femsub gloves hypnotic_accessory large_breasts lying mole nintendo open_clothes open_mouth panties pussy shirt_lift splatoon splatoon_2 spread_legs sunglasses sweat tank_top tech_control tentacles thighhighs tongue tongue_out underwear.part' [hypnohub][debug] Traceback (most recent call last): File "c:\users\mattl\appdata\local\programs\python\python36\lib\site-packages\gallery_dl\job.py", line 55, in run self.dispatch(msg) File "c:\users\mattl\appdata\local\programs\python\python36\lib\site-packages\gallery_dl\job.py", line 99, in dispatch self.handle_url(url, kwds) File "c:\users\mattl\appdata\local\programs\python\python36\lib\site-packages\gallery_dl\job.py", line 210, in handle_url if not self.download(url): File "c:\users\mattl\appdata\local\programs\python\python36\lib\site-packages\gallery_dl\job.py", line 279, in download return downloader.download(url, self.pathfmt) File "c:\users\mattl\appdata\local\programs\python\python36\lib\site-packages\gallery_dl\downloader\common.py", line 43, in download return self.download_impl(url, pathfmt) File "c:\users\mattl\appdata\local\programs\python\python36\lib\site-packages\gallery_dl\downloader\common.py", line 106, in download_impl with pathfmt.open(mode) as file: File "c:\users\mattl\appdata\local\programs\python\python36\lib\site-packages\gallery_dl\util.py", line 509, in open return open(self.temppath, 
mode) OSError: [Errno 22] Invalid argument: '\\\\?\\F:\\Auto upload full splatoon doujin colection\\gallery-dl\\images\\hypnohub\\77610_ahegao blush bottomless breasts breasts_outside callie_(splatoon) civibes cum cum_in_pussy dazed earrings elf_ears empty_eyes female_only femsub gloves hypnotic_accessory large_breasts lying mole nintendo open_clothes open_mouth panties pussy shirt_lift splatoon splatoon_2 spread_legs sunglasses sweat tank_top tech_control tentacles thighhighs tongue tongue_out underwear.part'
There are too many tags and the filename got too long (> 255 bytes).
You can shorten the tags string to, for example, 200 characters with {tags[:200]}, or use {tags:L200/too many tags/} to replace the content of {tags} with too many tags if it exceeds 200 characters.
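In plain Python terms, the two format strings behave roughly like the helpers below. This only mirrors the behaviour described above; truncate and replace_if_longer are hypothetical names, not gallery-dl functions, and the tag list is made-up sample data.

```python
def truncate(value, limit):
    """Keep at most 'limit' characters, like the slice in {tags[:200]}."""
    return value[:limit]


def replace_if_longer(value, limit, replacement):
    """Return 'replacement' if 'value' exceeds 'limit' characters,
    mirroring the described effect of {tags:L200/too many tags/}."""
    return replacement if len(value) > limit else value


# made-up sample data: one long, space-separated tag string
tags = " ".join("tag{}".format(i) for i in range(100))

print(len(tags))
print(len(truncate(tags, 200)))
print(replace_if_longer(tags, 200, "too many tags"))
print(replace_if_longer("short tag list", 200, "too many tags"))
```

Truncation keeps part of the information but may cut a tag in half; replacement keeps filenames tidy at the cost of dropping the tags entirely.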
You should also consider using a config file. It's a lot more readable than packing everything into command-line arguments.
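As a rough idea of what the command-line options above could look like in a config file, the snippet below prints a JSON document. Treat the layout as an assumption to check against the configuration documentation: it nests the hypnohub options under extractor, uses an empty list for directory, and the base-directory value is a placeholder path.

```python
import json

# Assumed layout: site-specific options nested under "extractor" -> site name.
config = {
    "extractor": {
        # placeholder path; corresponds to what -d/--dest sets
        "base-directory": "./gallery-dl/images/hypnohub",
        "hypnohub": {
            # no dynamic sub-directories, like -o directory=""
            "directory": [],
            # filename from the command above, with the tags shortened
            "filename": "{id}_{tags[:200]}",
        },
    }
}

config_text = json.dumps(config, indent=4)
print(config_text)
```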
is there no way to remove the 255 bytes limit?
No, there isn't. This is an inherent limitation of most filesystems (see Comparison of file systems (*)).
Instead of saving an image's tags in its filename, you could store it in a separate file with --write-tags.
(*) NTFS has a limit of 255 UTF-16 code units, not bytes, but that doesn't make much of a difference here.
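The difference between bytes and UTF-16 code units can be checked with a few lines of Python. This is a sketch with made-up sample names; the 255 limit is the one mentioned above, and which unit applies depends on the filesystem.

```python
def utf8_bytes(name):
    """Length of 'name' in bytes when encoded as UTF-8."""
    return len(name.encode("utf-8"))


def utf16_units(name):
    """Length of 'name' in UTF-16 code units (2 bytes each)."""
    return len(name.encode("utf-16-le")) // 2


def fits(name, limit=255, units=utf8_bytes):
    """Check 'name' against a filename length limit in the given unit."""
    return units(name) <= limit


ascii_name = "a" * 256      # 256 bytes and 256 UTF-16 code units
kana_name = "\u3042" * 100  # 100 hiragana characters, 3 bytes each in UTF-8

print(fits(ascii_name), fits(ascii_name, units=utf16_units))
print(fits(kana_name), fits(kana_name, units=utf16_units))
```

For ASCII-only names such as the tag list in the error above, both measures give the same number, which is why the distinction doesn't matter much here.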