Questions, Feedback and Suggestions #3

Open mikf opened this issue 5 years ago • 258 comments

Continuation of the old issue as a central place for any sort of question or suggestion not deserving their own separate issue. There is also https://gitter.im/gallery-dl/main if that seems more appropriate.

Links to older issues: #11, #74

mikf avatar Jan 01 '19 15:01 mikf

A simple snippet to turn gallery-dl into an API:

from types import SimpleNamespace
from unittest.mock import patch

import click
from flask.cli import FlaskGroup
from flask import (
    Flask,
    jsonify,
    request,
)

from gallery_dl import main, option
from gallery_dl.job import DataJob

def get_json():
    data = None
    parser = option.build_parser()
    args = parser.parse_args()
    args.urls = request.args.getlist('url')
    if not args.urls:
        return jsonify({'error': 'No url(s)'})
    args.list_data = True

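    class CustomClass:
        def __init__(self):
            # Per-instance list; a class-level list would be shared
            # (and keep growing) across requests.
            self.data = []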

        def run(self):
            dj = DataJob(*self.data_job_args, **self.data_job_kwargs)
            dj.run()
            self.data.append({
                'args': self.data_job_args,
                "kwargs": self.data_job_kwargs,
                'data': dj.data
            })

        def DataJob(self, *args, **kwargs):
            self.data_job_args = args
            self.data_job_kwargs = kwargs
            retval = SimpleNamespace()
            retval.run = self.run
            return retval

    c1 = CustomClass()
    with patch('gallery_dl.option.build_parser') as m_bp, \
            patch('gallery_dl.job.DataJob', side_effect=c1.DataJob) as m_jt:
        #  m_option.return_value.parser_args.return_value = args
        m_bp.return_value.parse_args.return_value = args
        m_jt.__name__ = 'DataJob'
        main()
        data = c1.data
    return jsonify({'data': data, 'urls': args.urls})

def create_app(script_info=None):
    """create app."""
    app = Flask(__name__)
    app.add_url_rule(
        '/api/json', 'gallery_dl_json', get_json)
    return app


@click.group(cls=FlaskGroup, create_app=create_app)
def cli():
    """This is a script for application."""
    pass


if __name__ == '__main__':
    cli()

Edit: this could be simpler by using DataJob directly to handle the URLs, but I haven't checked whether anything has to be done before initializing a DataJob instance.

rachmadaniHaryono avatar Jan 01 '19 21:01 rachmadaniHaryono

this could be simpler by using DataJob directly to handle the URLs, but I haven't checked whether anything has to be done before initializing a DataJob instance.

You don't need to do anything before initializing any of the Job classes:

>>> from gallery_dl import job
>>> j = job.DataJob("https://imgur.com/0gybAXR")
>>> j.run()
[ ... ]

You can initialize anything logging related if you want logging output, or call config.load() and config.set(...) if you want to load a config file and set some custom options, but none of that is necessary.

mikf avatar Jan 03 '19 16:01 mikf

@rachmadaniHaryono what does that code do?

DonaldTsang avatar Jan 08 '19 14:01 DonaldTsang

simpler api (based on above suggestion)

#!/usr/bin/env python
import click
from flask.cli import FlaskGroup
from flask import (
    Flask,
    jsonify,
    request,
)

from gallery_dl import option
from gallery_dl.job import DataJob
from gallery_dl.exception import NoExtractorError


def get_json():
    data = []
    parser = option.build_parser()
    args = parser.parse_args()
    args.urls = request.args.getlist('url')
    if not args.urls:
        return jsonify({'error': 'No url(s)'})
    args.list_data = True
    for url in args.urls:
        url_res = None
        error = None
        try:
            job = DataJob(url)
            job.run()
            url_res = job.data
        except NoExtractorError as err:
            error = err
        data_item = [url, url_res, {'error': str(error) if error else None}]
        data.append(data_item)
    return jsonify({'data': data, 'urls': args.urls})


def create_app(script_info=None):
    """create app."""
    app = Flask(__name__)
    app.add_url_rule(
        '/api/json', 'gallery_dl_json', get_json)
    return app


@click.group(cls=FlaskGroup, create_app=create_app)
def cli():
    """This is a script for application."""
    pass


if __name__ == '__main__':
    cli()

rachmadaniHaryono avatar Jan 08 '19 16:01 rachmadaniHaryono

gallery_dl_gug: a GUG (gallery URL generator) for Hydrus (port 5013)

rachmadaniHaryono avatar Jan 08 '19 16:01 rachmadaniHaryono

@rachmadaniHaryono Do you have instructions on using this GUG and combining it with Hydrus? Are any pre-configurations needed besides pip3 install gallery-dl?

DonaldTsang avatar Jan 08 '19 16:01 DonaldTsang

  • put the code above in a script (e.g. script.py)
  • import gug into hydrus
  • pip3 install flask gallery-dl (add --user if needed)
  • run python3 script.py --port 5013
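
Once the server is running, a client only has to pass one or more url query parameters to /api/json, since that is how the script above reads them (request.args.getlist('url')). A minimal sketch of building and sending such a request; the host localhost and the helper names build_query_url and fetch_metadata are assumptions made for illustration:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_query_url(urls, host="localhost", port=5013):
    """Build the /api/json request URL with one `url` parameter per gallery URL."""
    # doseq=True expands the list into repeated `url=` parameters,
    # which request.args.getlist('url') collects again on the server side.
    query = urlencode({"url": urls}, doseq=True)
    return f"http://{host}:{port}/api/json?{query}"


def fetch_metadata(urls, host="localhost", port=5013):
    """Query a running instance of the script and decode its JSON response."""
    with urlopen(build_query_url(urls, host, port)) as response:
        return json.load(response)
```

build_query_url(["https://imgur.com/0gybAXR"]) percent-encodes the gallery URL, so it survives being nested inside another URL.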

rachmadaniHaryono avatar Jan 08 '19 21:01 rachmadaniHaryono

@rachmadaniHaryono add that to the Wiki in https://github.com/CuddleBear92/Hydrus-Presets-and-Scripts if you can, it sounds like a really good solution. Also, why port 5013, is that port specifically used for something?

DonaldTsang avatar Jan 09 '19 13:01 DonaldTsang

Also, why port 5013, is that port specifically used for something

There's no real technical reason. I just use it because the default port is used by another program of mine.

add that to the Wiki in CuddleBear92/Hydrus-Presets-and-Scripts if you can

I will consider it, but I'm not sure where to put that.

Another plan is to fork (or create a PR) for a server command, but I'm not sure if @mikf wants a PR for this.

rachmadaniHaryono avatar Jan 09 '19 15:01 rachmadaniHaryono

@rachmadaniHaryono https://github.com/CuddleBear92/Hydrus-Presets-and-Scripts/wiki Also I would like @mikf to have a look at this, since this is pretty useful. BTW, what is the speed overhead of using this over having a separate txt file like the one in https://github.com/Bionus/imgbrd-grabber/issues/1492 ?

DonaldTsang avatar Jan 10 '19 03:01 DonaldTsang

BTW, what is the speed overhead of using this over having a separate txt file like the one in Bionus/imgbrd-grabber#1492 ?

This depends on the download speed of Hydrus vs. imgbrd-grabber. In my tests gallery-dl gives direct links, so Hydrus doesn't have to process the links any further.

rachmadaniHaryono avatar Jan 10 '19 09:01 rachmadaniHaryono

Another plan is to fork (or create a PR) for a server command, but I'm not sure if @mikf wants a PR for this.

I've already had something similar to this in mind (implementing a (local) server infrastructure to (remotely) send commands / queries: gallery-dl --server), so I would be quite in favor of adding functionality like this. But I'm not so happy about adding flask as a dependency, even if optional. I just generally dislike adding dependencies if they aren't absolutely necessary. I was thinking of using stuff from the http.server module in Python's standard library if possible. Also: the script you posted here should be simplified quite a bit further. For example there is no need to build a command-line option parser. I'll see if I can get something to work on my own.
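
A standard-library version of the idea above could look roughly like the following. This is only a sketch under assumptions, not the code that ended up in any gallery-dl branch: extract_metadata is a hypothetical placeholder for running a DataJob, and the /api/json path, the url parameter, and port 5013 are borrowed from the Flask script earlier in this thread.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import parse_qs, urlparse


def extract_metadata(url):
    """Hypothetical stand-in for running a gallery-dl DataJob on `url`."""
    return {"url": url, "data": []}


class ApiHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        parsed = urlparse(self.path)
        if parsed.path != "/api/json":
            self.send_error(404, "Unknown endpoint")
            return
        # parse_qs returns a list per key, so repeated `url=` values are kept.
        urls = parse_qs(parsed.query).get("url", [])
        if urls:
            payload = {"data": [extract_metadata(u) for u in urls], "urls": urls}
        else:
            payload = {"error": "No url(s)"}
        body = json.dumps(payload).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging in this sketch


def make_server(host="127.0.0.1", port=5013):
    """Create (but do not start) the API server."""
    return ThreadingHTTPServer((host, port), ApiHandler)
```

Calling make_server().serve_forever() starts it. No argument parser is built and nothing outside the standard library is imported, which are the two points raised above.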

A few questions from me concerning Hydrus

  • The whole thing is written in Python, even version 3 since the last update. Isn't there a better way of coupling it with another Python module than a HTTP server? As in "is it possible to add a native "hook" to make it call another Python function"?
  • Is there any documentation for the request and response data formats Hydrus sends to and expects from GUGs? I've found this, but that doesn't really explain how Hydrus interacts with other things.

mikf avatar Jan 10 '19 12:01 mikf

But I'm not so happy about adding flask as a dependency, even if optional. I just generally dislike adding dependencies if they aren't absolutely necessary. I was thinking of using stuff from the http.server module in Python's standard library if possible.

That still depends on how big this will get: will it be just an API, or will there be an HTML interface as well? An existing framework would make it easier, though, and the framework's plugins would let other developers add the features they want.

Of course there are better frameworks than Flask, e.g. Sanic or Django, but I actually doubt that using the standard library will be better than those.

Also: the script you posted here should be simplified quite a bit further. For example there is no need to build a command-line option parser.

That is a modified version of the Flask CLI example. Flask can do it more simply, but that requires setting an environment variable, which adds another command.

The whole thing is written in Python, even version 3 since the last update. Isn't there a better way of coupling it with another Python module than a HTTP server? As in "is it possible to add a native "hook" to make it call another Python function"?

The Hydrus dev plans to make an API for this in the next milestone. There is also another Hydrus user who made an unofficial API, but he hasn't made one for downloads yet. So either wait for it or use the existing Hydrus parser.

Is there any documentation for the request and response data formats Hydrus sends to and expects from GUGs? I've found this, but that doesn't really explain how Hydrus interacts with other things.

Hydrus expects either HTML or JSON and tries to extract data based on the parser the user made or imported. I made this guide for HTML, but it may change in future versions: https://github.com/CuddleBear92/Hydrus-Presets-and-Scripts/blob/master/guide/create_parser_furaffinity.md

If someone wants to make one, they can try making an API similar to the 4chan API, copy its structure, and use a modified parser from the existing 4chan API.

My best recommendation is to try the Hydrus parser directly and see what options there are. Ask in the Hydrus Discord channel if anything is unclear.

rachmadaniHaryono avatar Jan 10 '19 13:01 rachmadaniHaryono

Can gallery-dl support Weibo? I found this https://github.com/nondanee/weiboPicDownloader but it takes too long to scan and doesn't have the ability to skip downloaded files.

wankio avatar Jan 11 '19 10:01 wankio

@rachmadaniHaryono I opened a new branch for API server related stuff. The first commit there implements the same functionality as your script, but without external dependencies. Go take a look at it if you want.

And when I said your script "should be simplified ... further" I didn't mean it should use fewer lines of code, but fewer resources in terms of CPU and memory. Python might not be the right language to use when caring about things like that, but there is still no need to call functions that effectively do nothing - command-line argument parsing for example.

mikf avatar Jan 13 '19 09:01 mikf

Will it be only an API, or will there be an HTML interface, @mikf?

Edit: I will comment on the code in the commit.

rachmadaniHaryono avatar Jan 13 '19 14:01 rachmadaniHaryono

I don't think there should be an HTML interface directly inside of gallery-dl. I would prefer it to have a separate front-end (HTML or whatever) communicating with the API back-end that's baked into gallery-dl itself. It is a more general approach and would allow for any programming language and framework to more easily interact with gallery-dl, not just Python.

mikf avatar Jan 13 '19 15:01 mikf

gallery_dl_gug

  • based on https://github.com/mikf/gallery-dl/commit/8662e72bdd80f0158c5d73cccc1d1777f5fbaf33
  • album.title is now parsed as album tag
  • source URL and download URL must be at least 2 characters long (fixes the host:port/api/json/1 error)
  • description is not None or none

still on port 5013

e: related issue https://github.com/CuddleBear92/Hydrus-Presets-and-Scripts/issues/69

rachmadaniHaryono avatar Jan 13 '19 17:01 rachmadaniHaryono

About the Twitter extractor: the number of requests is limited depending on how many tweets a user has, right? If a user has 2k+ media, 99% of the time it can't download all of it.

wankio avatar Feb 01 '19 16:02 wankio

@wankio The Twitter extractor gets the same tweets you would get by visiting a timeline in your browser and scrolling down until no more tweets get dynamically loaded. I don't know how many tweets you can access like that, but Twitter's public API has a similar restriction:

https://developer.twitter.com/en/docs/tweets/timelines/api-reference/get-statuses-user_timeline.html

This method can only return up to 3,200 of a user's most recent Tweets. Native retweets of other statuses by the user is included in this total, regardless of whether include_rts is set to false when requesting this resource.

You could try ripme. It uses the public API instead of a "hidden", browser-only API like gallery-dl. Maybe you can get more results with that.

mikf avatar Feb 02 '19 14:02 mikf

But if I remember correctly, ripme rips all tweets and retweets, not just the user's own tweets.

wankio avatar Feb 03 '19 15:02 wankio

For some reason, logging in with OAuth and App Garden tokens or with the -u/-p options doesn't work with Flickr, so images that require a login to view can't be downloaded. But otherwise amazing tool, thank you so much!

schleifen avatar Feb 07 '19 21:02 schleifen

Today when I was checking e-hentai/exhentai, it just got stuck forever. Maybe my ISP is the problem, because I can't access e-hentai but exhentai is still OK. So I think OAuth should help, using cookies instead of id+password to bypass this.

wankio avatar Feb 24 '19 14:02 wankio

Is there a way to download files directly into a specified folder instead of subfolders? For example, for the picture to be downloaded into F:\Downloaded\ I tried gallery-dl -d "F:\Downloaded" https://imgur.com/a/xcEl2WW but instead the files get downloaded to F:\Downloaded\imgur\xcEl2WW - Inklings. Is there an argument I could add to the command to fix that?

ghost avatar Apr 10 '19 14:04 ghost

@Mattlau04 Short answer: set extractor.directory to an empty string: -o directory=""

Long answer: The path for downloaded files is built from three components:

  • a static base-directory (that's what you set with -d/--dest)
  • directory: a list of format strings; one for each path segment
  • filename: another format string

You can configure all three of them to fit your needs in your config file, but specifying a format string on the command-line can be rather cumbersome and has therefore no extra command-line argument.
You can however use -o/--option to set any option value and removing the dynamic directory part should do what you want.
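
To illustrate how the three components combine, here is a simplified sketch. Plain str.format stands in for gallery-dl's own formatter, and the metadata values and format strings are made up for the example, so they are assumptions rather than gallery-dl's actual defaults.

```python
from pathlib import PureWindowsPath


def build_path(base_directory, directory, filename, metadata):
    """Join base directory, formatted directory segments and formatted filename."""
    # Each entry in `directory` is a format string producing one path segment.
    segments = [fmt.format(**metadata) for fmt in directory]
    return PureWindowsPath(base_directory, *segments, filename.format(**metadata))


# Hypothetical metadata for the imgur album from the question.
metadata = {"category": "imgur", "album_id": "xcEl2WW", "title": "Inklings",
            "num": 1, "extension": "jpg"}

# With dynamic directory segments, files end up in nested subfolders.
nested = build_path(r"F:\Downloaded",
                    ["{category}", "{album_id} - {title}"],
                    "{num:03}.{extension}", metadata)

# With no directory segments, files land directly in the base directory.
flat = build_path(r"F:\Downloaded", [], "{num:03}.{extension}", metadata)

print(nested)  # F:\Downloaded\imgur\xcEl2WW - Inklings\001.jpg
print(flat)    # F:\Downloaded\001.jpg
```

The second call corresponds to what -o directory="" achieves: the dynamic part is empty, so only the base directory and the filename remain.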

mikf avatar Apr 10 '19 19:04 mikf

thanks a lot for the help!

ghost avatar Apr 10 '19 20:04 ghost

Sorry to ask so much in so little time, but in a batch file I have this command: gallery-dl -o directory="" -o filename="{id}_{tags}" -d "%~dp0\gallery-dl\images\hypnohub" https://hypnohub.net/post?tags=splatoon It downloads the first 4 files fine, but then it gives me OSError: [Errno 22] Invalid argument. Here is the verbose output:

Output
[gallery-dl][debug] Version 1.8.2-dev
[gallery-dl][debug] Python 3.6.7 - Windows-10-10.0.17134-SP0
[gallery-dl][debug] requests 2.20.1 - urllib3 1.24.1
[gallery-dl][debug] Starting DownloadJob for 'https://hypnohub.net/post?tags=splatoon'
[hypnohub][debug] Using HypnohubTagExtractor for 'https://hypnohub.net/post?tags=splatoon'
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): hypnohub.net:443
[urllib3.connectionpool][debug] https://hypnohub.net:443 "GET /post.json?tags=splatoon&limit=50&page=1 HTTP/1.1" 200 None
# F:\Auto upload full splatoon doujin colection\\gallery-...l_eyes splatoon symbol_in_eyes taka-michi topless towel wet
# F:\Auto upload full splatoon doujin colection\\gallery-... nintendo splatoon tech_control tentacles tongue tongue_out
# F:\Auto upload full splatoon doujin colection\\gallery-...ndo splatoon tech_control tentacles tongue tongue_out visor
# F:\Auto upload full splatoon doujin colection\\gallery-... nintendo splatoon tech_control tentacles tongue tongue_out
# F:\Auto upload full splatoon doujin colection\\gallery-... nintendo splatoon tech_control tentacles tongue tongue_out
[urllib3.connectionpool][debug] https://hypnohub.net:443 "GET //data/image/b30b984c7e231cd2ad5d55aaa533cad6.jpg HTTP/1.1" 200 137174
  F:\Auto upload full splatoon doujin colection\\gallery-...ch_control tentacles thighhighs tongue tongue_out underwear
[hypnohub][error] Unable to download data:  OSError: [Errno 22] Invalid argument: '\\\\?\\F:\\Auto upload full splatoon doujin colection\\gallery-dl\\images\\hypnohub\\77610_ahegao blush bottomless breasts breasts_outside callie_(splatoon) civibes cum cum_in_pussy dazed earrings elf_ears empty_eyes female_only femsub gloves hypnotic_accessory large_breasts lying mole nintendo open_clothes open_mouth panties pussy shirt_lift splatoon splatoon_2 spread_legs sunglasses sweat tank_top tech_control tentacles thighhighs tongue tongue_out underwear.part'
[hypnohub][debug]
Traceback (most recent call last):
  File "c:\users\mattl\appdata\local\programs\python\python36\lib\site-packages\gallery_dl\job.py", line 55, in run
    self.dispatch(msg)
  File "c:\users\mattl\appdata\local\programs\python\python36\lib\site-packages\gallery_dl\job.py", line 99, in dispatch
    self.handle_url(url, kwds)
  File "c:\users\mattl\appdata\local\programs\python\python36\lib\site-packages\gallery_dl\job.py", line 210, in handle_url
    if not self.download(url):
  File "c:\users\mattl\appdata\local\programs\python\python36\lib\site-packages\gallery_dl\job.py", line 279, in download
    return downloader.download(url, self.pathfmt)
  File "c:\users\mattl\appdata\local\programs\python\python36\lib\site-packages\gallery_dl\downloader\common.py", line 43, in download
    return self.download_impl(url, pathfmt)
  File "c:\users\mattl\appdata\local\programs\python\python36\lib\site-packages\gallery_dl\downloader\common.py", line 106, in download_impl
    with pathfmt.open(mode) as file:
  File "c:\users\mattl\appdata\local\programs\python\python36\lib\site-packages\gallery_dl\util.py", line 509, in open
    return open(self.temppath, mode)
OSError: [Errno 22] Invalid argument: '\\\\?\\F:\\Auto upload full splatoon doujin colection\\gallery-dl\\images\\hypnohub\\77610_ahegao blush bottomless breasts breasts_outside callie_(splatoon) civibes cum cum_in_pussy dazed earrings elf_ears empty_eyes female_only femsub gloves hypnotic_accessory large_breasts lying mole nintendo open_clothes open_mouth panties pussy shirt_lift splatoon splatoon_2 spread_legs sunglasses sweat tank_top tech_control tentacles thighhighs tongue tongue_out underwear.part'

ghost avatar Apr 10 '19 21:04 ghost

There are too many tags and the filename got too long (> 255 bytes).

You can shorten the tags string to for example 200 characters with {tags[:200]}, or you use {tags:L200/too many tags/} to replace the content of {tags} with too many tags if it exceeds 200 characters.
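
In plain Python terms, the two format strings behave roughly like the functions below. This only emulates the described behavior and is not gallery-dl's formatter code; the function names are made up for illustration.

```python
def slice_tags(tags, limit=200):
    """Mimic {tags[:200]}: keep at most `limit` characters."""
    return tags[:limit]


def replace_if_longer(tags, limit=200, replacement="too many tags"):
    """Mimic {tags:L200/too many tags/}: swap in `replacement` when too long."""
    return replacement if len(tags) > limit else tags
```

Slicing keeps as much tag text as fits but may cut the last tag in half, while the replacement variant gives up the tags entirely for such files in exchange for a clean, predictable name.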

You should also consider using a config file. It's a lot more readable than packing everything into command-line arguments.

mikf avatar Apr 10 '19 21:04 mikf

is there no way to remove the 255 bytes limit?

ghost avatar Apr 11 '19 13:04 ghost

No, there isn't. This is an inherent limitation of most filesystems (see Comparison of file systems (*)).

Instead of saving an image's tags in its filename, you could store it in a separate file with --write-tags.

(*) NTFS has a limit of 255 UTF-16 code units, not bytes, but that doesn't make much of a difference here.
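
To check in advance whether a generated filename is likely to hit the limit, both measures mentioned above can be computed directly. A small sketch; the helper names are assumptions, and which of the two counts applies depends on the filesystem in use.

```python
def utf8_bytes(name):
    """Length of `name` in bytes when encoded as UTF-8."""
    return len(name.encode("utf-8"))


def utf16_units(name):
    """Length of `name` in UTF-16 code units (2 bytes each, no BOM)."""
    return len(name.encode("utf-16-le")) // 2


def fits_limit(name, limit=255):
    """Report whether `name` stays within `limit` under each measure."""
    return {
        "utf8_bytes": utf8_bytes(name) <= limit,
        "utf16_units": utf16_units(name) <= limit,
    }
```

For pure ASCII names like the tag list in the traceback both counts are identical, which is why the distinction doesn't matter much in this case.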

mikf avatar Apr 11 '19 15:04 mikf