
Fanbox gallery images downloading out of order (alphabetical hash)

Open b51de opened this issue 2 years ago • 3 comments

I enter the command and it downloads the contents fine, but in 1-9, A-Z alphabetical order of the Fanbox-side filenames, which are all hashes, so the output files ([xxxxxxx]_1 and so on) end up totally out of order (the thumbnail/cover comes out as _0 just fine). I'll include what I can and hope the solution is some simple filename option I've overlooked. Sorry if the formatting isn't what it should look like.

gallery-dl -v "https://www.fanbox.cc/@foobar/posts/!@#$%^" --cookies C:\Users\xxx\Desktop\cookies.txt

[gallery-dl][debug] Version 1.20.4 - Executable
[gallery-dl][debug] Python 3.7.9 - Windows-10-10.0.18362
[gallery-dl][debug] requests 2.27.1 - urllib3 1.26.8
[gallery-dl][debug] Starting DownloadJob for 'https://www.fanbox.cc/@foobar/posts/!@#$%^'
[fanbox][debug] Using FanboxPostExtractor for 'https://www.fanbox.cc/@foobar/posts/!@#$%^'
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): api.fanbox.cc:443
[urllib3.connectionpool][debug] https://api.fanbox.cc:443 "GET /post.info?postId=!@#$%^ HTTP/1.1" 200 3567
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): pixiv.pximg.net:443
[urllib3.connectionpool][debug] https://pixiv.pximg.net:443 "GET /c/1200x630_90_a2_g5/fanbox/public/images/post/!@#$%^/cover/dI89euiV6NES3s9I4ZCXNYRC.jpeg HTTP/1.1" 200 364868
.\gallery-dl\fanbox\foobar!@#$%^_0.jpg
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): downloads.fanbox.cc:443
[urllib3.connectionpool][debug] https://downloads.fanbox.cc:443 "GET /images/post/1833133/140KfbnjlFMU5jAe7IgxqHKG.jpeg HTTP/1.1" 200 None
.\gallery-dl\fanbox\foobar!@#$%^_1.jpg
[urllib3.connectionpool][debug] https://downloads.fanbox.cc:443 "GET /images/post/1833133/31qnHq0LNGqWEUMHB7XGE56d.jpeg HTTP/1.1" 200 None
.\gallery-dl\fanbox\foobar!@#$%^_2.jpg
{and so on. Again, it first downloaded the images with hash filenames starting with 1 and then 3}

# Netscape HTTP Cookie File

.fanbox.cc TRUE / TRUE 1659054592 FANBOXSESSID {omit}
.fanbox.cc TRUE / FALSE 1719534064 _ga GA1.2.1664558252.1656429579
.fanbox.cc TRUE / FALSE 1656462128 _gat_gtag_UA_1830249_145 1
.fanbox.cc TRUE / FALSE 1664205577 _gcl_au 1.1.1001063367.1656429577
.fanbox.cc TRUE / FALSE 1656548464 _gid GA1.2.1550279713.1656429579
.fanbox.cc TRUE / TRUE 1814109576 p_ab_d_id 141724759
.fanbox.cc TRUE / TRUE 1814109576 p_ab_id 9
.fanbox.cc TRUE / TRUE 1814109576 p_ab_id_2 0
.fanbox.cc TRUE / TRUE 1719501844 privacy_policy_agreement 3
.fanbox.cc TRUE / TRUE 1719534592 privacy_policy_notification 0

b51de · Jun 29 '22 15:06

> in 1-9,A-Z alphabetical order

That's just a coincidence. Take https://www.fanbox.cc/@xub/posts/1910054 (NSFW) as a counterexample. All files here are downloaded in the same order as on Fanbox itself, and the file hashes are in reverse alphabetical order.

[urllib3.connectionpool][debug] https://downloads.fanbox.cc:443 "GET /images/post/1910054/H9tzdb9dUvibBnPGl25z0CDb.png HTTP/1.1" 200 1325597
/tmp/fanbox/xub/1910054_0.png
[urllib3.connectionpool][debug] https://downloads.fanbox.cc:443 "GET /images/post/1910054/7wbWoWIg41Vgba7SQZxoJrXp.png HTTP/1.1" 200 1312868
/tmp/fanbox/xub/1910054_1.png
[urllib3.connectionpool][debug] https://downloads.fanbox.cc:443 "GET /images/post/1910054/5lpx9SqlnPzbwHrfkHgjnD0D.png HTTP/1.1" 200 1899017
/tmp/fanbox/xub/1910054_2.png

gallery-dl first downloads the cover image, followed by HTML embeds, images, files, and external embeds.

Maybe that order is wrong for certain posts? An option to customize the order should be easy enough to implement.
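A minimal sketch of the fixed order described above, together with the kind of customizable order the comment suggests. Everything here is hypothetical: the helper name `iter_post_files`, the category keys (`"cover"`, `"html"`, `"images"`, `"files"`, `"embeds"`) and passing the order as a list are illustrative assumptions, not gallery-dl's actual code or configuration options.

```python
from typing import Dict, Iterator, List, Optional, Sequence

# Default order as described in the thread: cover image, HTML embeds,
# images, files, then external embeds. (Key names are hypothetical.)
DEFAULT_ORDER = ("cover", "html", "images", "files", "embeds")


def iter_post_files(
    groups: Dict[str, List[str]],
    order: Optional[Sequence[str]] = None,
) -> Iterator[str]:
    """Yield URLs group by group in the requested order (hypothetical helper).

    `groups` maps a category name to its URLs in their original sequence.
    Categories listed in `order` but missing from `groups` are skipped;
    categories present in `groups` but absent from `order` are appended
    at the end so that no file is silently dropped.
    """
    order = list(order or DEFAULT_ORDER)
    order += [name for name in groups if name not in order]
    for name in order:
        yield from groups.get(name, ())


if __name__ == "__main__":
    post = {
        "cover": ["cover.jpeg"],
        "images": ["a.jpeg", "b.jpeg"],
        "files": ["archive.zip"],
    }
    print(list(iter_post_files(post)))
    # ['cover.jpeg', 'a.jpeg', 'b.jpeg', 'archive.zip']
    print(list(iter_post_files(post, ["images", "cover"])))
    # ['a.jpeg', 'b.jpeg', 'cover.jpeg', 'archive.zip']
```

Appending unlisted categories at the end is one possible design choice; dropping them would be the alternative, but silently skipping files seems riskier for a downloader.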

mikf · Jul 01 '22 10:07

Well, I lost my log from the dozens of galleries I downloaded and I'm not subbed to anything this month. Can't exactly verify anything.

Is this a change that was made in the last few months and nobody has called attention to it? Or is there some kind of command-line way to download in a proper sequential order?

b51de · Jul 05 '22 21:07

The file order has been like this since Fanbox support was added in #1459.

Since then, there have only been 2 real commits to fanbox.py that changed anything of importance, and neither touched anything related to file order:

* f31ab0d2 [fanbox] fetch data for each individual post (fixes #2388)
* 22b04339 [fanbox] support pixiv redirects (closes #2122)

> to download in a proper sequential order?

What is the proper order? From all the examples I have seen, the current order is fine as is, and I do not know what to change unless you (or someone else) provides a concrete example.

mikf · Jul 08 '22 12:07

I recently ran into this issue as well. I'm not really sure how to diagnose it, but I found a post where the incorrect order is reproducible: https://mochirong.fanbox.cc/posts/3746116 (NSFW) downloads in alphabetical hash order instead of the order the images appear in the article. While printing out content_body in the fanbox extractor, I noticed that imageMap seems to be ordered alphabetically, whereas the image blocks are in the correct order. Hope this helps.
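To illustrate the symptom with made-up data: only the field names `blocks`, `imageId` and `imageMap` come from this thread; the IDs, the placeholder values and the overall shape of `content_body` are assumptions, not a real API response. Iterating `imageMap` follows the map's key order, while `blocks` carries the order in which the images appear in the article.

```python
# Toy stand-in for a Fanbox article's content_body (values are made up).
content_body = {
    "blocks": [
        {"type": "p", "text": "intro"},
        {"type": "image", "imageId": "zzz"},
        {"type": "image", "imageId": "mmm"},
        {"type": "image", "imageId": "aaa"},
    ],
    # Keys stored alphabetically, as observed for the example post.
    # The values are placeholders for the real per-image metadata.
    "imageMap": {
        "aaa": {"id": "aaa"},
        "mmm": {"id": "mmm"},
        "zzz": {"id": "zzz"},
    },
}

# Order obtained by walking the map directly (alphabetical here).
map_order = list(content_body["imageMap"])

# Order in which the images actually appear in the article.
block_order = [b["imageId"] for b in content_body["blocks"] if "imageId" in b]

print(map_order)    # ['aaa', 'mmm', 'zzz']
print(block_order)  # ['zzz', 'mmm', 'aaa']
```

Any fix has to derive the download order from `blocks` rather than from the map's key order.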

shrublet · Jan 01 '23 08:01

OK, I made a really hacky workaround that I'm sure somebody can retrofit or implement in a smarter way. In short, I added a new post entry called "order": a list of the imageId values of the blocks in content_body. I then use this list of image IDs to reorder imageMap. I don't think this solution is great, but I'll include the relevant snippets just in case.

I changed these lines to this:

if content_body:
    if "html" in content_body:
        post["html"] = content_body["html"]
    if post["type"] == "article":
        post["articleBody"] = content_body.copy()
    if "blocks" in content_body:
        content = []
        order = []
        append = content.append
        for block in content_body["blocks"]:
            if "text" in block:
                append(block["text"])
            if "links" in block:
                for link in block["links"]:
                    append(link["url"])
            if "imageId" in block:
                # record image IDs in the order they appear in the article
                order.append(block["imageId"])
        post["content"] = "\n".join(content)
        post["order"] = order

And added this snippet above this line:

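if "imageMap" in content_body:
    image_map = content_body["imageMap"]
    # IDs from "blocks" first (skipping unknown ones; "order" may be missing);
    # re-inserting a key keeps its first position, so unreferenced images follow
    order = [k for k in (post.get("order") or ()) if k in image_map]
    content_body["imageMap"] = {k: image_map[k] for k in order + list(image_map)}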

shrublet · Jan 01 '23 09:01

Thanks for the example post and code. This should be fixed in https://github.com/mikf/gallery-dl/commit/7d6c8461763b4c7e0a93759e342962cd861e992b for images, but I think this might also be an issue with fileMap files. They are most likely handled in the same way as imageMap ones, but I'm not entirely sure.
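A sketch of how the same block-order approach could be generalized to cover `fileMap` as well as `imageMap`. It assumes, which the comment above suspects but does not confirm, that file blocks reference their entry through a `"fileId"` key the same way image blocks use `"imageId"`. `reorder_by_blocks` is a hypothetical helper, not the code in the linked commit.

```python
from typing import Any, Dict, List


def reorder_by_blocks(
    blocks: List[Dict[str, Any]],
    id_key: str,
    mapping: Dict[str, Any],
) -> Dict[str, Any]:
    """Return a copy of `mapping` with keys in the order they appear in `blocks`.

    Entries referenced by a block come first, in block order. Entries that
    no block references are kept afterwards in their original order, and
    IDs that appear in blocks but not in `mapping` are ignored.
    """
    result = {}
    for block in blocks:
        key = block.get(id_key)
        if key in mapping and key not in result:
            result[key] = mapping[key]
    for key, value in mapping.items():
        result.setdefault(key, value)  # append unreferenced entries
    return result


if __name__ == "__main__":
    blocks = [
        {"type": "image", "imageId": "b"},
        {"type": "file", "fileId": "y"},  # "fileId" is an assumption
        {"type": "image", "imageId": "a"},
        {"type": "file", "fileId": "x"},
    ]
    image_map = {"a": "img-a", "b": "img-b", "c": "img-c"}
    file_map = {"x": "file-x", "y": "file-y"}

    print(list(reorder_by_blocks(blocks, "imageId", image_map)))  # ['b', 'a', 'c']
    print(list(reorder_by_blocks(blocks, "fileId", file_map)))    # ['y', 'x']
```

Keeping unreferenced entries instead of dropping them means a post whose blocks omit some attachment still downloads everything, just with those extras at the end.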

mikf · Jan 05 '23 12:01