TumblThree
Problem with file names
After recently downloading over a hundred blogs, I only now realized that with the new Tumblr file name format, the saved file name often doesn't correspond to the file's URL. For example, the image https://64.media.tumblr.com/f6e487c4e659909af4ffeea43a64a13a/38a1c3a32a681028-87/s640x960/11f1917c7b330c67ddeaf6ae5dcd0482e3af4b01.jpg got saved as 62ee80a9907955e853300aba5e7f29d3e1292387.jpg.
Additionally, some file URLs are not included at all in the "images.txt" file: the post URL, caption, and all the other info are there, just not the image URL. One example is https://64.media.tumblr.com/8d27f578686a46ff28dcaf43b88803d6/413efd20f21ddb95-39/s1280x1920/34f74bd71552c819710ab3cfe798ddea9a2607be.jpg, which got saved as 34f74bd71552c819710ab3cfe798ddea9a2607be.jpg (this time the name clearly does correspond to the URL).
With the old format, I used to browse the downloaded files and search for their file names in the "images.txt", "texts.txt", etc. files to find the posts. With the new format, searching for posts by the downloaded files' names is sometimes impossible unless I change the file name template to, for example, "%i_%f" (which includes the post ID). What is the reason for that, and is there a way to fix it? Additionally, is there a way to rename all files without having to redownload an entire blog? I changed the filename template and did a full rescan, but that doesn't rename the files.
The reason is that Tumblr changed its filename pattern some years ago.
Old media URLs had the same base filename part across the different offered sizes:
"photo-url-100": "https:\/\/64.media.tumblr.com\/b40edf621f92287fc072a81d5578818b\/tumblr_ntv6ugOz141u98bcko1_100.jpg",
"photo-url-1280": "https:\/\/64.media.tumblr.com\/b40edf621f92287fc072a81d5578818b\/tumblr_ntv6ugOz141u98bcko1_1280.jpg",
"photo-url-250": "https:\/\/64.media.tumblr.com\/b40edf621f92287fc072a81d5578818b\/tumblr_ntv6ugOz141u98bcko1_250.jpg",
"photo-url-400": "https:\/\/64.media.tumblr.com\/b40edf621f92287fc072a81d5578818b\/tumblr_ntv6ugOz141u98bcko1_400.jpg",
"photo-url-500": "https:\/\/64.media.tumblr.com\/b40edf621f92287fc072a81d5578818b\/tumblr_ntv6ugOz141u98bcko1_500.jpg",
"photo-url-75": "https:\/\/64.media.tumblr.com\/b40edf621f92287fc072a81d5578818b\/tumblr_ntv6ugOz141u98bcko1_75sq.jpg",
New media URLs don't have a common filename part:
"photo-url-100": "https:\/\/64.media.tumblr.com\/b2279307e5692b4a4a744221ddc4a0f0\/fcda0031b95223f7-da\/s100x200\/180773382e99f967398303e377f291efcc0b3aad.jpg",
"photo-url-1280": "https:\/\/64.media.tumblr.com\/b2279307e5692b4a4a744221ddc4a0f0\/fcda0031b95223f7-da\/s1280x1920\/3d1dbe251b3289b2e8fb7cf594c51158cce65c61.jpg",
"photo-url-250": "https:\/\/64.media.tumblr.com\/b2279307e5692b4a4a744221ddc4a0f0\/fcda0031b95223f7-da\/s250x400\/e226305f8a4771e89f961a6472b3f8ada5136ec6.jpg",
"photo-url-400": "https:\/\/64.media.tumblr.com\/b2279307e5692b4a4a744221ddc4a0f0\/fcda0031b95223f7-da\/s400x600\/3391e38c20b61a7793cf625e4f7a57a7df0c349c.jpg",
"photo-url-500": "https:\/\/64.media.tumblr.com\/b2279307e5692b4a4a744221ddc4a0f0\/fcda0031b95223f7-da\/s500x750\/273a022b8f00021cbf61403525ec665c0ce2c254.jpg",
"photo-url-75": "https:\/\/64.media.tumblr.com\/b2279307e5692b4a4a744221ddc4a0f0\/fcda0031b95223f7-da\/s75x75_c1\/e63a420c108f34772c2f275f50c8e51db59e902e.jpg",
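To make the difference concrete, here is a minimal Python sketch that splits a media URL into the part shared across sizes and the part that varies. It is based only on the two example listings above, so the segment layout it assumes (two path segments for old URLs, four for new ones) is an observation, not a documented Tumblr rule. The function name `describe_media_url` is made up for this illustration and is not part of TumblThree.

```python
import posixpath
from urllib.parse import urlparse


def describe_media_url(url: str) -> dict:
    """Split a Tumblr media URL into shared and size-specific parts.

    Assumed layouts (taken from the examples in this thread only):
      old: /<hash>/<base>_<size>.<ext>
      new: /<hash>/<key>/<size>/<name>.<ext>
    """
    # Undo JSON escaping ("\/") so URLs copied from the API output also work.
    path = urlparse(url.replace("\\/", "/")).path
    segments = [s for s in path.split("/") if s]
    filename = segments[-1]

    if len(segments) == 4:
        # New format: only the two leading directories are shared across
        # sizes; the file name itself differs for every size.
        return {
            "format": "new",
            "shared": "/".join(segments[:2]),
            "size": segments[2],
            "filename": filename,
        }

    # Old format: the file name carries a common base plus a size suffix.
    stem, _ext = posixpath.splitext(filename)
    base, _sep, size = stem.rpartition("_")
    return {"format": "old", "shared": base, "size": size, "filename": filename}


if __name__ == "__main__":
    print(describe_media_url(
        "https://64.media.tumblr.com/b2279307e5692b4a4a744221ddc4a0f0/"
        "fcda0031b95223f7-da/s1280x1920/"
        "3d1dbe251b3289b2e8fb7cf594c51158cce65c61.jpg"
    ))
```

In the new layout the `shared` value is the same for every size of an image, while `filename` is not, which is why a saved file name can differ from the name in a URL for another size of the same image.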
Normally you should find the downloaded file's name in the meta file (e.g. images.txt). But as you already pointed out, for some as yet unknown reason that's not always the case. Analyzing that and tracking down the cause is a bit cumbersome, but it should be possible.
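To help narrow down which files are affected, a small script along these lines could list the downloaded files whose names don't appear anywhere in the meta files. This is only a sketch. It assumes the meta files sit in the same folder as the downloaded media, and it does a plain substring search without relying on the meta file format. The function name, the extension list, and the example path are all hypothetical.

```python
from pathlib import Path

# Assumed set of media extensions; adjust to what the blog folder contains.
MEDIA_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".webp", ".mp4"}


def find_unreferenced(blog_dir, meta_names=("images.txt", "texts.txt")):
    """Return names of media files that no meta file mentions."""
    blog_dir = Path(blog_dir)

    # Read every meta file that exists into one searchable string.
    meta_text = ""
    for name in meta_names:
        meta_file = blog_dir / name
        if meta_file.is_file():
            meta_text += meta_file.read_text(encoding="utf-8", errors="replace")

    # A file counts as unreferenced if its name never occurs in the meta text.
    missing = []
    for entry in sorted(blog_dir.iterdir()):
        if (entry.is_file()
                and entry.suffix.lower() in MEDIA_EXTENSIONS
                and entry.name not in meta_text):
            missing.append(entry.name)
    return missing


if __name__ == "__main__":
    # Hypothetical path; point this at one downloaded blog folder.
    for name in find_unreferenced("Blogs/example-blog"):
        print(name)
```

The resulting list covers both cases described above: files saved under a name that differs from the recorded URL, and files whose URL is missing from the meta file entirely.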
No, currently there's no way to rename already downloaded files. That would also be very difficult, particularly if you haven't saved the crawler data before. You can change the filename template, but that only affects newly downloaded files.