Image metadata doesn't match up with saved filenames
I am trying to build a tool to view a saved blog (such as in the event it is no longer available).
However, the metadata in images.txt and texts.txt seems to be insufficient: it doesn't match up with the files that TumblThree actually downloads.
The first problem is that the photos array seems to be empty:
"photos": [
],
So I instead tried to rewrite the Tumblr image URLs in the post_html field to point at the downloaded files. Unfortunately this doesn't work reliably, since the size suffix in those links doesn't always match the size suffix of the file that was downloaded.
Furthermore, the problem is repeated for texts.txt. I am not sure why, but some image posts are being classed as text posts whose body contains various images hosted on Tumblr. Those images are also downloaded automatically, but there is no way to match them up with the post.
In short, I was expecting the photos array to be populated with the names of the files as saved by TumblThree. Otherwise it is very difficult to actually make use of the archive!
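As a possible workaround for the suffix mismatch, here is a minimal sketch that matches a URL to a downloaded file while ignoring the trailing size suffix. It assumes old-style file names of the form `tumblr_<id>_<width>.<ext>`; the helper names `key_for`, `index_downloads` and `find_local_file` are made up for illustration and are not part of TumblThree.

```python
import os
import re
from urllib.parse import urlparse

# Assumption: the size is a trailing "_<digits>" right before the extension,
# e.g. "tumblr_abc123_1280.jpg" and "tumblr_abc123_500.jpg" are the same image.
_SIZE_SUFFIX = re.compile(r"_\d+$")


def key_for(name):
    """Reduce a file name or URL path to a key without size suffix or extension."""
    base = name.rsplit("/", 1)[-1]
    stem, _ext = os.path.splitext(base)
    return _SIZE_SUFFIX.sub("", stem)


def index_downloads(filenames):
    """Map size-independent keys to the file names found in the blog folder."""
    index = {}
    for name in filenames:
        # Keep the first file seen for each key.
        index.setdefault(key_for(name), name)
    return index


def find_local_file(url, index):
    """Return the downloaded file name for an image URL, or None if unknown."""
    return index.get(key_for(urlparse(url).path))
```

The index could be built once from `os.listdir()` on the blog's download folder and then queried for every image URL found in post_html. File names that do not follow the assumed pattern would need their own rule.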
Hello, your goal sounds interesting.
Currently we don't additionally save the "original" links as found in the posts, only the converted ones that are eventually used for downloading. Maybe that should be changed in a future version.
We know that some posts are classified as text posts. It's not an error in the app: they are delivered like that and can contain HTML content. For whatever reason, some tools create text/HTML posts that contain images. Although we cannot get everything right with such posts, maybe the image links found in them could at least be handled differently.
We'll have a look at it.
A short remark on the photos array and how the data is returned to us. For the API crawler this list is only filled if the post type is photo and the post contains a photo set (multiple images posted at once). For single-photo posts it's always empty and only the photo-url-x entries are filled. For the SVC crawler the photos array is always filled.
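For anyone reading the metadata from a script, the rule above could be handled roughly like this. It is only a sketch: it assumes each entry of the photos array is an object carrying the same photo-url-x keys as the post itself, with x being a pixel width, and the function names are hypothetical.

```python
import re

# Assumption: keys look like "photo-url-1280", "photo-url-500", ...
_PHOTO_KEY = re.compile(r"^photo-url-(\d+)$")


def largest_photo_url(entry):
    """Return the URL stored under the widest photo-url-<width> key, or None."""
    best = None
    for key, value in entry.items():
        match = _PHOTO_KEY.match(key)
        if match and isinstance(value, str) and value:
            width = int(match.group(1))
            if best is None or width > best[0]:
                best = (width, value)
    return best[1] if best else None


def collect_photo_urls(post):
    """Use the photos array if it is filled, else the post's own photo-url-x keys."""
    photos = post.get("photos") or []
    sources = photos if photos else [post]
    urls = []
    for source in sources:
        if isinstance(source, dict):
            url = largest_photo_url(source)
            if url:
                urls.append(url)
    return urls
```

With this, a single-photo post with an empty photos array and a photo set are read through the same call, so the viewer does not need to know which crawler produced the entry.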
Anyway, the downloaded files should be listed in the metadata now.