ehentai-archive-info
(Suggestion) Updating the metadata file
A gallery's metadata is constantly changing (new titles, new/deleted tags, etc.), and it would be nice if we could use the gallery's link inside the metadata file to update the file with new metadata.
If the gallery's link inside the metadata doesn't exist anymore, or if it has been expunged, or if it's been turned into a child/parent, it should dump the names of those galleries in a separate text folder too.
See #2, this may be the same request as that one.
Oh yeah it is. I'm just dumb and didn't notice the update.
Why does it say it's not efficient with updateMetadataIfExists=enabled? Is it doing something else with API calls besides fetching the gallery link and updating the metadata file? If the gallery has been removed and no longer exists, will it resume reverse lookups? And if the gallery has been updated, will it still update the metadata file, even if the local archive is still outdated?
Why does it say it's not efficient with updateMetadataIfExists=enabled?
The use of e-hentai's API is not as efficient as it could be. The API supports requesting gallery information in bulk, about 20 at a time. This script only requests information about one gallery per request, since it was easiest to update the code to do that.
And while it isn't as efficient as theoretically possible, it is not less efficient than the process when reverse lookup is used.
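For illustration, bulk requests could be prepared by grouping galleries before calling the API. This is only a minimal sketch, not what the script does: `chunk` and `buildBatchPayloads` are hypothetical helpers, the batch size of 20 is taken from the figure mentioned above, and the `gdata`/`gidlist`/`namespace` payload shape is an assumption about the public API that should be checked against its documentation.

```javascript
// Hypothetical helper: split an array into groups of at most `size` items.
function chunk(items, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const groups = [];
  for (let i = 0; i < items.length; i += size) {
    groups.push(items.slice(i, i + size));
  }
  return groups;
}

// Hypothetical helper: build one request body per group of galleries.
// The payload field names are an assumption, not confirmed by this thread.
function buildBatchPayloads(galleries, batchSize = 20) {
  return chunk(galleries, batchSize).map((group) => ({
    method: "gdata",
    gidlist: group.map(({ gid, token }) => [gid, token]),
    namespace: 1,
  }));
}
```

With 45 galleries and the default batch size, this would produce three request bodies (20, 20 and 5 entries) instead of 45 separate requests.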
Is it doing something else with API calls besides fetching the gallery link and updating the metadata file?
Nope, reason is just what was stated above.
If the gallery has been removed and no longer exists, will it resume reverse lookups?
API data is not deleted when a gallery is removed, so reverse lookup isn't necessary.
And if the gallery has been updated, will it still update the metadata file, even if the local archive is still outdated?
The metadata file will be overwritten with the newly fetched information.
If the URL doesn't match the gid and token in the metadata file, would it be possible to have it skip overwriting the metadata file and write a "skip" message to the log file instead?
Not sure what you mean by that. The URL is generated using the gid and token; if this information is not present, updating will not work.
https://github.com/dnsev-h/ehentai-archive-info/blob/b38bb0bdc41212b06cc92df096e76e8ab213375e/src/runner.js#L155-L161
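As a rough illustration of that dependency (this is not the linked code, just a sketch), a gallery URL can be assembled from the two values and rejected when either is missing. `buildGalleryUrl` is a hypothetical name, and the assumption that a token is a lowercase hex string is inferred only from the tokens quoted in this thread.

```javascript
// Hypothetical helper: build a gallery URL from a gid and token.
// Returns null when either value is missing or malformed, in which case
// there is nothing to request and the update cannot proceed.
function buildGalleryUrl(gid, token) {
  if (!Number.isInteger(gid) || gid <= 0) {
    return null;
  }
  // Assumption: tokens are lowercase hexadecimal strings.
  if (typeof token !== "string" || !/^[0-9a-f]+$/.test(token)) {
    return null;
  }
  return `https://e-hentai.org/g/${gid}/${token}/`;
}
```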
https://e-hentai.org/g/1388797/a713b0e340/
Once a gallery has been updated, it receives a new gid and token and the older gallery becomes a "parent".
I currently have an archive with these numbers: "gid": 1388797, "token": "a713b0e340". That gallery has been updated and the current gid and token are "gid": 1390372, "token": "8c729b4ee1".
Because the new gallery's gid and token don't match the numbers in my metadata file, I'd prefer to skip the archive and not overwrite or add anything in it, and return a warning in the log file.
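The behaviour I have in mind could look something like this. It's only a sketch of the request, not how the script currently works: `decideUpdate` and the shape of its return value are made up for illustration.

```javascript
// Hypothetical check: only overwrite when the fetched gallery is the same
// gallery (same gid and token) as the one recorded in the local metadata.
function decideUpdate(local, fetched) {
  const sameGallery =
    local.gid === fetched.gid && local.token === fetched.token;
  if (sameGallery) {
    return { action: "overwrite", warning: null };
  }
  return {
    action: "skip",
    warning:
      `Skipped: local metadata is for ${local.gid}/${local.token}, ` +
      `but fetched data is for ${fetched.gid}/${fetched.token}`,
  };
}
```

For my archive, comparing 1388797/a713b0e340 against 1390372/8c729b4ee1 would give "skip" plus a warning line for the log.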
Usually people will update their galleries when they want to add a page, or change something on a page, or when they want to remove duplicated pages, and it'd be weird to have the latest metadata for an archive with outdated pages.
The script shouldn't download information for "updated" galleries, since they are technically a new gallery. At most, the metadata may contain a reference to the new gallery's id/token, but it won't pull the data from it.
I'm not seeing any reference to a newer gallery, but this script will update the tags and it'll replace numbers with null or remove them. Here are some examples: "thumbnail_size": "", "thumbnail_rows": null, "count": null. Before updating the metadata, there were numbers instead of null or "".
The total_file_size_approx, upload_date and date_uploaded values are also different; this script's output doesn't match the data from your metadata userscript or your fork of EHDL with inbuilt metadata. I downloaded info.json with your metadata userscript on the same gid/token as the metadata file in my archive (that archive was also downloaded with your fork of EHDL) and it's identical (apart from the updated tags), but the metadata file updated with this script is completely different, with removed or edited numbers.
Updating tags is completely fine, older/parent galleries will have the exact same tags as the visible/most updated gallery. But I'm not sure why the upload date, file size and other numbers are different or missing.
This is the older metadata file from my archive. I used your fork of EHDL to download the archive and metadata. I downloaded that archive with the metadata on 10/08/2019. https://pastebin.com/fzpeAnnV
This is the metadata file from your metadata userscript. Everything seems to be ok. The numbers on it are 100% identical to my older metadata file except for the newer gids and tokens since that gallery has been updated a few times. I downloaded that metadata file today. https://pastebin.com/dtwcLcQq
This is the metadata file from this script. I updated the older metadata file with this script a few minutes ago. Most of the numbers have become null or been changed for no apparent reason. It also failed to add some of the newer tags (see male: muscle). https://pastebin.com/0rW8jpP5
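To make the comparison easier to reproduce, the old and updated files can be checked field by field for values that were present before and are null or empty afterwards. This is a minimal sketch: `findLostValues` is a hypothetical helper, and it assumes the metadata is plain JSON made of nested objects.

```javascript
// Treat null, undefined and "" as "no value".
function isEmpty(value) {
  return value === null || value === undefined || value === "";
}

// Hypothetical helper: list the dotted paths of fields that had a value in
// `before` but are null, empty or missing in `after`.
function findLostValues(before, after, path = "") {
  const lost = [];
  for (const key of Object.keys(before)) {
    const here = path ? `${path}.${key}` : key;
    const oldValue = before[key];
    const newValue = after == null ? undefined : after[key];
    const isPlainObject =
      oldValue !== null &&
      typeof oldValue === "object" &&
      !Array.isArray(oldValue);
    if (isPlainObject) {
      // Recurse into nested objects so nested fields are reported too.
      lost.push(...findLostValues(oldValue, newValue, here));
    } else if (!isEmpty(oldValue) && isEmpty(newValue)) {
      lost.push(here);
    }
  }
  return lost;
}
```

Both files would be loaded with `JSON.parse` and passed in as `before` and `after`; the result is the list of fields worth reporting.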