
(Suggestion) Updating the metadata file

Open ghost opened this issue 5 years ago • 9 comments

A gallery's metadata is constantly changing (new titles, new or deleted tags, etc.), and it would be nice if we could use the gallery link stored in the metadata file to update the file with the new metadata.

If the gallery link in the metadata file no longer exists, or if the gallery has been expunged or has become a child/parent of another gallery, it should also dump the names of those galleries into a separate text file.

ghost avatar Mar 05 '20 22:03 ghost

See #2, this may be the same request as that one.

dnsev-h avatar Mar 07 '20 05:03 dnsev-h

Oh yeah it is. I'm just dumb and didn't notice the update.

Why does it say it's not efficient with updateMetadataIfExists=enabled? Is it doing something else with API calls besides fetching the gallery link and updating the metadata file? If the gallery has been removed and no longer exists, will it resume reverse lookups? And if the gallery has been updated, will it still update the metadata file, even if the local archive is still outdated?

ghost avatar Mar 09 '20 04:03 ghost

> Why does it say it's not efficient with updateMetadataIfExists=enabled?

The use of e-hentai's API is not as efficient as it could be. The API supports requesting gallery information in bulk, about 20 galleries at a time. This script only requests information about one gallery per request, since that was the easiest way to update the code.

And while it isn't as efficient as theoretically possible, it is no less efficient than the process used for reverse lookup.

> Is it doing something else with API calls besides fetching the gallery link and updating the metadata file?

Nope, the reason is just what was stated above.

> If the gallery has been removed and no longer exists, will it resume reverse lookups?

API data is not deleted when a gallery is removed, so reverse lookup isn't necessary.

> And if the gallery has been updated, will it still update the metadata file, even if the local archive is still outdated?

The metadata file will be overwritten with the newly fetched information.

dnsev-h avatar Mar 11 '20 03:03 dnsev-h
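
To illustrate the efficiency point in the reply above, here is a minimal sketch of grouping galleries into bulk requests instead of sending one request per gallery. It is not the script's code: `buildBulkRequests`, `chunk`, and `BATCH_SIZE` are hypothetical names, the batch size of 20 comes from the "about 20 at a time" remark, and the `gdata`/`gidlist`/`namespace` payload shape is an assumption about the API's request format that should be checked against the API documentation. The sketch only builds request bodies and sends nothing.

```javascript
// Sketch only: names, batch size, and payload shape are assumptions,
// not taken from the script itself.
const BATCH_SIZE = 20; // "about 20 at a time", per the reply above

// Split an array into consecutive groups of at most `size` items.
function chunk(items, size) {
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Build one request body per batch instead of one per gallery.
function buildBulkRequests(galleries, size = BATCH_SIZE) {
  return chunk(galleries, size).map((batch) => ({
    method: "gdata", // assumed method name
    gidlist: batch.map(({ gid, token }) => [gid, token]),
    namespace: 1, // assumed flag
  }));
}

// 45 placeholder galleries (made-up gids and tokens).
const galleries = Array.from({ length: 45 }, (_, i) => ({
  gid: 1000 + i,
  token: `token${i}`,
}));
const requests = buildBulkRequests(galleries);
console.log(requests.length); // 3 request bodies rather than 45
```

With batching, 45 galleries need three requests (20, 20, and 5 entries) rather than 45, which is the saving the reply describes as theoretically possible.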

If the URL doesn't match the gid and token in the metadata file, would it be possible to have it skip overwriting the metadata file and write a "skip" message to the log file instead?

ghost avatar Mar 11 '20 04:03 ghost

Not sure what you mean by that. The URL is generated using the gid and token; if this information is not present, updating will not work.

https://github.com/dnsev-h/ehentai-archive-info/blob/b38bb0bdc41212b06cc92df096e76e8ab213375e/src/runner.js#L155-L161

dnsev-h avatar Mar 12 '20 03:03 dnsev-h
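
As a rough illustration of the reply above (the URL is derived from the gid and token, so nothing can be updated without them), here is a small sketch. `buildGalleryUrl` is a hypothetical helper and is not a copy of the linked lines in `src/runner.js`; the URL pattern follows the gallery link format used elsewhere in this thread.

```javascript
// Hypothetical helper; not the code from the linked lines in src/runner.js.
function buildGalleryUrl(info) {
  const gid = info ? info.gid : undefined;
  const token = info ? info.token : undefined;
  // Without both values there is nothing to build the URL from,
  // so an update cannot proceed.
  if (!Number.isInteger(gid) || typeof token !== "string" || token.length === 0) {
    return null;
  }
  return `https://e-hentai.org/g/${gid}/${token}/`;
}

console.log(buildGalleryUrl({ gid: 1388797, token: "a713b0e340" }));
// https://e-hentai.org/g/1388797/a713b0e340/
console.log(buildGalleryUrl({ token: "a713b0e340" })); // null
```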

https://e-hentai.org/g/1388797/a713b0e340/

Once a gallery has been updated, it receives a new gid and token, and the older gallery becomes a "parent". I currently have an archive with "gid": 1388797, "token": "a713b0e340". That gallery has since been updated, and the current values are "gid": 1390372, "token": "8c729b4ee1". Because the new gallery's gid and token don't match the values in my metadata file, I'd prefer that the script skip the archive, not overwrite or add anything to it, and write a warning to the log file. People usually update their galleries when they want to add a page, change something on a page, or remove duplicated pages, and it'd be weird to have the latest metadata for an archive with outdated pages.

ghost avatar Mar 12 '20 04:03 ghost
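
The behaviour requested above could be sketched roughly as follows. This is only an illustration of the request, not something the script is known to do: `decideUpdate` and its return shape are hypothetical, and the gid/token pairs are the ones quoted in the comment.

```javascript
// Hypothetical check: overwrite only when the fetched gallery is the same
// gallery (same gid and token) as the one recorded in the local metadata.
function decideUpdate(local, fetched) {
  const sameGallery = local.gid === fetched.gid && local.token === fetched.token;
  if (sameGallery) {
    return { overwrite: true, warning: null };
  }
  return {
    overwrite: false,
    warning:
      `skip: local ${local.gid}/${local.token} does not match ` +
      `fetched ${fetched.gid}/${fetched.token}`,
  };
}

const local = { gid: 1388797, token: "a713b0e340" };
const updated = { gid: 1390372, token: "8c729b4ee1" };

console.log(decideUpdate(local, updated).warning);
// skip: local 1388797/a713b0e340 does not match fetched 1390372/8c729b4ee1
console.log(decideUpdate(local, local).overwrite); // true
```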

The script shouldn't download information for "updated" galleries, since they are technically a new gallery. At most, the metadata may contain a reference to the new gallery's id/token, but it won't pull the data from it.

dnsev-h avatar Mar 13 '20 03:03 dnsev-h

I'm not seeing any reference to a newer gallery, but this script does update the tags, and it replaces numbers with null or removes them. Here are some examples: "thumbnail_size": "", "thumbnail_rows": null, "count": null. Before updating the metadata, there were numbers instead of null or "".

The total_file_size_approx, upload_date and date_uploaded values are also different; this script's output doesn't match the data from your metadata userscript or from your fork of EHDL with inbuilt metadata. I downloaded info.json with your metadata userscript for the same gid/token as the metadata file in my archive (that archive was also downloaded with your fork of EHDL) and it's identical apart from the updated tags, but the metadata file updated with this script is completely different, with removed or edited numbers.

Updating tags is completely fine, older/parent galleries will have the exact same tags as the visible/most updated gallery. But I'm not sure why the upload date, file size and other numbers are different or missing.

ghost avatar Mar 13 '20 04:03 ghost

This is the older metadata file from my archive. I used your fork of EHDL to download the archive and metadata. I downloaded that archive with the metadata on 10/08/2019. https://pastebin.com/fzpeAnnV

This is the metadata file from your metadata userscript. Everything seems to be ok. The numbers on it are 100% identical to my older metadata file except for the newer gids and tokens since that gallery has been updated a few times. I downloaded that metadata file today. https://pastebin.com/dtwcLcQq

This is the metadata file from this script. I updated the older metadata file with this script a few minutes ago. Most of the numbers have become null or been changed for no apparent reason. It also failed to add some of the newer tags (see male: muscle). https://pastebin.com/0rW8jpP5

ghost avatar Mar 13 '20 04:03 ghost
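
One way to pin down which fields were lost, as described in the comments above, is to compare the metadata before and after the update. The sketch below is a hypothetical diagnostic, not part of the script: `findLostValues` is an invented name, the layout of the real metadata file is not assumed (the walk is recursive so nesting does not matter), and the sample values are made-up placeholders rather than data from the linked pastes.

```javascript
// Treat null, undefined, and "" as "no value".
function isBlank(value) {
  return value === null || value === undefined || value === "";
}

// Return dotted paths of fields that had a value before but are blank after.
function findLostValues(before, after, path = "") {
  const lost = [];
  for (const [key, oldValue] of Object.entries(before)) {
    const newValue = after === null || after === undefined ? undefined : after[key];
    const here = path ? `${path}.${key}` : key;
    if (oldValue !== null && typeof oldValue === "object" && !Array.isArray(oldValue)) {
      // Descend into nested objects.
      lost.push(...findLostValues(oldValue, newValue, here));
    } else if (!isBlank(oldValue) && isBlank(newValue)) {
      lost.push(here);
    }
  }
  return lost;
}

// Placeholder data: field names come from the comment above, values are made up.
const before = { info: { thumbnail_size: "large", thumbnail_rows: 4, count: 20 }, title: "x" };
const after = { info: { thumbnail_size: "", thumbnail_rows: null, count: null }, title: "x" };

console.log(findLostValues(before, after));
// [ 'info.thumbnail_size', 'info.thumbnail_rows', 'info.count' ]
```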