About eh torrent pages limit.

Open northtowertop opened this issue 2 years ago • 5 comments

Hi ccloli, it seems that E-Hentai now only allows viewing 100 pages of the torrent list. Since the latest SQL file you uploaded is from early 2022, there is no way for me to re-synchronize the earlier torrents. Would it be convenient for you to upload a newer version of the SQL file, or maybe just the torrent table? Many thanks.

northtowertop, Jun 10 '22 05:06

I've uploaded a database dump from today. Depending on your situation, you may need to change the INSERT queries into REPLACE, INSERT IGNORE, or INSERT ... ON DUPLICATE KEY UPDATE.
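One way to apply that replacement is to rewrite the leading keyword of each statement before importing the dump. The sketch below assumes every statement in the dump starts with `INSERT INTO` at the beginning of a line; the helper name `rewriteInserts`, its mode names, and the table names in the sample are made up for illustration and are not part of e-hentai-db.

```javascript
// Hypothetical helper: rewrite the leading "INSERT INTO" of each statement
// in a MySQL dump so that importing over existing rows does not fail.
// Assumption: every statement starts with "INSERT INTO" at a line start.
function rewriteInserts(sql, mode) {
  const replacements = {
    replace: 'REPLACE INTO',      // delete the old row, then insert the new one
    ignore: 'INSERT IGNORE INTO', // keep the old row, skip the new one
  };
  const keyword = replacements[mode];
  if (!keyword) {
    throw new Error(`unknown mode: ${mode}`);
  }
  // "m" makes ^ match at every line start; "g" rewrites all statements.
  return sql.replace(/^INSERT INTO\b/gm, keyword);
}

// Sample input with assumed table names.
const sample =
  "INSERT INTO `gallery` VALUES (1,'a');\nINSERT INTO `torrent` VALUES (2,'b');";
console.log(rewriteInserts(sample, 'ignore'));
```

ON DUPLICATE KEY UPDATE is left out on purpose: it needs a clause appended to the end of each statement, which depends on the table's columns. A row value that happens to begin a line with `INSERT INTO` would also be rewritten, so it is worth checking the result before importing.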

BTW, to grab old torrents you may need some tricks, like using search keywords to get the matching galleries. I used this approach to import some galleries that are not in the original JSON, for example by searching C99 C98 C97 C89. If the result set is still too large, try combining keywords: searching touhou gives 6315 torrents, which exceeds the maximum of 100 pages (5000 records), but touhou english gives 3626 and touhou -english gives the remaining 2689.
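The arithmetic behind the keyword split can be checked with a few lines. This is only a sketch: the page size of 50 is inferred from "100 pages (5000 records)", the helper names `splitByTerm` and `fitsInLimit` are made up, and the counts are the ones quoted above rather than live search results.

```javascript
// Numbers from the comment above: 100 pages, 5000 records in total.
const MAX_PAGES = 100;
const PER_PAGE = 50; // inferred: 5000 records / 100 pages
const MAX_RESULTS = MAX_PAGES * PER_PAGE;

// Hypothetical helper: split one keyword into two searches that together
// cover the same results, by including and then excluding an extra term.
function splitByTerm(keyword, term) {
  return [`${keyword} ${term}`, `${keyword} -${term}`];
}

// True when every result of a search can be reached through paging.
function fitsInLimit(count) {
  return count <= MAX_RESULTS;
}

// Counts quoted in the comment above, not fetched from the site.
const counts = { 'touhou': 6315, 'touhou english': 3626, 'touhou -english': 2689 };
for (const query of ['touhou', ...splitByTerm('touhou', 'english')]) {
  console.log(query, counts[query], fitsInLimit(counts[query]) ? 'ok' : 'too many');
}
```

If one of the halves were still above the limit, the same split could be applied to it again with another term.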

ccloli, Jun 10 '22 14:06

OK, got it. I really appreciate the reply and the dump file. Just one more thing: I noticed that the original metadata from EH contains the torrent list of each gallery, but when the script runs the sync, the torrent information isn't imported into the torrent table along with it. Is this intentional, or is it because of a wrong setting on my side?

northtowertop, Jun 10 '22 15:06

> Is this intentional, or is it because of a wrong setting on my side?

Neither, just because I'm lazy.

To be clear, though: on E-Hentai, every torrent is tied to a root gallery. If you create gallery A, then update it and get gallery B, a torrent uploaded on gallery B still belongs to gallery A. The root gallery isn't provided in the original metadata, but it can be found on the torrent's download page. So the sync script fetches each gallery's torrent page to get its uploaded torrents and to alias the current gallery to its root gallery.
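The root gallery relationship can be pictured with a small in-memory model. This is an illustration only, not the project's schema: the maps `rootOf` and `torrentsByRoot` and both function names are made up, and the gallery IDs are the A and B from the example above.

```javascript
// Hypothetical in-memory model: each gallery points at its root gallery,
// and torrents are stored under the root gallery's id.
const rootOf = new Map([
  ['A', 'A'], // A is the original upload, so it is its own root
  ['B', 'A'], // B is an updated version of A
]);
const torrentsByRoot = new Map();

// Record a torrent uploaded on any version of a gallery under its root.
function addTorrent(gid, torrentName) {
  const root = rootOf.get(gid) ?? gid; // unknown galleries act as their own root
  const list = torrentsByRoot.get(root) ?? [];
  list.push(torrentName);
  torrentsByRoot.set(root, list);
  return root;
}

// Look up torrents for a gallery through its root gallery.
function torrentsFor(gid) {
  return torrentsByRoot.get(rootOf.get(gid) ?? gid) ?? [];
}

addTorrent('B', 'b.torrent');
console.log(torrentsFor('A')); // [ 'b.torrent' ]
```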

ccloli, Jun 10 '22 17:06

Thanks for the reply. About torrent sync, I just noticed that it also fetches the relevant galleries. But not all galleries have torrent data, and since gallery sync uses the latest post time as its stop flag, won't this cause galleries to go missing? For example, say I have three new galleries A (newest), B and C that are not in the database. A has a torrent, while B and C don't. When I run torrent sync, the system adds gallery A to the database because it has a torrent but is not in the gallery table; B and C are skipped, and the newest timestamp in the gallery table becomes A's post time. The next time I run gallery sync without a specific timestamp, galleries B and C will never be included. I may have misunderstood the code and functions because I'm a java noob xD. Sorry about that in advance.
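The scenario can be replayed as a small simulation. It assumes, as described in this comment, that gallery sync walks the list from newest to oldest and stops at the first gallery that is not newer than the latest stored post time. The function names and timestamps are made up and do not come from the actual scripts.

```javascript
// Remote list, newest first. Timestamps are arbitrary illustration values.
const remote = [
  { gid: 'A', posted: 300, hasTorrent: true },
  { gid: 'B', posted: 200, hasTorrent: false },
  { gid: 'C', posted: 100, hasTorrent: false },
];

// Assumed behaviour of torrent sync: import only galleries that have torrents.
function torrentSync(db) {
  for (const g of remote) {
    if (g.hasTorrent) db.set(g.gid, g.posted);
  }
}

// Assumed behaviour of gallery sync: stop at the newest stored post time.
function gallerySync(db) {
  const latest = Math.max(0, ...db.values());
  for (const g of remote) {
    if (g.posted <= latest) break; // stop flag reached
    db.set(g.gid, g.posted);
  }
}

const db = new Map(); // gid -> posted
torrentSync(db);      // adds A only
gallerySync(db);      // stops at A right away
console.log([...db.keys()]); // [ 'A' ]: B and C are never imported
```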

northtowertop, Jun 10 '22 20:06

Well, the sync script is messy, and I can't remember exactly what I did, since it was a while ago.

I believe that when the script starts syncing galleries, it imports the galleries first, with the root_gid field left empty. Then another script finds all galleries that don't have a root_gid, grabs their torrent download pages to determine the root gallery's ID, and imports the torrents if any exist.

The torrent import script is mostly for grabbing the very latest galleries. (E-Hentai's gallery list may be cached, so the latest gallery in the list may be around 30 minutes old, while some galleries that don't show in the list yet already have torrents.) The downside is that the gallery import script can then no longer rely on the last gallery as its stop point. To fix that, the script has an extra parameter that forces a sync of, say, the last 24 hours of galleries to cover the gap. If you don't mind the database being slightly out of date when syncing, there's no need to run the torrent-sync script; running it may even make things worse.
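The effect of such a look-back parameter can be sketched as follows. This is a simplified model under stated assumptions, not the actual sync script: the function `gallerySync`, its `lookbackHours` argument, and the sample data are made up for illustration.

```javascript
const HOUR = 3600; // seconds

// Sketch of a gallery sync with an optional look-back window. Without
// lookbackHours it stops at the newest stored post time; with it, it
// re-walks everything posted within that window.
function gallerySync(db, remote, now, lookbackHours = 0) {
  const latest = Math.max(0, ...db.values());
  const stopAt = lookbackHours > 0
    ? Math.min(latest, now - lookbackHours * HOUR)
    : latest;
  for (const g of remote) {  // remote is ordered newest first
    if (g.posted <= stopAt) break;
    db.set(g.gid, g.posted); // insert or overwrite
  }
}

const now = 100 * HOUR;
const remote = [
  { gid: 'A', posted: now - 1 * HOUR },
  { gid: 'B', posted: now - 2 * HOUR },
  { gid: 'C', posted: now - 3 * HOUR },
];
const db = new Map([['A', now - 1 * HOUR]]); // A came in via the torrent import

gallerySync(db, remote, now);     // stops at A immediately
console.log([...db.keys()]);      // [ 'A' ]
gallerySync(db, remote, now, 24); // walks back 24 hours, picks up B and C
console.log([...db.keys()]);      // [ 'A', 'B', 'C' ]
```

The trade-off is that a wider window re-processes galleries that are already stored, so it suits an occasional catch-up run better than every run.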

BTW this is my crontab if you need it.

0 */1 * * * cd /var/www/e-hentai-db/ && npm run sync exhentai.org && npm run torrent-import exhentai.org && npm run torrent-sync exhentai.org 1
30 */6 * * * cd /var/www/e-hentai-db/ && npm run resync 48
15 */12 * * * cd /var/www/e-hentai-db/ && npm run sync exhentai.org 24

> And may I ask for your server's local time and how you set the timestampOffset?

I'm not sure about the timezone offset. I just checked, and it seems to be BST?

root@ccloli:~# date
Fri Jun 10 22:12:58 BST 2022
root@ccloli:~# node
> Date()
'Fri Jun 10 2022 22:14:15 GMT+0100 (British Summer Time)'
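For reference, the server's UTC offset can also be read as a number in Node. The helper name `utcOffsetHours` is made up, and how its value should map onto `timestampOffset` is not stated in this thread, so treat this as a way to inspect the server's timezone only.

```javascript
// Date#getTimezoneOffset returns the minutes to add to local time to get
// UTC, so a server on BST (UTC+1) reports -60. The sign is flipped here
// to get the more familiar "hours ahead of UTC".
function utcOffsetHours(date = new Date()) {
  return -date.getTimezoneOffset() / 60;
}

console.log(utcOffsetHours()); // depends on the server's timezone setting
```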

ccloli, Jun 10 '22 21:06