Match episodes and anime with trace.moe automatically

Open CuddleBear92 opened this issue 7 years ago • 19 comments

Starting a new post for this, because much of the earlier thinking about a shared cache isn't really needed or wanted anymore.

The idea: every unrecognized anime file should be looked up on trace.moe, and all the links tied to the match on trace.moe should be grabbed. trace.moe uses anilist.co, which also links to official sites such as the series' official Japanese site and its Crunchyroll page. These links should be compared with the links hosted on AniDB for matching. The episode number from trace.moe should probably be kept as well.
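A minimal sketch of the link comparison, assuming the external links for the trace.moe/AniList match and for each AniDB candidate are already available as plain URL lists (fetching them is out of scope). URLs are normalized so that `http`/`https`, a leading `www.` and a trailing `/` don't matter, and candidates are scored by Jaccard overlap. `normalize_url`, `link_overlap` and `best_anidb_match` are hypothetical names, not Shoko or trace.moe APIs.

```python
from urllib.parse import urlsplit


def normalize_url(url: str) -> str:
    """Reduce a URL to host + path so scheme, 'www.' and trailing '/' are ignored."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def link_overlap(a: list[str], b: list[str]) -> float:
    """Jaccard similarity of two link sets after normalization."""
    sa = {normalize_url(u) for u in a}
    sb = {normalize_url(u) for u in b}
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def best_anidb_match(tracemoe_links: list[str], anidb_candidates: dict[int, list[str]]):
    """Return (anidb_id, score) for the candidate sharing the most links, or None."""
    best = None
    for anidb_id, links in anidb_candidates.items():
        score = link_overlap(tracemoe_links, links)
        if score > 0 and (best is None or score > best[1]):
            best = (anidb_id, score)
    return best
```

For example, `best_anidb_match(["https://www.crunchyroll.com/some-show/", "http://some-show.jp"], {101: ["https://crunchyroll.com/some-show", "https://some-show.jp/"], 202: ["https://other.jp"]})` returns `(101, 1.0)`.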

Since there are plans to add fuller AniList support, grabbing that data would be wanted anyway.

Matches with 85%+ similarity could be linked automatically.
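A minimal sketch of that auto-match rule, assuming each search hit carries a `similarity` between 0 and 1 (trace.moe does report a similarity score per result, but the dict shape used here is simplified and hypothetical). Only the best hit is considered; it is auto-linked if it clears 0.85, otherwise it is queued for manual review.

```python
from typing import Optional

AUTO_MATCH_THRESHOLD = 0.85  # the "+85%" from the post, as a fraction


def decide(hits: list[dict], threshold: float = AUTO_MATCH_THRESHOLD) -> tuple[str, Optional[dict]]:
    """Return ("auto", hit) if the best hit clears the threshold,
    ("review", hit) if there is only a weaker hit, or ("none", None)."""
    if not hits:
        return "none", None
    best = max(hits, key=lambda h: h["similarity"])
    if best["similarity"] >= threshold:
        return "auto", best
    return "review", best
```

Keeping weaker hits as "review" rather than discarding them means they can still be shown to the user as suggestions.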


Original Post:

https://trace.moe/ can be run locally on the webcache and on user servers. Thumbnails can be taken from 1-minute clips at set times in each episode, and their hash values generated and sent to the webcache (no images are uploaded to the cache). These hashes will be generated from files already known to AniDB, so we know exactly what each one is. When a user has an unknown file, their server uploads its hash values and lets the cache compare them; the cache then returns the correct IDs for easy matching on the user side.
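The post doesn't fix a hash function, and trace.moe uses its own image descriptors. As a stand-in, here is a minimal average-hash (aHash) sketch over a grayscale frame given as a 2D list of 0 to 255 values; decoding frames from the video (for example with ffmpeg) is assumed to happen elsewhere. Only the resulting 64-bit integer would leave the machine, which fits the "no image upload" rule. `average_hash` and `hamming` are illustrative names.

```python
def average_hash(frame: list[list[int]], size: int = 8) -> int:
    """aHash: shrink the frame to size x size by block-averaging, then emit one
    bit per cell (1 if the cell is brighter than the mean of all cells)."""
    h, w = len(frame), len(frame[0])
    if h < size or w < size:
        raise ValueError("frame is smaller than the hash grid")
    cells = []
    for by in range(size):
        y0, y1 = by * h // size, (by + 1) * h // size
        for bx in range(size):
            x0, x1 = bx * w // size, (bx + 1) * w // size
            block = [frame[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")
```

Two frames whose hashes differ by only a few of the 64 bits (the exact cutoff would need tuning) can be treated as the same shot, which tolerates re-encodes and small brightness shifts.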

What it might need:

  • take screenshots at set times in each known file.
  • upload the resulting hash values to the cache as a matched file.
  • let the cache compare new incoming hash values from any unknown files the user has.
  • give the user's server the correct AniDB IDs for easy matching.
  • have an option to automatically create the matched group as an empty group, to cut the wait for metadata downloads while the user is working with the files.
  • matched files should be listed and easily sortable in the unknown tab of the client, and should send the user to the AVDump/AniDB page for that file.
  • if a file still isn't matched after a few tries, it should be flagged and excluded from the cache, either for a while or permanently until the user intervenes.
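The steps above can be sketched as a small in-memory stand-in for the webcache. This is hypothetical: `FrameCache`, its methods, the 5-bit tolerance and the 3-attempt limit are assumptions for illustration, not an existing Shoko component. Known files contribute (hash, anime ID, episode ID) entries; unknown files are matched by Hamming distance with a majority vote over their frames, and repeated misses flag the file until the user clears it.

```python
from dataclasses import dataclass, field


@dataclass
class FrameCache:
    """Hypothetical in-memory stand-in for the webcache described above."""
    max_distance: int = 5   # assumed tolerance, in bits out of 64
    max_attempts: int = 3   # "after a few tries"
    entries: list = field(default_factory=list)   # (hash, anime_id, episode_id)
    attempts: dict = field(default_factory=dict)  # file key -> failed lookups
    flagged: set = field(default_factory=set)     # excluded until user acts

    def submit_known(self, frame_hash: int, anime_id: int, episode_id: int) -> None:
        """Store a hash from a file whose AniDB IDs are already known."""
        self.entries.append((frame_hash, anime_id, episode_id))

    def lookup(self, file_key: str, frame_hashes: list[int]):
        """Return (anime_id, episode_id) by majority vote over matching frames,
        or None. Enough misses flag the file so it is not retried."""
        if file_key in self.flagged:
            return None
        votes: dict = {}
        for q in frame_hashes:
            for h, anime_id, episode_id in self.entries:
                if bin(q ^ h).count("1") <= self.max_distance:
                    key = (anime_id, episode_id)
                    votes[key] = votes.get(key, 0) + 1
        if votes:
            self.attempts.pop(file_key, None)
            return max(votes, key=votes.get)
        self.attempts[file_key] = self.attempts.get(file_key, 0) + 1
        if self.attempts[file_key] >= self.max_attempts:
            self.flagged.add(file_key)
        return None

    def clear_flag(self, file_key: str) -> None:
        """User interaction: allow the file to be looked up again."""
        self.flagged.discard(file_key)
        self.attempts.pop(file_key, None)
```

Voting across several frames, rather than trusting a single one, is one way to keep a shared OP or recap scene from pulling a file onto the wrong episode.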

Doing it all in the webcache instead of using the trace.moe API gives us more control in general and removes the limits the API has because of its lack of support. It wouldn't be limited by content: no matter how old a series is or whether it's 18+, it wouldn't matter anymore, since everything comes from Shoko users and their files. Working with strict rules also means we don't have to hash the whole file and keep hashes across the whole file like the site does. Instead we can use set times with more refined rules, as we don't need to match across the whole episode. In the end this removes the need for actual images and reduces the size of the database as a whole (their database is 30 GB for 673 million frames, and that's without the images themselves).
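The post leaves the "set times" rule open. One hypothetical rule, sketched below: take `count` evenly spaced timestamps in the middle of the episode, skipping the first and last 15% where openings and endings (shared across episodes) usually sit. The function name and all default values are assumptions.

```python
def sample_timestamps(duration_s: float, count: int = 5,
                      skip_start: float = 0.15, skip_end: float = 0.15) -> list[float]:
    """Hypothetical 'set times' rule: `count` evenly spaced timestamps (seconds)
    strictly inside the window [skip_start, 1 - skip_end] of the runtime."""
    if count < 1 or duration_s <= 0:
        return []
    start = duration_s * skip_start
    end = duration_s * (1 - skip_end)
    step = (end - start) / (count + 1)
    return [round(start + step * (i + 1), 2) for i in range(count)]
```

For a 24-minute (1440 s) episode this gives 384, 552, 720, 888 and 1056 seconds. At one 64-bit hash per sampled frame that is 40 bytes per episode, versus roughly 45 bytes per frame implied by 30 GB for 673 million frames.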

Links: https://trace.moe/ • https://github.com/soruly/whatanime.ga • API: https://soruly.github.io/whatanime.ga/ • https://www.patreon.com/soruly

EDIT: this all comes from the chat in the future-requests channel on the Discord after I brought up using the API. Using the webcache is a better option for us, as it would not rely on anyone but the Shoko community and would be open to more content.

CuddleBear92 • Feb 01 '18 20:02

This will still need to be per file, as we link files to episodes. Due to the lack of info on release groups, it will need to be an extension of the Unrecognized Files Utility. It can be used to help manually link and AVDump files, though.

da3dsoul • Feb 01 '18 21:02

Yeah, it would need to be per file. I don't see that as much of an issue in the end if it's done slowly over a longer period. The biggest issue would be stressing their servers too much so that they block us. But even at one file every hour or two, you'd get through a series a day in the long run, provided it gets a match on the first screenshot.
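A minimal sketch of that slow pace as a throttled queue, assuming a single worker that polls it. The clock is injectable so the pacing can be tested without waiting; `ThrottledQueue` and the one-hour interval are illustrative, not part of any existing client.

```python
import time
from collections import deque
from typing import Callable, Deque, Optional


class ThrottledQueue:
    """Releases at most one queued file per `interval_s` seconds."""

    def __init__(self, interval_s: float, clock: Callable[[], float] = time.monotonic):
        self.interval_s = interval_s
        self.clock = clock
        self.pending: Deque[str] = deque()
        self.last_sent: Optional[float] = None

    def add(self, file_key: str) -> None:
        self.pending.append(file_key)

    def next_ready(self) -> Optional[str]:
        """Return the next file if the interval has elapsed, else None."""
        if not self.pending:
            return None
        now = self.clock()
        if self.last_sent is not None and now - self.last_sent < self.interval_s:
            return None
        self.last_sent = now
        return self.pending.popleft()
```

At a one-hour interval this allows 24 lookups a day (12 at two hours), in line with the "series a day" estimate.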

It would be most useful for files you know nothing or next to nothing about. Or, heck, for lazy users who don't want to figure it out.

CuddleBear92 • Feb 01 '18 21:02

Should it automatically add an empty series to the database when a match is made? It would make sense to do that automatically, as it would cut the wait before everything is in place when the user finally dumps and rechecks the file. If it did that, it could also add a custom tag or flag to the series that we can filter on.

CuddleBear92 • Feb 01 '18 21:02

I'd say make it a setting

da3dsoul • Feb 01 '18 21:02

Edited the whole OP to reflect the discussion on the Discord server last night: moving away from using the API and their site, toward doing some of the work on the user servers and sending the results to the webcache to compare.

CuddleBear92 • Feb 02 '18 11:02

I also have plans to detect anime series by reading video files. However, I can't decide how many thumbnails should be taken from one video so that a search yields accurate results within a reasonable time. That said, I recently updated the database system, so it's much faster and less likely to overload now (note that the API limit still applies). Currently the whatanime.ga API returns the AniList ID and MAL ID in search results. But to get an AniDB ID, an AniDB <-> MAL ID mapping is needed.
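A sketch of using several thumbnails per file: each thumbnail is searched separately, the top AniList ID from each search is tallied, and the winner is translated to AniDB through a mapping table. Requiring at least two agreeing thumbnails is an arbitrary choice here, and the mapping dict is hypothetical; in practice it would have to come from some AniDB/AniList/MAL cross-reference list, which the thread does not name.

```python
from collections import Counter
from typing import Optional


def vote_anilist_id(per_thumbnail_hits: list[Optional[int]], min_votes: int = 2) -> Optional[int]:
    """Majority vote over the top AniList ID returned for each thumbnail
    (None means that thumbnail had no hit). Needs `min_votes` in agreement."""
    counts = Counter(h for h in per_thumbnail_hits if h is not None)
    if not counts:
        return None
    anilist_id, votes = counts.most_common(1)[0]
    return anilist_id if votes >= min_votes else None


def to_anidb(anilist_id: Optional[int], anilist_to_anidb: dict[int, int]) -> Optional[int]:
    """Translate via a hypothetical AniList -> AniDB mapping table."""
    if anilist_id is None:
        return None
    return anilist_to_anidb.get(anilist_id)
```

More thumbnails means more API calls per file but makes a single misleading frame less likely to decide the result, which is the trade-off described above.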

soruly • Apr 14 '18 10:04

I wouldn't trust the MAL ID, especially if we wanted to pull data, as the MAL API tends to be unreliable.

Cazzar • Apr 14 '18 10:04

You could use AniList with MAL IDs, though.
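A sketch of that: AniList's GraphQL API accepts a MAL ID through the `idMal` argument of `Media`, so a MAL ID from a search result can be resolved to an AniList ID without touching the MAL API. The helper names are illustrative; only `build_payload` is exercised offline, and `anilist_id_for_mal` makes a real network call with no retry or rate-limit handling.

```python
import json
import urllib.request

ANILIST_URL = "https://graphql.anilist.co"

QUERY = """
query ($idMal: Int) {
  Media(idMal: $idMal, type: ANIME) {
    id
    idMal
  }
}
"""


def build_payload(mal_id: int) -> bytes:
    """JSON body for AniList's GraphQL endpoint: look up a series by MAL ID."""
    return json.dumps({"query": QUERY, "variables": {"idMal": mal_id}}).encode("utf-8")


def anilist_id_for_mal(mal_id: int) -> int:
    """Network call: return the AniList ID for a MAL ID.
    Raises if AniList returns no Media for that ID."""
    req = urllib.request.Request(
        ANILIST_URL,
        data=build_payload(mal_id),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        body = json.load(resp)
    return body["data"]["Media"]["id"]
```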

da3dsoul • Apr 14 '18 10:04

Hey @soruly, WhatAnime has helped me find a few series so thanks for creating it. :)

We're currently in the middle of making some major changes to Shoko Server, but I like the idea of using WhatAnime to help with our unrecognized process. Looking forward to that update to see what we can do with it. :)

ElementalCrisis • Apr 14 '18 17:04

One thing I'd like is to use whatanime.ga to get the time at which a screenshot occurs. TvDB's thumbnails are garbage quality, but we can't just grab frames at random, or we'll get trash and/or spoiler shots.

If it's possible to give the server an episode and an image and have it return a time, then we could use that to cross-reference TvDB for high-quality thumbnails. I say episode and image to reduce the load on the server: we could easily thrash your site with thousands of our users calling it at the same time when our TvDB updates run.
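Giving both the episode and the image turns the search into a nearest-neighbour lookup over a single episode's frames, which is far cheaper than searching everything. A minimal sketch, assuming the episode has been indexed as (timestamp in seconds, 64-bit perceptual hash) pairs compared by Hamming distance; `locate_frame` and the 8-bit cutoff are hypothetical.

```python
from typing import Optional


def locate_frame(query_hash: int, episode_index: list[tuple[float, int]],
                 max_distance: int = 8) -> Optional[float]:
    """Return the timestamp (s) of the closest indexed frame in one episode,
    or None if no frame is within `max_distance` differing bits."""
    best_time, best_dist = None, max_distance + 1
    for timestamp, frame_hash in episode_index:
        dist = bin(query_hash ^ frame_hash).count("1")
        if dist < best_dist:
            best_time, best_dist = timestamp, dist
    return best_time
```

The linear scan is fine for one episode's worth of frames; searching a whole library would need a proper index.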

da3dsoul • Apr 14 '18 17:04

In the long run I'd prefer to make whatanime.ga distributed, so anyone with the skills could set up an image database of their own.

soruly • Apr 15 '18 13:04

That'd be cool, but doesn't that whole system take a lot of processing power? I'd think it would. If we cut out just the hashing and matching parts, and had a web cache with just storage, then we could put most of the strain on distributed clients with enough power. If you put together such a thing, we would gladly contribute. Some of our users can cover just about every anime ever, and our system requires a decent CPU, or hashing takes forever. I only have a basic idea of how it works, so I'm only guessing.

da3dsoul • Apr 15 '18 14:04

With my new improvements to searching, I think a decent quad-core CPU would be able to handle any search in 5 seconds. And for hashing video, a 24-minute video takes ~30 seconds to hash on a 4 GHz quad-core machine.
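Worked out, those figures mean hashing runs at roughly 48 times real time. For a 12-episode season of 24-minute episodes (an illustrative size, not a figure from the thread), the hashing cost would be about six minutes:

```latex
\frac{24 \times 60\,\text{s}}{30\,\text{s}} = 48,
\qquad
12 \times 30\,\text{s} = 360\,\text{s} = 6\,\text{min}
```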

My plan is to open up and publish my hashes (maybe like this: https://data.whatanime.ga/100240/), so users just need to download these pre-hashed files and import them into their own database for local search.
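The published dumps would use the indexer's own format, which the thread doesn't describe. Purely to illustrate the import step, the sketch below assumes a hypothetical tab-separated line format `anilist_id<TAB>episode<TAB>seconds<TAB>hex_hash` and builds a local per-episode index from it.

```python
from collections import defaultdict
from typing import Iterable


def import_hash_dump(lines: Iterable[str]) -> dict:
    """Build {(anilist_id, episode): [(seconds, hash), ...]} from lines in a
    hypothetical 'id<TAB>ep<TAB>seconds<TAB>hexhash' format. Blank lines and
    lines starting with '#' are skipped; frames are sorted by time."""
    index = defaultdict(list)
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        anilist_id, episode, seconds, hex_hash = line.split("\t")
        index[(int(anilist_id), episode)].append((float(seconds), int(hex_hash, 16)))
    for frames in index.values():
        frames.sort()
    return dict(index)
```

Episodes are kept as strings so that labels such as specials don't need to be numeric.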

soruly • Apr 15 '18 14:04

Good idea! :) "Distributed computing" like SETI

misakitchi • Apr 16 '18 10:04

I've open-sourced the distributed indexing system; you can take a look: https://github.com/soruly/sola

soruly • May 18 '18 05:05

If there's a way we could port the video indexing to .NET, it could be something to have in Shoko, probably as an opt-in feature.

Cazzar • May 20 '18 14:05

whatanime.ga has moved to https://trace.moe (see https://www.patreon.com/posts/moving-to-new-22212117)

soruly • Oct 25 '18 03:10