PlexAniSync

AniDB vs Anilist - add support for Movies and wo/o naming differences

Open karpik123 opened this issue 2 years ago • 1 comments

I went through my library and synced everything. I use x-jat names from AniDB and I noticed two naming patterns that should be straightforward to cover, saving a lot of work on custom mappings.

First Pattern - 'Movie'

| AniDB Name | Anilist Name |
| --- | --- |
| Gekijouban Blood-C: The Last Dark | BLOOD-C: The Last Dark |
| Gekijouban Mahouka Koukou no Rettousei: Hoshi o Yobu Shoujo | Mahouka Koukou no Rettousei: Hoshi wo Yobu Shoujo |
| Gekijouban xxxHOLiC: Manatsu no Yo no Yume | xxxHOLiC: Manatsu no Yoru no Yume |
| Gekijouban Dungeon ni Deai o Motomeru no wa Machigatte Iru Darouka: Orion no Ya | Dungeon ni Deai o Motomeru no wa Machigatte Iru Darouka: Orion no Ya |

PlexAniSync could recognise this word and make an extra attempt to match the title after removing Gekijouban<space> from the string.

Another similar example is 'Eiga':

| AniDB Name | Anilist Name |
| --- | --- |
| Eiga Crayon Shin-chan: Mononoke Ninja Chinfuuden | Crayon Shin-chan: Mononoke Ninja Chinfuuden |
| Eiga Doraemon: Nobita no Little Star Wars 2021 | Doraemon: Nobita no Little Star Wars 2021 |
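The prefix removal described above could be sketched roughly as follows. This is only an illustration: `MOVIE_PREFIXES` and `strip_movie_prefix` are hypothetical names, not part of PlexAniSync.

```python
# Hypothetical helper for illustration; not PlexAniSync's actual code.
MOVIE_PREFIXES = ("Gekijouban ", "Eiga ")


def strip_movie_prefix(title: str) -> str:
    """Return the title without a leading movie marker, or unchanged."""
    for prefix in MOVIE_PREFIXES:
        # startswith() only looks at the beginning of the string.
        if title.startswith(prefix):
            return title[len(prefix):]
    return title


print(strip_movie_prefix("Gekijouban Blood-C: The Last Dark"))
print(strip_movie_prefix("Eiga Doraemon: Nobita no Little Star Wars 2021"))
```

The Blood-C row differs from the Anilist name in letter case after stripping, so this sketch assumes the later comparison is case-insensitive.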

Second Pattern - wo vs o

| AniDB Name | Anilist Name |
| --- | --- |
| Hige o Soru. Soshite Joshikousei o Hirou. | Hige wo Soru. Soshite Joshikousei wo Hirou. |
| Sono Bisque Doll wa Koi o Suru | Sono Bisque Doll wa Koi wo Suru |
| Seishun Buta Yarou wa Yumemiru Shoujo no Yume o Minai | Seishun Buta Yarou wa Yumemiru Shoujo no Yume wo Minai |
| Nakitai Watashi wa Neko o Kaburu | Nakitai Watashi wa Neko wo Kaburu |
| Fune o Amu | Fune wo Amu |

AniDB almost universally writes it as o, while Anilist uses wo in titles. I don't know Japanese well enough to understand why... PlexAniSync could catch <space>o<space> in the string and make an extra attempt to match the title after converting o into wo. Note that the top example from the table even has two o's. While some titles might genuinely use o in the title, I don't expect them to match a completely different title even if PlexAniSync converts an innocent o into wo.
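A minimal sketch of the o to wo conversion, assuming a regular expression is acceptable here. `LONE_O` and `o_to_wo` are hypothetical names, not existing PlexAniSync functions.

```python
import re

# A lone lowercase "o" with a space on each side. Lookarounds are used so the
# spaces are not consumed and several occurrences in one title are all caught.
LONE_O = re.compile(r"(?<= )o(?= )")


def o_to_wo(title: str) -> str:
    """Return the title with every standalone ' o ' rewritten as ' wo '."""
    return LONE_O.sub("wo", title)


print(o_to_wo("Hige o Soru. Soshite Joshikousei o Hirou."))
print(o_to_wo("Fune o Amu"))
```

The converted string would only be used for an extra match attempt when the original title finds nothing, so a title that genuinely contains o is still tried unchanged first.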

karpik123, Mar 30 '22 22:03

I got my hands on the AniDB title .xml.gz file and did some top-level counting. I discarded all lines from the xml except those with lang="x-jat" and type="main".

I was left with 593 titles:

  • 211 titles with 'Gekijouban '
  • 261 titles with ' o ' (266 instances, since a few titles contain more than one o)
  • 142 titles with 'Eiga '

The numbers don't add up to 593 (211 + 261 + 142 = 614) because some titles combine Eiga or Gekijouban with o.
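For anyone who wants to repeat this kind of count, here is a rough sketch. It assumes the x-jat main titles have already been extracted into a plain list of strings; `pattern_counts` is a hypothetical helper and the sample list is just a few titles from this issue, not the real data set.

```python
import re

LONE_O = re.compile(r"(?<= )o(?= )")  # standalone lowercase "o"


def pattern_counts(titles):
    """Count titles per pattern, ' o ' instances, and titles with any pattern."""
    counts = {"gekijouban": 0, "eiga": 0, "o_titles": 0,
              "o_instances": 0, "any_pattern": 0}
    for title in titles:
        has_gekijouban = "Gekijouban " in title
        has_eiga = "Eiga " in title
        o_hits = len(LONE_O.findall(title))
        counts["gekijouban"] += has_gekijouban
        counts["eiga"] += has_eiga
        counts["o_titles"] += o_hits > 0
        counts["o_instances"] += o_hits
        counts["any_pattern"] += has_gekijouban or has_eiga or o_hits > 0
    return counts


sample = [
    "Gekijouban Mahouka Koukou no Rettousei: Hoshi o Yobu Shoujo",
    "Hige o Soru. Soshite Joshikousei o Hirou.",
    "Eiga Doraemon: Nobita no Little Star Wars 2021",
    "Fune o Amu",
]
print(pattern_counts(sample))
```

In the sample the per-pattern counts sum to 5 while only 4 titles are involved, which is the same overlap effect as in the full numbers.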

I did this to do more data checks and to confirm the logic won't be harmful. I spotted some odd cases, please read on.

The o->wo rule

The overwhelming majority of examples would match perfectly if o became wo.

Some oddities:

  • Hit o Nerae!, anidb: 1532 is https://anilist.co/anime/964/Smash-Hit/, which has 'Hit o Nerae' as a synonym on anilist (not sure if PlexAniSync checks synonyms?), so this is a case where wo-ing won't help to solve the name problem.
  • The Big O (2003), anidb: 8941 is not, in fact, 'The Big Wo! (2003)' (https://anilist.co/anime/567/The-Big-O/), but nothing bad will happen.
  • (NSFW warning) Ore ga Kanojo o *su Wake, anidb: 13763 is https://anilist.co/anime/101015/Ore-ga-Kanojo-wo-Okasu-Wake/. The rule is valid, since it is 'wo' on anilist, but it won't help because AniDB censored the word okasu. That's not a problem with the logic, though...

Gekijouban rule

Some medium disappointment here, I have to go back on my initial assumption.

Here are examples where the gekijouban-less title will match the tv show of the same name:

  • Gekijouban Violet Evergarden, anidb: 14013 is https://anilist.co/anime/103047/Violet-Evergarden-Movie/. Since they weren't too inventive with the name, the stripped title will still not match the movie.
  • Gekijouban Wakaokami wa Shougakusei!, anidb: 14011 is https://anilist.co/anime/101478/Wakaokami-wa-Shougakusei-Movie/
  • Gekijouban Shirobako, anidb: 14061 is https://anilist.co/anime/101574/SHIROBAKO-Movie/
  • Gekijouban Argonavis from Bang Dream!, anidb: 15977 is https://anilist.co/anime/128344/ARGONAVIS-from-BanG-Dream-Movie/

Funny outlier: Gekijouban Idol Bu Show, anidb: 17230 is https://anilist.co/anime/145916/IDOL-bu-SHOW-Movie/ but there's no tv show covering the name.

Eiga rule

Not as many as in the Gekijouban case, but I can find similar issues.

Here are examples where the eiga-less title will match the tv show of the same name:

  • Eiga Zannen na Ikimono Jiten, anidb: 16252 is https://anilist.co/anime/132804/Zannen-na-Ikimono-Jiten-The-Movie/
  • Eiga Delicious Party Precure, anidb: 17176 is https://anilist.co/anime/144687/Delicious-PartyPrecure-Movie/
  • Eiga no Osomatsu-san, anidb: 14293 is https://anilist.co/anime/104213/Osomatsusan-The-Movie/

Other oddities: Komadori Eiga Komaneko, anidb: 7306 proves that Eiga must only be matched at the beginning of the string.


Summary

Wo-ing the titles seems safe and desirable.

While all previous examples from my own library would match the correct anilist title (after de-gekijoubaning or de-eigaing), there seem to be too many cases where stripping alone will cause problems.

Instead, I think it's safer to attempt the following treatment:

  • de-gekijouban or de-eiga the title
  • add Movie and (Movie)
  • try to match
  • give up if nothing found
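The treatment above could look roughly like this. It is a sketch under assumptions: "add Movie and (Movie)" is read as appending a suffix, matching is done case-insensitively against a plain list of candidate Anilist titles, and all names (`MOVIE_PREFIX`, `movie_candidates`, `match_movie`) are hypothetical rather than PlexAniSync's own.

```python
import re

# Anchored with ^ so that "Komadori Eiga Komaneko" is left alone.
MOVIE_PREFIX = re.compile(r"^(?:Gekijouban|Eiga) ")
MOVIE_SUFFIXES = (" Movie", " (Movie)")  # assumed to be appended, not prepended


def movie_candidates(title):
    """Strip a leading movie marker and return the suffixed variants to try."""
    stripped, replaced = MOVIE_PREFIX.subn("", title)
    if replaced == 0:
        return []  # no marker at the start: nothing extra to try
    return [stripped + suffix for suffix in MOVIE_SUFFIXES]


def match_movie(title, known_titles):
    """Return the first known title equal to a candidate, or None (give up)."""
    lookup = {known.casefold(): known for known in known_titles}
    for candidate in movie_candidates(title):
        hit = lookup.get(candidate.casefold())
        if hit is not None:
            return hit
    return None


# Stand-in list for illustration only; real titles would come from Anilist.
known = ["Violet Evergarden", "Violet Evergarden Movie"]
print(match_movie("Gekijouban Violet Evergarden", known))
print(match_movie("Komadori Eiga Komaneko", known))
```

Because the bare stripped title is never tried, the tv show of the same name cannot be picked by mistake; titles whose Anilist name has no Movie suffix would still need a custom mapping.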

I attach the file with the cleaned titles I used for the above research: https://gist.github.com/karpik123/760774de1a0a90156567d794a704e71a

karpik123, Mar 31 '22 18:03