
Release group detection fails for [SSA] anime releases

Open reconman opened this issue 4 years ago • 8 comments

[SSA] releases are detected as having both mkv and ssa containers, even though the release group uses the usual anime naming scheme.

Here's the console output:

For: [SSA] Uma Musume - Pretty Derby S2 - 05 [1080p].mkv
GuessIt found: {
    "container": [
        "ssa",
        "mkv"
    ],
    "alternative_title": "Pretty Derby",
    "season": 2,
    "episode_title": "05",
    "mimetype": "video/x-matroska",
    "type": "episode"
}

[SSA] Uma Musume - Pretty Derby S2 - 05 [1080p].mkv is the original file name and no release group can be detected.

reconman avatar Feb 01 '21 22:02 reconman

Seems like these 4 lines are responsible for it: https://github.com/guessit-io/guessit/blob/develop/guessit/rules/properties/container.py#L56-L59

After I removed them, the release group was detected fine.

reconman avatar Feb 01 '21 23:02 reconman

I'm also having trouble with Release Group detection:

Actual vs expected behaviour

[EMBER].Dr..Stone.S02E03.[1080p].[HEVC.WEBRip].(Dr..Stone:.Stone.Wars) has its group detected as Dr. Stone: Stone Wars , expected result is EMBER.

[Judas].Dr.Stone.-.S02E03.[1080p][HEVC.x265.10bit][Multi-Subs].(Weekly) has its group detected as Weekly, expected result is Judas.

[SSA].Dr..Stone.S2.-.03.[720p].mkv and [Naruto-Kun.Hu].Dr..Stone.S2.-.03.[1080p].mp4 both have no group detected. Expected results are SSA and Naruto-Kun.Hu respectively.

Tatoeba.Last.Dungeon.Mae.no.Mura.no.Shounen.ga.Joban.no.Machi.de.Kurasu.Youna.Monogatari..04.(1080p).[AF1FB9C1] - Currently no release group is detected. I don't think there's a reasonable way of handling this.

DanMachi.S3.-.12.(AbemaTV.1080p).mkv - Currently no release group is detected. I don't see any other RGs using this convention, so I think we can ignore it.

[豌豆字幕组&LoliHouse].进击的巨人./.Shingeki.no.Kyojin.-.67.[WebRip.1080p.HEVC-10bit.AAC][简繁内封字幕] - Currently incorrectly detected as 豌豆字幕组, should be 豌豆字幕组&LoliHouse. I think we should try to include everything until we hit a matching closing bracket.

[Sick-Fansubs].Shingeki.no.Kyojin.67.[720p][B1C62720].mp4 - Currently incorrectly detected as Sick, should be Sick-Fansubs. Probably the same issue as previous.

[Sick.Fansubs].Shingeki.no.Kyojin.67.[720p][B1C62720] - Currently detected as Sick, should be Sick Fansubs. Not sure why this doesn't work, as [Erai.raws] does get correctly detected as Erai raws.

Suggestion

The majority of release groups at the moment put the group name in square brackets at the beginning of the release name. I believe that if something is within square brackets at the start of the release name, this should be used as the group name.
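To make the suggestion concrete, here is a minimal sketch of the "leading brackets win" rule. It is not guessit code: `leading_bracket_group` is a hypothetical helper that only uses the standard `re` module, and it assumes the group name contains no nested brackets. It takes everything up to the first closing bracket, so names with `&` or `-` in them stay whole.

```python
import re

# Hypothetical pattern (not from guessit): a [...] pair at the very start
# of the name, with no nested brackets inside it.
LEADING_GROUP = re.compile(r"^\[([^\[\]]+)\]")


def leading_bracket_group(release_name):
    """Return the text inside a leading [...] pair, or None if there is none."""
    match = LEADING_GROUP.match(release_name.strip())
    if match is None:
        return None
    # The text is returned as-is; dots inside the brackets are not touched.
    return match.group(1).strip()


print(leading_bracket_group("[Sick-Fansubs].Shingeki.no.Kyojin.67.[720p][B1C62720].mp4"))
print(leading_bracket_group("DanMachi.S3.-.12.(AbemaTV.1080p).mkv"))
```

Dots inside the brackets are left alone on purpose, because the examples above disagree on them (Naruto-Kun.Hu keeps its dot, Sick.Fansubs should become Sick Fansubs), so that choice is left to the caller.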

Also, linking related discussions to help people find this one:

  • pymedusa/Medusa#9151
  • pymedusa/Medusa#9155

Misofist avatar Feb 02 '21 10:02 Misofist

@lcdt22890158 I think the . characters in the file name are breaking the parser in many of your examples.

reconman avatar Feb 02 '21 10:02 reconman

I believe Medusa puts dot characters in all file names. I think it also turns them into spaces before it sends them to guessit.

Misofist avatar Feb 02 '21 10:02 Misofist

As for SSA being guessed as the container, the problem is that ssa is actually a file format for subtitles: https://en.wikipedia.org/wiki/SubStation_Alpha

So you can remove it from the options.json list if you really want to get this release group working. (See https://doc.guessit.io/configuration/ to customize this configuration in your application).

There's still an issue with the episode being guessed as episode_title, though. It shows up when replacing SSA with abc.

For: [abc] Uma Musume - Pretty Derby S2 - 05 [1080p].mkv
GuessIt found: {
    "release_group": "abc",
    "title": "Uma Musume",
    "alternative_title": "Pretty Derby",
    "season": 2,
    "episode_title": "05",
    "screen_size": "1080p",
    "container": "mkv",
    "mimetype": "video/x-matroska",
    "type": "episode"
}

Toilal avatar Feb 03 '21 22:02 Toilal

Maybe we should not guess SSA as a container when there's already another container found (mkv here), especially if it's provided by file extension.
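As a rough illustration of that idea, the sketch below post-processes a list of container guesses and keeps only the one that matches the file extension. It is not how guessit implements its rules: `prefer_extension_container` is a hypothetical helper built on `os.path.splitext`, and it assumes the guesses arrive as a list of lowercase strings.

```python
import os.path


def prefer_extension_container(filename, containers):
    """Keep only the container given by the file extension, if it was guessed.

    `containers` is assumed to be a list of lowercase container names.
    When the extension is missing or was not guessed, the list is returned
    unchanged.
    """
    ext = os.path.splitext(filename)[1].lstrip(".").lower()
    if ext and ext in containers:
        return [ext]
    return list(containers)


name = "[SSA] Uma Musume - Pretty Derby S2 - 05 [1080p].mkv"
print(prefer_extension_container(name, ["ssa", "mkv"]))
```

With the ssa guess dropped this way, the bracketed SSA token would at least no longer be claimed by the container property. Whether it is then picked up as the release group still depends on the other rules.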

Toilal avatar Feb 03 '21 22:02 Toilal

Like I said, I'm wondering why the code first checks if the file ends with a known extension and then additionally checks if the file name contains one of the known container names.

So a file named mkv mp4 ass ssa 01.avi is detected as

GuessIt found: {
    "container": [
        "mkv",
        "mp4",
        "ssa",
        "avi"
    ],
    "title": "ass",
    "episode": 1,
    "mimetype": "video/avi",
    "type": "episode"
}

Without those string matches, it would just detect the avi part as container.

reconman avatar Feb 03 '21 22:02 reconman

So, can the additional checks be thrown out?

reconman avatar Apr 27 '21 15:04 reconman