
Misc recognition issues

Open FichteFoll opened this issue 5 years ago • 19 comments

[D] Engine: Adding to library: /data/Video/Anime/data_tmp_Anime/[PAS] Houseki no Kuni - 05 [WEB 720p E-AC-3] [F671AE53].mkv
[D] Engine: Redirected to Houseki no Kuni 3

[D] Engine: Adding to library: /data/Video/Anime/The Tatami Galaxy/[Opportunity] The Tatami Galaxy - NCED [BD 720p] [903A3AD6].mkv
[D] Engine: Redirected to Yojouhan Shinwa Taikei 1

[D] Engine: Adding to library: /data/Video/Anime/The Tatami Galaxy/[Opportunity] The Tatami Galaxy 10 - The 4.5-Tatami Idealogue [BD 720p] [FF757616].mkv
[D] Engine: Redirected to Yojouhan Shinwa Taikei 4

[D] Engine: Not a show, skipping: /data/Video/Anime/The Tatami Galaxy/[Opportunity] The Tatami Galaxy 11 - The End of the 4.5-Tatami Age [BD 720p] [6E03B068].mkv

These are the most obvious ones in my current library. The others stem from the appropriate show either not being on my list or being on it with the completed status, which results in specials being recognized as episodes of the main TV show or vice versa. Things containing "NCED" or "NCOP" should probably not be recognized at all, if possible.

FichteFoll, May 02 '19 10:05

Enabled a whole scan now (with completed and dropped), but got mixed results. GuP OVAs are all recognized as the single-episode Anzio OVA instead of "Girls und Panzer Specials" (as they are called on AniList, duh). And the file that actually contains the Anzio OVA is then recognized as the movie.

[D] Engine: Adding to library: /data/Video/Anime/[-__-'] Girls und Panzer [BD 1080p]/[-__-'] Girls und Panzer OVA 6 [BD 1080p FLAC] [B13C83A0].mkv
[D] Engine: Redirected to Girls und Panzer: Kore ga Hontou no Anzio-sen Desu! 6

[D] Engine: Adding to library: /data/Video/Anime/[-__-'] Girls und Panzer [BD 1080p]/[-__-'] Girls und Panzer OVA Anzio-sen [BD 1080p FLAC] [231FDA45].mkv
[D] Engine: Redirected to Girls und Panzer der Film 1

[D] Engine: Adding to library: /data/Video/Anime/[-__-'] Girls und Panzer [BD 1080p]/Extras/[-__-'] Girls und Panzer NCED 04 [BD 1080p FLAC] [88F40C98].mkv
[D] Engine: Redirected to Girls und Panzer 4

FichteFoll, May 02 '19 12:05

More cases:

[D] Engine: Not a show, skipping: /data/Video/Anime/[Underwater-FFF] Saki Zenkoku-hen - The Nationals [BD][1080p-FLAC]/[Underwater-FFF] Saki Zenkoku-hen - The Nationals - 01 [BD][1080p-FLAC][81722FD7].mkv

[D] Engine: Not a show, skipping: /data/Video/Anime/The Tatami Galaxy/[Opportunity] The Tatami Galaxy 11 - The End of the 4.5-Tatami Age [BD 720p] [6E03B068].mkv

[D] Engine: Adding to library: /data/Video/Anime/[-__-'] Girls und Panzer [BD 1080p]/Extras/[-__-'] Girls und Panzer NCED OVA Anzio-sen [BD 1080p FLAC] [ED2078EC].mkv
[D] Engine: Redirected to Girls und Panzer der Film 1

[D] Engine: Adding to library: /data/Video/Anime/[-__-'] Girls und Panzer [BD 1080p]/Extras/[-__-'] Girls und Panzer NCOP [BD 1080p FLAC] [04ED15D5].mkv
[D] Engine: Redirected to Girls und Panzer: Kore ga Hontou no Anzio-sen Desu! 1

Just to clarify, I made sure that I had the proper shows on my list when doing the scan (and they were included, since I did a whole scan).

FichteFoll, May 10 '19 15:05

I wrote a little script that takes an Excel file full of episode filenames and basically runs the _processFilename function from AnimeInfoExtractor.py, writing the values of each class property out to an Excel file after every function call. This essentially lets you see how the filename is being processed (it creates a new sheet for each filename).

~~I'm not 100% on it working exactly as it is supposed to, as a lot of properties seem to be immediately populated on object init (I have no clue why), but it should give some insight.~~ So, I was just being an idiot and missing the obvious - the AnimeInfoExtractor class actually runs the _processFilename function as part of its init, so my script was inadvertently running it twice, and only recording the results from the second time; now that I've commented it out of the init, everything works as expected.

To run it, you just need to change the paths in the script to wherever you put the Excel files. It also needs the openpyxl package (to work with Excel), so you might need to install it with pip if you don't have it already.
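The double-run problem described above comes from the class doing its parsing inside `__init__`. A minimal sketch of the tracing idea that avoids editing the class: create the instance with `__new__` so the constructor never runs, then call each step once and snapshot `vars()` after it. `ToyExtractor`, its step names and `trace` are hypothetical stand-ins, not Trackma's `AnimeInfoExtractor`; the script in this thread wrote each snapshot to an Excel sheet with openpyxl, while this sketch keeps them as plain dicts to stay stdlib-only.

```python
import copy
import os
import re


class ToyExtractor:
    """Hypothetical extractor that, like the one described, parses in __init__."""

    STEPS = ("_strip_extension", "_strip_brackets", "_extract_episode")

    def __init__(self, filename):
        self.originalFilename = filename
        self._process_filename()  # a tracer calling this again would run it twice

    def _process_filename(self):
        for step in self.STEPS:
            getattr(self, step)()

    def _strip_extension(self):
        self.name, ext = os.path.splitext(self.originalFilename)
        self.extension = ext.lstrip(".")

    def _strip_brackets(self):
        # Drop every [...] group and collapse the leftover whitespace.
        self.name = " ".join(re.sub(r"\[[^\]]*\]", " ", self.name).split())

    def _extract_episode(self):
        match = re.search(r"\s-\s(\d+)$", self.name)
        self.episodeStart = int(match.group(1)) if match else None
        if match:
            self.name = self.name[:match.start()]


def trace(cls, filename):
    """Run each step exactly once, snapshotting the instance after every step."""
    obj = cls.__new__(cls)  # skip __init__ instead of commenting it out
    obj.originalFilename = filename
    snapshots = [("start", copy.deepcopy(vars(obj)))]
    for step in cls.STEPS:
        getattr(obj, step)()
        snapshots.append((step, copy.deepcopy(vars(obj))))
    return snapshots
```

Each `(step, state)` pair maps naturally onto one row (or one sheet) in a spreadsheet export.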

You can add/remove whatever file names you want to test in files.xlsx. It is pre-populated with some of the episodes you mentioned in this thread, some that others had mentioned in other threads on recognition, and a couple of my own (I use AniDB naming, and no series with a year in its name gets correctly identified). The output.xlsx file already has the data from the episodes listed in files.xlsx, namely:

  • [HorribleSubs] Nobunaga-sensei no Osanazuma - 04 [720p].mkv
  • [PAS] Houseki no Kuni - 05 [WEB 720p E-AC-3] [F671AE53].mkv
  • [Opportunity] The Tatami Galaxy - NCED [BD 720p] [903A3AD6].mkv
  • [Opportunity] The Tatami Galaxy 10 - The 4.5-Tatami Idealogue [BD 720p] [FF757616].mkv
  • [Opportunity] The Tatami Galaxy 11 - The End of the 4.5-Tatami Age [BD 720p] [6E03B068].mkv
  • [VCB-Studio+Commie] Sword Art Online II [01].mkv
  • Chio-chan no Tsuugakuro - 04 [HorribleSubs] [www, 720p, AAC] [5D4D1205].mkv
  • Kill.la.Kill.S01E01.1080p-Hi10p.BluRay.FLAC2.0.x264-CTR.[98AA9B1C].mkv
  • Fairy Tail (2018) - 03 [HorribleSubs] [www, 720p, AAC] [9A8FC164].mkv
  • Bungou Stray Dogs (2019) - 06 [HorribleSubs] [www, 720p, AAC] [496D45BB].mkv
  • One Punch Man (2019) - 07 [HorribleSubs] [www, 720p, AAC] [6D4B8217].mkv

Feel free to modify/edit/rewrite the script however you please - this was the first Python file I've completely written myself, as I usually just troubleshoot or edit, so there is likely lots of room for improvement. Hopefully it will end up being helpful in some way.

TrackMa Testing.zip

purposelycryptic, May 29 '19 02:05

So, at least for the Tatami Galaxy filenames, the recognition issue seems to be that it takes the first number after a '-' as the episode number; if there is no '-', it takes one of the other numbers that are not part of the CRC (if present; I'm not sure exactly what the criteria are), and if no numbers are present, it defaults to 1?

Something like that, at least.
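To make the guess above concrete, here is a minimal sketch that reconstructs the described heuristic (this is an assumption about the behaviour, not Trackma's actual code) next to a dash-aware alternative that prefers "Title 10 - Episode title" and "Title - 05". The function names `clean`, `naive_episode` and `dash_aware_episode` are hypothetical.

```python
import os
import re


def clean(filename):
    """Drop the extension and every [...] group, then collapse whitespace."""
    stem = os.path.splitext(filename)[0]
    return " ".join(re.sub(r"\[[^\]]*\]", " ", stem).split())


def naive_episode(filename):
    """Guessed behaviour: first number after any '-', else default to 1."""
    match = re.search(r"-\D*?(\d+)", clean(filename))
    return int(match.group(1)) if match else 1


def dash_aware_episode(filename):
    """Prefer 'Title 10 - Episode title', then 'Title - 05'; else None."""
    name = clean(filename)
    match = (re.search(r"\s(\d{1,4})(?:v\d)?\s+-\s", name)
             or re.search(r"\s-\s(\d{1,4})(?:v\d)?(?=\s|$)", name))
    return int(match.group(1)) if match else None
```

For "[Opportunity] The Tatami Galaxy 10 - The 4.5-Tatami Idealogue [BD 720p] [FF757616].mkv", `naive_episode` returns 4 (taken from "4.5") and `dash_aware_episode` returns 10; for the NCED file they return 1 and None. That is consistent with the "Yojouhan Shinwa Taikei 4" and "... 1" redirects in the first post, though it does not prove that this is the actual cause.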

So, putting a '-' between the series name and episode number as a general pattern should likely take care of that issue if you usually include the episode title. There is currently no filtering at all for OP/ED/NCOP/NCED/SP/O or any other common episode-number prefixes/alternatives for non-regular episodes. If anyone is interested in adding recognition for those, HAMA has some pretty incredible RegEx filtering as part of its identification system.

As for my issue, anything in brackets gets auto-cut from the series name for recognition purposes, so any series with a year at the end to differentiate it from its prequel (e.g., One Punch Man (2019), Bungou Stray Dogs (2019), Fairy Tail (2018)) simply gets recognized as the prequel, or sometimes the reverse, for reasons I can't yet explain. The only time the info in brackets seems to be used for recognition is when, without it, the series name would end up completely blank.
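A minimal sketch of stripping bracketed tags while keeping a disambiguating year, assuming a year is four digits starting with 19 or 20 inside round brackets. `strip_tags_keep_year`, `YEAR_RE` and `GROUP_RE` are hypothetical names, not part of Trackma.

```python
import re

# A year in round brackets, e.g. "(2019)"; assumed to be part of the title.
YEAR_RE = re.compile(r"\((?:19|20)\d{2}\)")
# Any [...], (...) or {...} group.
GROUP_RE = re.compile(r"\[[^\]]*\]|\([^)]*\)|\{[^}]*\}")


def strip_tags_keep_year(title):
    """Remove bracketed tags but keep a "(YYYY)" that disambiguates the title."""
    def repl(match):
        group = match.group(0)
        return group if YEAR_RE.fullmatch(group) else " "
    return " ".join(GROUP_RE.sub(repl, title).split())
```

For example, "Fairy Tail (2018) - 03 [HorribleSubs] [www, 720p, AAC] [9A8FC164]" becomes "Fairy Tail (2018) - 03", so the year stays available when matching against list titles that include it, while other parenthesized tags are still removed.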

purposelycryptic, May 29 '19 02:05

OK, I think I've solved your core issues, with the exception of filtering the NCEDs/NCOPs/OVAs/etc. AnimeInfoExtractor.py isn't really set up to not identify a file: even when it finds no episode number, it just automatically defaults to '1'.

One very simple solution would be to run a RegEx search for those strings and, if it finds a match, set the name and filename attributes of the object to '' or some other string that would result in a negative ID for the file. That would be trivial to implement, at least.
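A minimal sketch of that RegEx check, assuming creditless openings/endings are tagged with uppercase tokens such as NCOP, NCED, OP1 or ED2 (matched case-sensitively so words like "Ed" or "Opportunity" don't trigger it). `is_extra` and the pattern are hypothetical. A caller would then blank the name when it returns True, as proposed above.

```python
import re

BRACKETS_RE = re.compile(r"\[[^\]]*\]")
# Case-sensitive on purpose: OP/ED/PV/CM tokens are usually uppercase.
EXTRA_RE = re.compile(r"\b(?:NC)?(?:OP|ED)\d*\b|\b(?:PV|CM)\d*\b|\b[Tt]railer\b")


def is_extra(filename):
    """True if the filename looks like a creditless OP/ED, PV, CM or trailer."""
    # Strip [...] groups first so a CRC32 such as [ED123456] cannot match.
    return EXTRA_RE.search(BRACKETS_RE.sub(" ", filename)) is not None
```

OVAs are deliberately not filtered here, since they usually are real entries on the list.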

For now, the full output from the testing files is here in an Excel Sheet, and the final object attributes from AnimeInfoExtractor.py for your files using the updated versions follow:

Saki:

| Variable | Value |
| --- | --- |
| self.originalFilename | [Underwater-FFF] Saki Zenkoku-hen - The Nationals - 01 [BD][1080p-FLAC][81722FD7].mkv |
| self.resolution | 1080p |
| self.hash | 81722FD7 |
| self.subberTag | Underwater-FFF |
| self.videoType | |
| self.audioType | FLAC |
| self.releaseSource | BD |
| self.extension | mkv |
| self.episodeStart | 1 |
| self.episodeEnd | |
| self.volumeStart | |
| self.volumeEnd | |
| self.version | 1 |
| self.name | Saki Zenkoku-hen - The Nationals |
| self.pv | -1 |

Houseki no Kuni:

| Variable | Value |
| --- | --- |
| self.originalFilename | [PAS] Houseki no Kuni - 05 [WEB 720p E-AC-3] [F671AE53].mkv |
| self.resolution | 720p |
| self.hash | F671AE53 |
| self.subberTag | PAS |
| self.videoType | |
| self.audioType | |
| self.releaseSource | |
| self.extension | mkv |
| self.episodeStart | 5 |
| self.episodeEnd | |
| self.volumeStart | |
| self.volumeEnd | |
| self.version | 1 |
| self.name | Houseki no Kuni |
| self.pv | -1 |

The Tatami Galaxy:

| Variable | Value |
| --- | --- |
| self.originalFilename | [Opportunity] The Tatami Galaxy 10 - The 4.5-Tatami Idealogue [BD 720p] [FF757616].mkv |
| self.resolution | 720p |
| self.hash | FF757616 |
| self.subberTag | Opportunity |
| self.videoType | |
| self.audioType | |
| self.releaseSource | BD |
| self.extension | mkv |
| self.episodeStart | 10 |
| self.episodeEnd | |
| self.volumeStart | |
| self.volumeEnd | |
| self.version | 1 |
| self.name | The Tatami Galaxy |
| self.pv | -1 |

| Variable | Value |
| --- | --- |
| self.originalFilename | [Opportunity] The Tatami Galaxy 11 - The End of the 4.5-Tatami Age [BD 720p] [6E03B068].mkv |
| self.resolution | 720p |
| self.hash | 6E03B068 |
| self.subberTag | Opportunity |
| self.videoType | |
| self.audioType | |
| self.releaseSource | BD |
| self.extension | mkv |
| self.episodeStart | 11 |
| self.episodeEnd | |
| self.volumeStart | |
| self.volumeEnd | |
| self.version | 1 |
| self.name | The Tatami Galaxy |
| self.pv | -1 |

If all looks good, I'll open a pull request.

purposelycryptic, Jun 01 '19 23:06

@FichteFoll As far as I can tell, I have managed to clear up the identification issues that originated in AnimeInfoExtractor.py - if you would be willing to try it out and offer third-party insight as to whether everything works as it should with your library as well, you can grab the file from my branch; since it's completely modular, you just need to swap the file with your current version, and can swap it back at any time without issue if you notice any problem.

I also added my testing script to the repo, which lets you see exactly what information Trackma is getting from the AnimeInfoExtractor, along with usage instructions, in case you need to troubleshoot anything.

As I haven't gotten any response from you on this issue so far, if I don't hear from you, I'll just assume it is no longer a concern for you, although having someone else test it would be very helpful in discovering any issues I may have missed due to file naming patterns I did not consider.

purposelycryptic, Jun 24 '19 01:06

I just want to let you know that I saw your reply earlier but haven't found the time to look into and/or test your changes locally.

Other than that, I have to confess that I have, indeed, not suffered from this issue whatsoever in the past month, but that is likely because of a different issue in my workflow.

FichteFoll, Jun 24 '19 01:06

No problem - I am still quite ill, so my ability to work on anything is fairly sporadic anyway. In part, I just wanted to make sure there was still another person at the other end of this issue (although having someone else test my changes would definitely be helpful). In the end, I started on this issue because it was giving me problems too, as it was for several other users.

I just updated my testing/troubleshooting script for the AnimeInfoExtractor (mostly QoL changes), but I feel like I may have hit a wall: when testing files with the modified AnimeInfoExtractor, the results correctly identify the series and episode number, but Trackma still misidentifies the same files. (For example, the extractor identifies episodes of Fruits Basket (2019) correctly, but Trackma still identifies them as episodes of the original Fruits Basket.)

I feel like I'm missing some big part of the puzzle (although I really hope not), and will likely have to dig deep into the rest of the Trackma codebase to figure out what, but in my current condition, that could take quite a while.

purposelycryptic, Jun 27 '19 20:06

Just a wild guess, but trackma only matches against show names on your list, so you might want to check whether that works. Furthermore, the examples from this thread are perfect for building a test suite, so if that doesn't exist already, it would be a good idea to work on one.

Unfortunately, I have way too many things with higher priorities atm.

FichteFoll, Jun 28 '19 00:06

@FichteFoll - All titles are in my list - at 20TB, 2,500 series, my library is pretty inclusive. I have an exact record of the files in my library on AniDB (thanks to their incredible archive of individual release information, and the ability to add specific files to your library there through automated hashing), which I initially added to AniList by exporting it, importing it to MAL, then exporting that, which was then in a format that AniList could import.

I spent several days thereafter adding any series missing as a result of AniDB's series entries not corresponding one-to-one to MAL/AniList, and have manually added any new series to my list since (using AniChart at the beginning of each season for new series, and the main AniList site for the rest).

I feel a little sad at your suggestion to build a testing suite, given that most of my posts in this thread refer directly to the results from the testing suite I built, link to it, and mention new builds; unless I'm completely misunderstanding your meaning, that would mean you hadn't actually read anything I wrote.

The testing system, anime_extractor_test, uses an Excel spreadsheet as the input for the filenames to test (with no limit on their number).

It runs them through the '_processFilename' function of Trackma's AnimeInfoExtractor, recording the class attributes at every stage of the function and saving them to one sheet per file in an Excel spreadsheet, to allow for easy troubleshooting.

The final data output from the function for each file is also saved to a separate summary spreadsheet, to allow for an easy overview of the extracted data, and make it easier to identify file formatting that results in misidentification, so that it can then be looked up in the step-by-step output in the other spreadsheet, and the method that resulted in the misidentification can easily be identified.

It works quite well, and allowed me to locate and fix various parts of Trackma's AnimeInfoExtractor that were resulting in misidentification (as well as implement the filter for NCOP/NCED/Trailer files you requested).

The problem I ran into is that, despite the modified AnimeInfoExtractor now generating the correct information for every filename I've tested (which has been hundreds, including all the ones you mentioned in your posts, and many using formatting specifically designed to confuse it/trigger errors), when I include it in my Trackma build, the files are still misidentified.

I'm wondering if there is a secondary, separate part of Trackma that also extracts information from the filename, and if the AnimeInfoExtractor is only used in very specific intended instances (the fact that it is in the "extras" folder is now making me worry, hopefully it is not a relic from an older implementation kept around solely out of nostalgia or something).

@z411, I really hate to bother you, since I know how busy you must be keeping this whole thing running while addressing all the users' concerns (and hopefully, also leading a happy, fulfilling life), but if you could shed any insight on this, I can't begin to say how much I would appreciate it. _/\○_

I've been working on this on and off for quite a while now, but with everything working in testing yet no change when actually running Trackma, I feel somewhat lost as to where to go from here. The modified version of AnimeInfoExtractor.py can be found in my fork, here.

Again, I'm really sorry for bothering you with this, I just feel like I've run into a wall, and I do think the improved identification accuracy could really benefit Trackma. Thanks in advance for your time!

purposelycryptic, Jun 28 '19 16:06

I had taken a brief look at your changes, but decided that reviewing them in relation to the current code would be quite involved because of your unusual workflow and approach.

Regarding the tests, I meant writing actual unit tests that can be executed by a test runner and CI to ensure the known test cases will never break.
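A minimal sketch of such a test, using the standard `unittest` module with one `subTest` per filename taken from this thread. `parse_episode` and `EPISODE_PATTERNS` are a hypothetical stand-in parser so the example runs on its own; a real suite would import the project's extractor instead and could be run by `python -m unittest` in CI.

```python
import os
import re
import unittest

# Hypothetical stand-in parser; checked in order, SxxEyy first so the
# season number is never read as the episode.
EPISODE_PATTERNS = [
    re.compile(r"S\d{1,2}E(?P<ep>\d{1,4})", re.IGNORECASE),            # S02E06
    re.compile(r"(?:^|[\s._])(?:Ep?|Episode)\s?(?P<ep>\d{1,4})(?:v\d)?\b",
               re.IGNORECASE),                                          # Ep 2, E01v1
    re.compile(r"\s-\s(?P<ep>\d{1,4})(?:v\d)?\b"),                      # Title - 05
    re.compile(r"\s(?P<ep>\d{1,4})(?:v\d)?\s+-\s"),                     # Title 10 - Ep title
]


def parse_episode(filename):
    stem = re.sub(r"\[[^\]]*\]", " ", os.path.splitext(filename)[0])
    for pattern in EPISODE_PATTERNS:
        match = pattern.search(stem)
        if match:
            return int(match.group("ep"))
    return None


# Filenames reported in this thread, with the episode they should map to.
CASES = {
    "[PAS] Houseki no Kuni - 05 [WEB 720p E-AC-3] [F671AE53].mkv": 5,
    "[Opportunity] The Tatami Galaxy 10 - The 4.5-Tatami Idealogue [BD 720p] [FF757616].mkv": 10,
    "[Opportunity] The Tatami Galaxy 11 - The End of the 4.5-Tatami Age [BD 720p] [6E03B068].mkv": 11,
    "[Underwater-FFF] Saki Zenkoku-hen - The Nationals - 01 [BD][1080p-FLAC][81722FD7].mkv": 1,
    "Arifureta E01v1 [1080p+][AAC][JapDub][GerSub][Web-DL].mkv": 1,
    "ReZERO -Starting Life in Another World- S02E06 [1080p][E-AC3][JapDub][GerEngSub][Web-DL].mkv": 6,
    "Uzaki-chan wa Asobitai! Ep 2.mkv": 2,
}


class EpisodeRecognitionTest(unittest.TestCase):
    def test_known_filenames(self):
        for filename, expected in CASES.items():
            with self.subTest(filename=filename):
                self.assertEqual(parse_episode(filename), expected)
```

Keeping the cases in a plain dict means every newly reported filename becomes a one-line addition.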

FichteFoll, Jun 29 '19 01:06

@FichteFoll - suit yourself: GitHub does have the capability to compare forks, from which you should be able to tell fairly quickly that the changes involve:

  • One additional function, whose sole purpose is to catch OPs/EDs/PVs and Specials and modify the filename variable so that they won't be mistakenly recognized as regular episodes (since Trackma assigns episode number 1 to any file it can't find an episode number for), which consists of two simple regex tests and two assignment statements that change the filename and name strings in case either test returns true.
  • One modified regex test that has been changed so that a) naming formats using the sXXeYY numbering system no longer have the season recognized as the episode number, b) sequel series that simply have a number added to the original title (e.g., 'One Punch Man 2') no longer have that number recognized as the episode number, but instead as part of the series name, and c) series with the year in brackets in the title (e.g., 'Fruits Basket (2019)') as a distinguishing naming feature no longer have it ignored for recognition purposes.
  • One slightly modified function that ensures that the aforementioned year in brackets is restored after the function wipes out all parts of the filename string in any form of brackets for recognition purposes.
  • Naming the capture groups in a few of the other existing regex tests to make them easier to read and interpret, because figuring out their exact purpose was a significant pain, and named capture groups make regex much more legible. This is not yet complete, as it is primarily a cosmetic change to help make the code more readable should someone new need to read and understand it.
  • A decent number of comment lines explaining much of what I just wrote above.

I really only made fairly minimal changes; it shouldn't take more than a few minutes to look over (maybe a little longer if you're not a regex person). It's only 80 new or modified lines, many of which are either comments or only counted because blank space was moved around, with most of the rest simply adding names to capture groups, which is entirely cosmetic.

I don't really think my workflow or approach is that unusual - I had four objectives:

  1. Filtering out OPs/EDs/PVs and Specials that aren't represented in AniList in that manner, as you had requested.
  2. Fixing the issue of season numbers, numbers included in series names, and numbers included in episode titles being identified as the episode number, as requested by you and others.
  3. Allowing for a broader range of naming conventions to be properly interpreted.
  4. Allowing series that include a year in brackets as part of their name to have that year included as part of the series name for identification (This was my personal stake in the matter).

I made specific edits to achieve each objective, added comments for each, and added some additional comments when I felt the code was unclear. Since regex is heavily used, I decided to start naming capture groups to make it more legible.

And... that is pretty much it.

I spent a lot more time putting together the script that tests how filenames would be processed and outputs the results into Excel documents, mainly because I had never used Python to work with spreadsheets before. Most of the rest of the time was spent tuning the regex based on the series and episode number identification results from a test list of a few thousand filenames: real ones from various release groups, randomly generated ones following recommended naming standards from media center platforms like Emby, Plex and Kodi, formats mentioned in issues such as this one, series names and naming formats chosen specifically to break identification, and randomly generated combinations of parts of the various naming formats to simulate structures I couldn't predict. The series names used were a mix of specific examples and names randomly selected from a list of several thousand series names taken from AniList and AniDB.
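A minimal sketch of generating such randomized test filenames by combining naming templates with titles, groups and episode numbers. The template, group and title lists are small illustrative assumptions, not the test bank described above, and `generate_cases` is a hypothetical name. Each case carries the expected title and episode, so a parser's output can be compared against it, and seeding the RNG makes failures reproducible.

```python
import random

# Illustrative building blocks only; a real test bank would be far larger.
GROUPS = ["HorribleSubs", "PAS", "Opportunity", "VCB-Studio"]
TITLES = ["Houseki no Kuni", "The Tatami Galaxy", "Fruits Basket (2019)", "One Punch Man 2"]
RESOLUTIONS = ["720p", "1080p"]
TEMPLATES = [
    "[{group}] {title} - {ep:02d} [{res}].mkv",
    "{title} - {ep:02d} [{group}] [www, {res}, AAC].mkv",
    "{title} S01E{ep:02d} [{res}].mkv",
    "[{group}] {title} {ep:02d} - Episode Title [BD {res}].mkv",
]


def generate_cases(n, seed=0):
    """Yield (filename, expected_title, expected_episode) triples."""
    rng = random.Random(seed)  # seeded so a failing case can be reproduced
    for _ in range(n):
        title = rng.choice(TITLES)
        ep = rng.randint(1, 26)
        name = rng.choice(TEMPLATES).format(
            group=rng.choice(GROUPS), title=title, ep=ep, res=rng.choice(RESOLUTIONS))
        yield name, title, ep
```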

It was torture-tested pretty well - I was hoping that, since this was in part an effort to solve your issue, you might be willing to try simply running Trackma with the modified AnimeInfoExtractor.py and doing a full library scan, to see if you encountered any issues, since you might have files using naming formats my test bank didn't include (The file naming structures in the examples you posted were pretty seriously random, and I would not have thought of ever naming files like some of those examples). But as you seem to have lost all interest in this issue since posting it, I'll simply have to finish it off myself. You've stated pretty clearly after all that you "have way too many things with higher priorities atm.", which is honestly a somewhat irritating/extremely arrogant statement to hear from the person who created the issue in the first place, and whom I started working on the issue to try to help.

Sorry if that came across as a bit uncordial, but I am currently rather seriously ill, and have been working on this in what little free time I have while waiting for the results from my last biopsy to tell me if I have lung cancer, so getting a curt response from the person who asked for help, saying they have more important things to do, and can't afford to take a few minutes to look at what has amounted to a surprisingly large amount of work done in no small part on their behalf, just to see if it might spark an idea, or at least bother to feign some minimal degree of civility, well, kind of seriously pisses me off.

Other English speakers might dismiss the rude tone as a language issue, seeing as you're located in Germany, but, being German myself, it's pretty easy to distinguish issues with the English language from intentional aggressive rudeness.

Well, it seems I've wasted enough of both of our time - I'll be discontinuing my efforts on this in favor of, well, anything else; I don't know what your "too many things with higher priorities" include, but I'd advise making some room for learning basic decency in human interaction. You sorely need it.

purposelycryptic, Jun 29 '19 04:06

Sorry, but I won't comment on this anymore until I find the time to properly look into what you've done. All of my comments so far were made on mobile and probably without enough intel. I never meant to discredit your efforts.

FichteFoll, Jun 29 '19 11:06

Here are two more cases I just can't get recognized, not even when adding altnames (with the current version, not the one proposed by purposelycryptic):

Arifureta E01v1 [1080p+][AAC][JapDub][GerSub][Web-DL].mkv
[HorribleSubs] Nakanohito Genome [Jikkyouchuu] - 01 [1080p].mkv

FichteFoll, Jul 13 '19 22:07

Not sure if this is the right issue to comment in; I'm fairly new to actually engaging on GitHub, so my apologies if this is incorrect. I've had many problems with Trackma scanning my library. I use Sonarr.

I've done a bit of testing with one show: Trackma is recognising season subfolders as episode numbers instead. After testing, I have:

.../Clannad After Story/01/Clannad - S02E01 - A Farewell to the End of Summer SDTV.mkv
.../Clannad After Story/Season 02/[episodes 2-22 with same naming scheme]

In Trackma only the first 2 episodes are recognised:

[Screenshot: Screenshot_20200219_225708]

If I play Episode 1 it plays the file in .../Clannad After Story/01 and if I play Episode 2 it plays the last file in .../Clannad After Story/Season 02

Sonarr saves it by default as .../Clannad After Story/Season 02/[Episodes]
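One way to avoid this is to take the episode number only from the file name and, when folders are used at all, use them only for the show name while skipping folders that just carry a season or number. A minimal sketch, assuming folder names like "Season 02", "S2", "01" or "Specials" are such season folders; `show_folder` and `SEASON_DIR_RE` are hypothetical and not how Trackma currently handles paths. The `/anime/` prefix in the usage below is made up.

```python
import re
from pathlib import PurePosixPath

# Folder names that only carry season/number info.
SEASON_DIR_RE = re.compile(r"season\s*\d+|s\d{1,2}|\d{1,3}|specials?", re.IGNORECASE)


def show_folder(path):
    """Return the nearest parent folder that is not just a season/number folder."""
    for parent in PurePosixPath(path).parents:
        if parent.name and not SEASON_DIR_RE.fullmatch(parent.name):
            return parent.name
    return None
```

Both `/anime/Clannad After Story/01/Clannad - S02E01 - ....mkv` and `/anime/Clannad After Story/Season 02/...` resolve to "Clannad After Story", so neither "01" nor "02" from the folder can end up as an episode number.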

VitruvianCyborg, Feb 19 '20 23:02

I haven't reported other cases I've been seeing lately, but there are problems with the "SxEy" notation seen in some circles and software, e.g.

ReZERO -Starting Life in Another World- S02E06 [1080p][E-AC3][JapDub][GerEngSub][Web-DL].mkv

Currently, trackma interprets this as the second episode of the first season.
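Checking for an explicit season/episode marker before any bare-number heuristic keeps "S02" from being read as the episode. A minimal sketch using named capture groups, covering "S02E06", "s2e6" and "2x06"; `season_episode` and the regex are illustrative assumptions, not Trackma code. The lookarounds keep resolutions such as "1920x1080" from matching the "NxMM" form.

```python
import re

# "S02E06" / "s2e6" or "2x06"; named groups keep season and episode apart.
SEASON_EPISODE_RE = re.compile(
    r"(?<![A-Za-z0-9])"
    r"(?:S(?P<s1>\d{1,2})\s*E(?P<e1>\d{1,4})|(?P<s2>\d{1,2})x(?P<e2>\d{1,4}))"
    r"(?![0-9])",
    re.IGNORECASE,
)


def season_episode(filename):
    """Return (season, episode) for SxxEyy or NxMM notation, else None."""
    match = SEASON_EPISODE_RE.search(filename)
    if not match:
        return None
    season = match.group("s1") or match.group("s2")
    episode = match.group("e1") or match.group("e2")
    return int(season), int(episode)
```

For the ReZERO file above this returns (2, 6); mapping season 2 to the right list entry is a separate step.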

FichteFoll, Nov 30 '20 00:11

I am also facing this issue. Is there something wrong with my file names?

[D] Engine: Not a show, skipping: /external/My Files/Anime/Uzaki-chan wa Asobitai!/Uzaki-chan wa Asobitai! Ep 2.mkv

[D] Engine: Not a show, skipping: /external/My Files/Anime/Uzaki-chan wa Asobitai!/Uzaki-chan wa Asobitai! Ep 3.mkv

Also, every time I run Trackma, scanning the local library takes too much time.

I also have this issue: [Screenshot: Screenshot from 2021-03-26 19-40-39]

uali6981, Mar 26 '21 14:03

That looks like you either have a misconfigured altname or some indices are being confused. Does ls show a name in square brackets after "Demi-chan wa Kataritai"?

Regardless, those should already be parsed correctly and definitely are in #547 (I added a test case to verify).

FichteFoll, Mar 26 '21 16:03

I did not set any altname for this anime, and ls shows this: [Screenshot: Screenshot from 2021-03-26 23-09-57]

As for parsing: after updating Trackma to the pull request, it gives me the following error: [Screenshot: Screenshot from 2021-03-26 22-59-17]

The searchDir location is /external/My Files/Anime/

I have too many anime series folders in this directory. If I set searchDir to a single anime folder that contains videos, it works fine, but adding every folder individually is too much work.

For example, if I set searchDir to /external/My Files/Anime/Deca-Dence/, it works fine and doesn't give me the parsing error.

uali6981, Mar 26 '21 18:03

Since https://github.com/z411/trackma/pull/547 has been merged, addressing most of the recognition problems noted in this issue, and https://github.com/z411/trackma/pull/663 is also open as an alternative title parser, I'm closing this issue as done. For any remaining recognition issues, please open a new issue.

@uali6981 your case looks primarily like you have the option to also take the folder name into account enabled. I suggest disabling it if you haven't already.

FichteFoll, Apr 18 '23 22:04