Filename keywords for comic specials don't work properly

Open pssandhu opened this issue 2 years ago • 8 comments

Describe the bug This page of the wiki lists special keywords that can be used in filenames to get Kavita to mark them as specials.

To Reproduce Steps to reproduce the behavior:

  1. Have a library with this folder structure (the files have no metadata in them):
Library Root/
    |-- Batman Beyond (1999)/
        |-- Batman Beyond 01.cbz
        |-- Batman Beyond ...
        |-- Batman Beyond 06.cbz
        |-- Batman Beyond Annual 01.cbz
        |-- Batman Beyond Annual 02.cbz
  2. Scan the library
  3. The issues show up in Kavita correctly but the annuals are not there

Expected behavior The annuals should be in the library and marked as specials

Desktop (please complete the following information):

  • OS: Docker
  • Browser: Firefox
  • Version: Kavita 0.6.0.0

Additional context On a fresh install the annuals showed up as a separate series called Batman Beyond Annual but I'm unable to reproduce that after renaming the files. Might need to do a fresh install to see this.

I tested various filenames one at a time (shown below). Some of these are unlikely to be real filenames like Batman Beyond Book.cbz or Batman Beyond Annual.cbz but I thought testing these might be useful.

Treated as a special:

01 Annual Batman Beyond.cbz
Batman Beyond Omnibus.cbz
Batman Beyond Omnibus (1999).cbz
Batman Beyond TPB.cbz
Batman Beyond Bonus.cbz
Batman Beyond Specials.cbz
Batman Beyond OneShot.cbz

Treated as a volume:

Batman Beyond TPB v01.cbz

Not added to library:

Batman Beyond Annual 01 (1999).cbz
Batman Beyond Annual 01.cbz
Batman Beyond Annual #001.cbz
Batman Beyond Annual (1999).cbz
Batman Beyond Annual.cbz
1 Annual Batman Beyond.cbz
Batman Beyond Omnibus 01.cbz
Batman Beyond Omnibus 1.cbz
Batman Beyond TPB 1.cbz
Batman Beyond Book 01.cbz
Batman Beyond Book.cbz

pssandhu avatar Oct 28 '22 18:10 pssandhu

Hi @tjarls, this issue looks to stem from https://github.com/Kareadita/Kavita/pull/1531, where specials were word-bounded to prevent false positives. However, in testing, I'm seeing that "Batman Beyond Annual" and "Ippo - Artbook" are not considered specials.

From the regex: |\d.+?\WAnnual|Annual\W\d.+?|, you'd think it'd work, but it does not match. Any ideas?

majora2007 avatar Nov 05 '22 12:11 majora2007

It seems the \d is our culprit. What if we just left \d out and changed to .\WAnnual, which would still behave very similarly?
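A quick way to check the \d observation is to run the two quoted alternatives against sample names. This is only a sketch: it uses Python's `re` module as a stand-in for the .NET regex engine (the constructs used here should behave the same in both), it assumes the file extension has already been stripped, and the helper name `match_text` is made up for illustration.

```python
import re

# The two "Annual" alternatives quoted above, on their own.
ANNUAL_FRAGMENT = re.compile(r"\d.+?\WAnnual|Annual\W\d.+?")


def match_text(name):
    """Return the matched substring, or None when nothing matches."""
    m = ANNUAL_FRAGMENT.search(name)
    return m.group(0) if m else None


# Both alternatives need a digit next to "Annual", so a bare
# "Annual" with no year or number can never match.
print(match_text("Batman Beyond Annual"))       # None
print(match_text("Batman Beyond Annual 01"))    # Annual 01
print(match_text("Action Comics 2021 Annual"))  # 2021 Annual
```

So the fragment is doing what it says: it only fires when a number sits before or after the keyword.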

majora2007 avatar Nov 05 '22 12:11 majora2007

I settled for \b(?:\d.+?(\W|-|^)Annual|Annual(\W|-|$))\b (just looking at Annual). This allows us to meet all the Unit tests and works against this case as well.
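For reference, here is a rough check of that pattern, again using Python's `re` as a stand-in for .NET and assuming extension-less names. The helper `is_special` is hypothetical and only exists for this sketch.

```python
import re

# The "Annual"-only pattern from the comment above.
SETTLED_ANNUAL = re.compile(r"\b(?:\d.+?(\W|-|^)Annual|Annual(\W|-|$))\b")


def is_special(name):
    """True when the pattern finds a match anywhere in the name."""
    return SETTLED_ANNUAL.search(name) is not None


print(is_special("Batman Beyond Annual"))     # True: "Annual" at end of name
print(is_special("Batman Beyond Annual 01"))  # True: "Annual" then a separator
print(is_special("Batman Beyond 01"))         # False: keyword absent
print(is_special("Annual Days of Summer"))    # True: no number is required
```

The last line shows the trade-off: once the digit requirement is gone, a title that merely starts with "Annual" also matches under this sketch.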

majora2007 avatar Nov 05 '22 12:11 majora2007

That behaviour predates #1531 and has nothing to do with word-bounding the keywords used for specials. I suspect the intention was indeed to only match Annual alongside a year or just a number. Comic annuals almost always have a date or at least a number alongside the word "annual" (for example Action Comics 2021 Annual, Amazing Spider-Man Annual 2). Here's the original regex from before the word-bounding changes:

@"(?<Special>Specials?|OneShot|One\-Shot|\d.+?(\W|_|-)Annual|Annual(\W|_|-)\d.+?|Extra(?:(\sChapter)?[^\S])|Book \d.+?|Compendium \d.+?|Omnibus \d.+?|[_\s\-]TPB[_\s\-]|FCBD \d.+?|Absolute \d.+?|Preview \d.+?|Art Collection|Side(\s|_)Stories|Bonus|Hors Série|(\W|_|-)HS(\W|_|-)|(\W|_|-)THS(\W|_|-))",

The unit test for Annual Days of Summer is further evidence that this was indeed legacy behaviour, so there hasn't been any regression on that side.
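To illustrate the legacy behaviour, the original regex can be ported to Python and run against a few titles. Treat this as a sketch under assumptions: the named group is rewritten as `(?P<Special>...)` for Python, matching is case-sensitive because the thread does not show which regex options were used, and `is_legacy_special` is a hypothetical helper, not a Kavita function.

```python
import re

# Python port of the pre-#1531 regex quoted above (split only for readability).
LEGACY_SPECIAL = re.compile(
    r"(?P<Special>Specials?|OneShot|One\-Shot"
    r"|\d.+?(\W|_|-)Annual|Annual(\W|_|-)\d.+?"
    r"|Extra(?:(\sChapter)?[^\S])|Book \d.+?|Compendium \d.+?|Omnibus \d.+?"
    r"|[_\s\-]TPB[_\s\-]|FCBD \d.+?|Absolute \d.+?|Preview \d.+?"
    r"|Art Collection|Side(\s|_)Stories|Bonus|Hors Série"
    r"|(\W|_|-)HS(\W|_|-)|(\W|_|-)THS(\W|_|-))"
)


def is_legacy_special(name):
    """True when the legacy pattern matches anywhere in the name."""
    return LEGACY_SPECIAL.search(name) is not None


# Keywords followed or preceded by a number are detected...
print(is_legacy_special("Action Comics 2021 Annual"))        # True
print(is_legacy_special("Batman Beyond Annual 01"))          # True
# ...while titles that merely contain the keyword are left alone.
print(is_legacy_special("Annual Days of Summer"))            # False
print(is_legacy_special("Superman/Batman: Absolute Power"))  # False
```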

On the other hand, the widening of the match for Annual and, even more so, for Absolute, is introducing quite a few additional false positives. Some examples I have spotted in my own library:

  • Superman/Batman: Absolute Power
  • Blue Monday: Absolute beginners
  • A few Marvel series with "Absolute Carnage"

While there are mechanisms to tag individual issues as special, there isn't a way of doing the opposite short of actually changing the series names of those affected.

So I do not think it's a good idea to widen the filename special matching mechanism this much for Absolute and, to a lesser extent, for Annual without a year. There are plenty of other words that are occasionally used to denote a special edition of a series (Definitive, Collected, Album, Digest, ...) that are not automatically detected. Covering all of them is both unrealistic and unnecessary, as explicit tagging with SP#, for example, or setting the ComicInfo "special" tag already covers those cases. We do not have a "not-special" tag for the reverse use case. So the current fix looks to me like a regression: we are trading minor annoyances that are already easily solvable for an issue without a solution and often a much worse user experience (all issues from a series suddenly tagged as special and no longer properly sorted).

Finally, if we decide to keep the current fix, the regular expression is unnecessarily complex. To achieve the desired goal of matching Annual, Absolute, etc., a more efficient regex would be simply:

$@"\b(?:{CommonSpecial}|Annual|Book \d.+?|Compendium|Omnibus|FCBD \d.+?|Absolute|Preview|Hors[ -]S[ée]rie|TPB|HS|THS)\b"
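The false positives described above can be reproduced with a Python port of this simplified pattern. This is a sketch with assumptions spelled out: the contents of `CommonSpecial` are not shown in the thread, so `COMMON_SPECIAL` below is a hypothetical stand-in, matching is case-sensitive, and `matched_keyword` is an illustrative helper only.

```python
import re

# Hypothetical stand-in: the real CommonSpecial value is not quoted in the thread.
COMMON_SPECIAL = r"Specials?|One[- ]?Shot|Bonus"

# Python port of the simplified pattern above.
SIMPLIFIED_SPECIAL = re.compile(
    rf"\b(?:{COMMON_SPECIAL}|Annual|Book \d.+?|Compendium|Omnibus|FCBD \d.+?"
    rf"|Absolute|Preview|Hors[ -]S[ée]rie|TPB|HS|THS)\b"
)


def matched_keyword(name):
    """Return the matched keyword text, or None when nothing matches."""
    m = SIMPLIFIED_SPECIAL.search(name)
    return m.group(0) if m else None


# Bare keywords now match, including series titles that are not specials.
print(matched_keyword("Batman Beyond Annual 01"))          # Annual
print(matched_keyword("Superman/Batman: Absolute Power"))  # Absolute
print(matched_keyword("Blue Monday: Absolute beginners"))  # Absolute
print(matched_keyword("Batman Beyond 01"))                 # None
```

With a bare `Absolute` keyword, every issue in a series whose title contains that word would be flagged, which is the sorting problem raised earlier in this comment.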

tjarls avatar Nov 05 '22 21:11 tjarls

Hmmm...you make some good points and it's good to also learn more about how some keywords are used in comics, as I do not collect them myself. As we can support via SP# or ComicInfo Special tag, it does make sense to reduce false positives rather than support more loose rules.

Do you suggest removing Absolute and Annual (without year) altogether? I need guidance on comic support. I'm also not sure how often these keywords are used.

majora2007 avatar Nov 06 '22 02:11 majora2007

First comment! I just have found this project and I'm still reading everything about it. It looks really cool.

Why not require separators? The word "Annual" would then only be recognized if there's a - before it. #, -, (, ), [ and ] could be used as separators.

Is that a bad idea?
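One possible reading of this suggestion, sketched in Python: only treat "Annual" as a marker when one of the listed separator characters comes directly before it (optionally followed by whitespace). The pattern and the helper name are hypothetical and are not something Kavita implements.

```python
import re

# Hypothetical: a separator (# - ( ) [ ]) must directly precede "Annual".
SEPARATED_ANNUAL = re.compile(r"[#\-()\[\]]\s*Annual\b")


def has_separated_annual(name):
    """True when "Annual" appears right after one of the separators."""
    return SEPARATED_ANNUAL.search(name) is not None


print(has_separated_annual("Batman Beyond - Annual 01"))      # True
print(has_separated_annual("Batman Beyond (Annual) 01"))      # True
print(has_separated_annual("Annual Days of Summer"))          # False
print(has_separated_annual("Amazing Spider-Man Annual 001"))  # False
```

The trade-off in this sketch is that a plain name such as "Batman Beyond Annual 01" would no longer be detected, so existing files would need renaming.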

robsonsobral avatar Nov 15 '22 17:11 robsonsobral

This occurred for me as well. As a test, I created a file Amazing Spider-Man issue 001 as well as Amazing Spider-Man Annual 001. They're lumped together as the same issue in Kavita.

"OneShot" followed by a number also results in the special being listed as a duplicate of an issue in the library view, so I think the keywords are simply ignored if there is an issue number.

"Annual" or "OneShot" by itself is recognized and placed under the "Special" pane in the library view.

Finally, putting the number in parentheses, like 'Annual (001)', separates the issues out into the Specials pane as if there were no number, probably because Kavita ignores parentheses. The number still appears in the displayed name, so that seems like a decent workaround for now.

zzyzx-dc avatar Mar 24 '23 16:03 zzyzx-dc

I know this is an old issue, but I'm now considering removing special keyword parsing. Would love feedback on the other issue: https://github.com/Kareadita/Kavita/issues/2967

majora2007 avatar Jun 05 '24 21:06 majora2007