audiobookshelf icon indicating copy to clipboard operation
audiobookshelf copied to clipboard

[Bug]: Audioboks with multiple cds are not properly added

Open mediadev123 opened this issue 3 years ago • 1 comments
trafficstars

Describe the issue

One audiobook has cd's subfolders as already described in https://github.com/advplyr/audiobookshelf/issues/393

What's special about my case is that the CDs seem to have a space in between them ("CD 1" instead of "CD1"), which causes the algorithm to not correctly label them. Grabbing through the code, it seems like the regex is all over the place and not really updated:

Here are lines where the incorrect regex is used: https://github.com/advplyr/audiobookshelf/blob/d0af1c3c9ac49ae46d490a87d7f67754111e16f6/server/utils/scandir.js#L60 https://github.com/advplyr/audiobookshelf/blob/d0af1c3c9ac49ae46d490a87d7f67754111e16f6/server/utils/scandir.js#L111

Here an updated regex is used: https://github.com/advplyr/audiobookshelf/blob/5446aea910b4d51c5320236421aa9f668b65bdc4/server/scanner/AudioFileScanner.js#L25 https://github.com/advplyr/audiobookshelf/blob/5446aea910b4d51c5320236421aa9f668b65bdc4/server/scanner/AudioFileScanner.js#L32 https://github.com/advplyr/audiobookshelf/blob/8894f5243972e10e2b400dd7c3bf338ca8867068/server/scanner/MediaFileScanner.js#L26https://github.com/advplyr/audiobookshelf/blob/8894f5243972e10e2b400dd7c3bf338ca8867068/server/scanner/MediaFileScanner.js#L33

But a few lines below another incorrect one is used again: https://github.com/advplyr/audiobookshelf/blob/5446aea910b4d51c5320236421aa9f668b65bdc4/server/scanner/AudioFileScanner.js#L38 (no check for disc) https://github.com/advplyr/audiobookshelf/blob/8894f5243972e10e2b400dd7c3bf338ca8867068/server/scanner/MediaFileScanner.js#L38-L39

I would highly suggest to create a constant for the regex and unify it to something like this:

const CDSubfolderRegexes = [/\bdis[ck]\s*(?<num>\d{1,3})\b/i, /\bcd\s*(?<num>\d{1,3})\b/i]

Those regex would catch the following formats: "CD(any space any more times)(any number, from 0 to 999)", "disk(any space any more times)(any number, from 0 to 999)", "disc(any space any more times)(any number, from 0 to 999)". By using \s instead of a space, we allow any other weird space characters (like the non-line breaking, a tab or even a new line -- you never know). This allows multiple spaces (in case someone misnames the CD). This allows the alternate spelling of disc, disk. And by using a named group, the regex is more forward compatible in case there are more groups in the future:

For example:

var discFromFolder = match(t).groups["num"]
// Instead of 
var discFromFolder = Number(pathdir.replace(/cd/i, ''))

Lastly, it would be great if this regex is configurable so that users can add their own version easily, in case their audiobook uses a regional name for disc, such as disco in Spanish (I think). This is why I used a regex array instead of a more complex regex -- they scale way better IMO.

I would like to help but my javascript is not good enough for doing a refactor.

Steps to reproduce the issue

  1. Use any audiobook with a space in between the "CD " and number

Audiobookshelf version

2.0.24_amd64.deb

How are you running audiobookshelf?

Debian/PPA

mediadev123 avatar Jul 15 '22 23:07 mediadev123

Can you share what your exact file path was that was not properly detected? Spaces are supported already when parsing the CD from the audio file name.

There is some confusion in your post about what the various regexes are doing for CD. One thing to note is the AudioFileScanner was removed and replaced with the MediaFileScanner but the code parsing the CD is the same.

The MediaFileScanner will attempt to parse out a track and/or cd number from the filename. The only purpose of this is to properly order the tracks so for example Track 10 CD 1 would go before Track 1 CD 2. Example audio filenames supported audiobook CD01.mp3 audiobook CD 01.mp3 audiobook Disc 1.mp3 audiobook Disc10.mp3

In order to support another file structure sometimes used, I added support for detecting CD subfolders. For example /Animal Farm/CD01/track1.mp3 /Animal Farm/CD2/track1.mp3

This is more restrictive in that it doesn't allow a space but that could be updated. I don't think this is what you are asking for though.

I would like to add support for customizing the scanner folder structure but there is a lot of planning that has to go into that since there are so many edge cases. It has been brought up a few times and I like the suggestion to use grok-style syntax for defining path patterns #774

advplyr avatar Jul 23 '22 18:07 advplyr

Hello, sorry for taking so long to respond. I must have missed your notification. The complete paths are: /mnt/media/audiobooks/AuthorFirstName AuthorLastName Collection/AuthorLastName, AuthorFirstName - AudiobookTitle/CD {1..13}/01 - Track 1.mp3

You have already pointed out the difference between MediaFileScanner and AudioFileScanner, but what exactly does scandir.js do then? Anyhow, I still think the regex should be more flexible while you work on #774 as it seems to be more complex.

Thanks for your detailed answers, though!

mediadev123 avatar Aug 22 '22 19:08 mediadev123