[Bug]: Too heavy on the I/O during library scan

Open Inrego opened this issue 2 years ago • 14 comments

Describe the issue

When scanning the library, it's way too heavy on the I/O. My audiobooks are hosted on Google Drive, mounted with RClone. When scanning a library folder with just about 10 books in it, the whole server completely freezes up. I can't even reboot the server properly, I have to power cycle it.

Audiobookshelf is running in a Docker container, and since it happens during library scan, I can only imagine it's related to IO.

I've been running this same setup for a few years without issues. Only Audiobookshelf is causing this kind of issue. Other similar services I'm running in the same setup don't run into this problem:

  • Plex
  • Emby
  • Jellyfin
  • Radarr
  • Sonarr

Steps to reproduce the issue

  1. Mount GDrive with rclone
  2. Start audiobookshelf docker container with GDrive folder mounted.
  3. Scan library

Audiobookshelf version

v1.7.2

How are you running audiobookshelf?

Docker

Inrego avatar Apr 08 '22 14:04 Inrego

If I add just a few books to the mounted folder at a time, it can scan it. However, it still freezes up for a little while (maybe around 30 seconds per book).

Inrego avatar Apr 08 '22 14:04 Inrego

This was improved a bit in v2 but still needs work

advplyr avatar Apr 24 '22 16:04 advplyr

The problem is that you are running ffprobe for all the books you find, in parallel. I tried to reproduce the problem and saw more than 80 ffprobe processes...

lduesing avatar Jun 22 '22 11:06 lduesing

Solution: In the documentation of node-ffprobe:

Additionnally, you can set ffprobe.SYNC to true if you want for a particular reason to launch ffprobe synchronously (for example when used in batch processing of files to avoid too many spawns at once.)

lduesing avatar Jun 22 '22 11:06 lduesing

That's great. I was about to suggest using a semaphore with a setting to control max number of processes. But if it's already handled by ffprobe by a simple parameter, I guess that's the easier fix!
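For illustration, the semaphore idea could look roughly like the sketch below. It is not taken from the Audiobookshelf code base: `Semaphore`, `withLimit` and `MAX_FFPROBE_PROCESSES` are made-up names, and the limit of 4 is an arbitrary stand-in for a user setting.

```javascript
// Minimal counting semaphore: at most `max` holders at a time.
class Semaphore {
  constructor(max) {
    this.max = max;
    this.active = 0;
    this.waiters = []; // resolve callbacks of tasks waiting for a slot
  }

  acquire() {
    if (this.active < this.max) {
      this.active++;
      return Promise.resolve();
    }
    // No free slot: park until release() hands one over.
    return new Promise((resolve) => this.waiters.push(resolve));
  }

  release() {
    const next = this.waiters.shift();
    if (next) {
      next(); // pass the slot straight to the next waiter; `active` is unchanged
    } else {
      this.active--;
    }
  }
}

// Run `task` (an async function) while holding one slot.
async function withLimit(semaphore, task) {
  await semaphore.acquire();
  try {
    return await task();
  } finally {
    semaphore.release();
  }
}

// Hypothetical setting name; stands in for a user-configurable value.
const MAX_FFPROBE_PROCESSES = 4;
const ffprobeSlots = new Semaphore(MAX_FFPROBE_PROCESSES);
```

Each call that launches ffprobe would then be wrapped as `withLimit(ffprobeSlots, () => runProbe(file))`, where `runProbe` is whatever function starts the process.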

Inrego avatar Jun 22 '22 11:06 Inrego

I don't think we want to run them synchronously. The heavy I/O is because of many ffprobes running at once which I improved a bit on v2, but we are only using a single thread. We could further reduce the number of ffprobes running asynchronously and also split them into multiple threads.

advplyr avatar Jun 22 '22 12:06 advplyr

Sorry, but each ffprobe process uses roughly 120 MiB of virtual RAM. In a directory with 80 audiobooks on a Raspberry Pi in Docker, your image gets killed. Scanning is not something that happens every day, so I do not see why serialized scanning would be a problem.

lduesing avatar Jun 22 '22 12:06 lduesing

I don't think the image being killed with 80 audiobooks is a common issue. This is highly dependent on your specs of course and one of the biggest factors in reduced performance of the scanner is using a remote file system.

We have users with 10k+ audiobook libraries where scans can take hours. If we utilize all the cores of your processor instead of just one then the scan time can be a fraction of what it is now.

Currently, the scanner splits all the audio files that need to be scanned into batches of at most 2.5GB. So a 1GB audio file and three 500MB audio files would make up a single batch. The batches are run synchronously (one after another), and each batch executes ffprobe on each of its audio files asynchronously on a single thread.
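A rough sketch of that batching scheme, as described here rather than as it appears in the actual scanner: `splitIntoBatches`, `scanInBatches` and the `probe` callback are hypothetical names, and 1GB is assumed to mean 1024^3 bytes.

```javascript
const GB = 1024 ** 3;
const MAX_BATCH_BYTES = 2.5 * GB; // limit quoted in the comment above

// Greedily pack files into batches whose total size stays at or under the limit.
// A single file larger than the limit ends up in a batch of its own.
function splitIntoBatches(files, maxBytes = MAX_BATCH_BYTES) {
  const batches = [];
  let current = [];
  let currentBytes = 0;
  for (const file of files) {
    if (current.length > 0 && currentBytes + file.size > maxBytes) {
      batches.push(current);
      current = [];
      currentBytes = 0;
    }
    current.push(file);
    currentBytes += file.size;
  }
  if (current.length > 0) batches.push(current);
  return batches;
}

// Batches run one after another; files inside a batch are probed concurrently.
// `probe` is a placeholder for whatever launches ffprobe on one file.
async function scanInBatches(files, probe) {
  const results = [];
  for (const batch of splitIntoBatches(files)) {
    results.push(...(await Promise.all(batch.map(probe))));
  }
  return results;
}
```

Note that a size-based limit bounds the bytes per batch, not the number of processes: a folder of many small files still lands in one batch and starts one ffprobe per file at the same time.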

How much RAM is used on an ffprobe would be highly dependent on the size of the audio file which is why I chose to split up the batches by file size.

My proposal to increase performance would be to run the ffprobe commands in parallel on X threads, where X would be the number of processor cores. A 4-core processor would start 4 threads where each thread is executing a single ffprobe. I think I'm actually agreeing with you but just proposing we spread the workload across the processor.
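One way to approximate this proposal without Node threads is to cap the number of ffprobe child processes at the core count, since each ffprobe is already its own OS process. The sketch below is an assumption-laden illustration, not the project's scanner: `ffprobe` and `probeAll` are made-up helper names, and an `ffprobe` binary is assumed to be on the PATH.

```javascript
const os = require('node:os');
const { execFile } = require('node:child_process');

// Run ffprobe on one file and resolve with its parsed JSON output.
// Assumes an `ffprobe` binary is available on PATH.
function ffprobe(filePath) {
  return new Promise((resolve, reject) => {
    execFile(
      'ffprobe',
      ['-v', 'quiet', '-print_format', 'json', '-show_format', '-show_streams', filePath],
      { maxBuffer: 16 * 1024 * 1024 },
      (err, stdout) => {
        if (err) return reject(err);
        try {
          resolve(JSON.parse(stdout));
        } catch (parseError) {
          reject(parseError);
        }
      }
    );
  });
}

// Probe every path with at most `concurrency` probes in flight.
// Each "lane" keeps taking the next unclaimed path until none are left.
async function probeAll(paths, concurrency = os.cpus().length, probe = ffprobe) {
  const results = new Array(paths.length);
  let next = 0;
  async function lane() {
    while (next < paths.length) {
      const i = next++; // no race: this line runs synchronously on one thread
      results[i] = await probe(paths[i]);
    }
  }
  const laneCount = Math.max(1, Math.min(concurrency, paths.length));
  await Promise.all(Array.from({ length: laneCount }, () => lane()));
  return results;
}
```

The second argument is where the adjustable X would plug in, for example `probeAll(paths, 2)` on a low-memory device.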

I'm a fan of projects that are highly customizable as long as it doesn't look like a jumbled up mess in the UI, so I think having a setting to adjust the variable X could be a nice addition.

advplyr avatar Jun 22 '22 13:06 advplyr

I want to add that the asynchronous/synchronous might be a bit misleading because in node.js asynchronous doesn't mean running in parallel. Whereas in other languages these could be used interchangeably.

advplyr avatar Jun 22 '22 13:06 advplyr

All the more so because in common parlance, synchronous means "at the same time" and asynchronous means "not at the same time"

hobesman avatar Jun 22 '22 13:06 hobesman

I've been doing so much node.js the last few years that I didn't realize how confusing it would sound at first. Node.js was built to run on a single thread and so "asynchronous" is really saying "handle this workload as optimally as you can, I don't care in which order you do it".
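A tiny demonstration of the point, written only for this explanation (`demo`, `job` and `tick` are invented names): two asynchronous jobs take turns at every `await`, but they never execute at the same instant because there is only one JavaScript thread.

```javascript
// Resolve on a later turn of the event loop.
const tick = () => new Promise((resolve) => setImmediate(resolve));

async function demo() {
  const log = [];

  async function job(name, steps) {
    for (let i = 1; i <= steps; i++) {
      log.push(`${name}${i}`);
      await tick(); // yield to the event loop so the other job can run
    }
  }

  // Both jobs are "in flight" together, yet their steps are interleaved,
  // not simultaneous.
  await Promise.all([job('A', 3), job('B', 3)]);
  return log;
}

demo().then((log) => console.log(log.join(' '))); // A1 B1 A2 B2 A3 B3
```

Spawning ffprobe is different: each spawned process runs outside this thread, which is why many "asynchronous" probes do end up running in parallel at the OS level.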

advplyr avatar Jun 22 '22 13:06 advplyr

I remember encountering this for the first time with Core for SmartThings, the home automation platform. The user can create complex conditions with if and if not and xor and so on for controlling smart devices. I remember thinking "who decided that synchronous commands should mean sequential and asynchronous should mean they can run in parallel?" It was very counterintuitive for the uninitiated user. Even if that's the term ultimately used in the underlying code, I agree the discussion and/or interface should probably avoid those terms.


hobesman avatar Jun 22 '22 13:06 hobesman

You can summon workers with NodeJS. These will run on a different thread but can communicate with the master

https://nodejs.org/api/worker_threads.html

rasmuslos avatar Jun 23 '22 21:06 rasmuslos

We use worker threads for making M4b files already. https://github.com/advplyr/audiobookshelf/blob/master/server/managers/AbMergeManager.js#L186

We just need to re-build the scanner to use them.
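As a rough idea of the shape this could take (not the AbMergeManager code and not the eventual scanner), here is a minimal `worker_threads` sketch. `scanInWorker` and `workerSource` are hypothetical names, the worker body is inlined as a string only to keep the example in one file, and the "scan" is a stand-in that just measures path lengths.

```javascript
const { Worker } = require('node:worker_threads');

// Worker body as a string so the sketch fits in one file; a real project
// would normally point Worker at a separate script file instead.
const workerSource = `
  const { parentPort, workerData } = require('node:worker_threads');
  // Stand-in for real scanning work (e.g. probing and parsing metadata).
  const results = workerData.paths.map((p) => ({ path: p, length: p.length }));
  parentPort.postMessage(results);
`;

// Hand a list of paths to a worker thread and resolve with what it posts back.
function scanInWorker(paths) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: { paths } });
    worker.once('message', resolve);
    worker.once('error', reject);
    worker.once('exit', (code) => {
      if (code !== 0) reject(new Error(`worker exited with code ${code}`));
    });
  });
}
```

Splitting the file list into one chunk per core and calling `scanInWorker` on each chunk would spread the parsing work across threads while the main thread stays responsive.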

advplyr avatar Jun 23 '22 21:06 advplyr

Hi, I have the same issue with audiobookshelf on my Synology NAS. Too many ffprobes: [image] [image]

A max. count parameter would be great. Thanks

cnu80 avatar Jan 12 '23 18:01 cnu80

I missed this issue but this was fixed a few versions ago. Basically we are capping the number of ffprobe processes.

advplyr avatar Mar 29 '23 19:03 advplyr