
Multiple recognize processes use up more than the maximum amount of cores set

Open Ellpeck opened this issue 3 years ago • 1 comments

Describe the bug

In the Recognize settings, I set the maximum amount of cores to use for Recognize to 1. This works fine for each individual Recognize process. However, an issue arises when multiple types of recognition are set in the config, at which point multiple Recognize processes seem to start and run simultaneously, causing more than 1 CPU core to be used in total.

image

brave_6PyXhMeApZ

I would personally consider this a bug, but it might also boil down to a feature request: I think there should be an option to specify

  • how many cores should be used in total or
  • whether multiple Recognize processes are allowed to run simultaneously.

Thanks so much for considering this report, as well as for the app in general!

To Reproduce

Steps to reproduce the behavior:

  1. Enable multiple types of recognition
  2. Set core limit to 1
  3. Wait until recognition starts automatically and check your CPU usage

Expected behavior

The "maximum amount of cores" setting is respected in some way, either through all processes sharing the same core (which might be difficult to achieve), or through only one process running at a time if the core limit is too low.
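To illustrate the "only one process at a time" option, here is a minimal Python sketch using an advisory lock file. It is not Recognize's actual code: the function name `single_instance` and the lock path are hypothetical, and it assumes a Unix-like system where `fcntl.flock` is available.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def single_instance(lock_path):
    """Yield True if this caller holds the lock, False if someone else does.

    Hypothetical helper: a classifier run would only start when it gets True.
    """
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o600)
    acquired = False
    try:
        try:
            # Non-blocking exclusive lock: fails at once if already held.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            acquired = True
        except BlockingIOError:
            acquired = False
        yield acquired
    finally:
        if acquired:
            fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)
```

A job would wrap its work in `with single_instance("/tmp/recognize.lock") as ok:` and return early when `ok` is false, so a second classifier started by a later cron run would skip its turn instead of competing for CPU.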

Recognize (please complete the following information):

  • JS-only mode: No
  • Enabled modes: All

Server (please complete the following information):

  • Nextcloud: 25.0.0 RC 1
  • OS: Ubuntu 18.04
  • RAM: 16GB
  • Processor Architecture: x64

Ellpeck avatar Sep 25 '22 09:09 Ellpeck

Hey @Ellpeck

That is a nice catch that we hadn't thought of yet. Nextcloud core apparently wants to run cron.php every 5 minutes, but keeps the processes alive for 15 minutes...

marcelklehr avatar Sep 26 '22 15:09 marcelklehr

Hi! Any update on this? Unfortunately, this issue is making me unable to use Recognize without disabling every recognition type but one, because otherwise it very quickly overloads my server :(

If there's any more details, logs or other info I can provide, do let me know!

Ellpeck avatar Oct 12 '22 10:10 Ellpeck

Yes, I've checked with the server repository, but changing the cron.php frequency doesn't seem to be an option. Instead, I've changed the frequency of the classifier runs within Recognize and set them to time-insensitive, so you can set a 4h window in the night during which Recognize should run.

marcelklehr avatar Oct 12 '22 11:10 marcelklehr

Unfortunately, I'm finding that this issue is still happening, even when using version 3.0.1 of the app.

I also tried to route the node command through a script that sets its niceness to 19, causing it to have the lowest possible priority in terms of CPU scheduling, but the multiple processes still cause the server to struggle immensely.

It seems like most of the processes are actually the movinet classifiers, which seem to run for far longer than the 15-minute timespan that you said was allotted for them. However, even with those turned off, there will sometimes be multiple classification processes running, though they have a less extreme impact on overall performance than the movinet ones.

Here's another screenshot for reference. It should also be noted that the first process has been running for more than 15 minutes, and seems to be using up more than 100% CPU, meaning more than one core in this case, even though I have set the core limit to 1.

image

Edit: I'm now forcing lower CPU usage for the recognize processes by using cpulimit. It works, but it's far from ideal. It would be great if there were more ways to customize the timings and/or CPU usage of the recognize processes.

Please let me know if there's any additional testing I can do, or any additional information I can provide!

Ellpeck avatar Oct 13 '22 18:10 Ellpeck

I am also facing this issue and have had to disable this app because of it.

gymnae avatar Nov 10 '22 20:11 gymnae

It's not a real 'fix' but as a workaround you can reduce the batch size of the individual jobs now in version 3.2.0, so they don't take as long anymore.

marcelklehr avatar Nov 11 '22 15:11 marcelklehr

> It's not a real 'fix' but as a workaround you can reduce the batch size of the individual jobs now in version 3.2.0, so they don't take as long anymore.

Yea, this is what I have done so that the batch queue will be shorter, to avoid multiple node runs as much as possible.

In order to minimise the CPU impact, run cron.php with chrt -i 0. This sets the idle CPU scheduling class on Linux, which has a much lower priority than nice -n19.

3-59/5 * * * * chrt -i 0  php -f /var/www/domains/my.domain/htdocs/nextcloud/cron.php >/dev/null 2>&1

https://www.man7.org/linux/man-pages/man1/chrt.1.html

Forza-tng avatar Nov 27 '22 12:11 Forza-tng

> It's not a real 'fix' but as a workaround you can reduce the batch size of the individual jobs now in version 3.2.0, so they don't take as long anymore.

Is it possible to add an internal timer/counter in the classifier cron code, so that it dynamically processes as many images as possible within 5 minutes? Then we wouldn't have to worry about multiple jobs overlapping or about adjusting the batch size manually.
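A rough Python sketch of that idea follows, only to make the suggestion concrete. It is not the Recognize implementation: `process_with_time_budget`, `handle` and the injectable `clock` are hypothetical names, and the budget is checked only between items, so the last item may finish slightly after the deadline.

```python
import time


def process_with_time_budget(items, handle, budget_seconds, clock=time.monotonic):
    """Process items in order until the time budget is used up.

    Returns the items that were not started, so a later run can pick them up.
    """
    deadline = clock() + budget_seconds
    pending = list(items)
    done = 0
    # The deadline is checked before each item, never in the middle of one.
    while done < len(pending) and clock() < deadline:
        handle(pending[done])
        done += 1
    return pending[done:]
```

With a 5-minute cron interval, a call such as `process_with_time_budget(queue, classify, 5 * 60)` would replace a fixed batch size: fast machines get through more images per run, slow machines fewer, and the run ends around the time the next one is due.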

Forza-tng avatar Dec 14 '22 15:12 Forza-tng

> Is it possible to add an internal timer/counter in the classifier cron code, so that it dynamically processes as many images as possible within 5 minutes? Then we wouldn't have to worry about multiple jobs overlapping or about adjusting the batch size manually.

That's a nice idea!

marcelklehr avatar Dec 30 '22 16:12 marcelklehr

> That's a nice idea!

Side note: That's how the Nextcloud FaceRecognition app does it.

Also, it requires setting up a separate cron job just for face recognition, so the regular Nextcloud cron jobs are not affected.

This obviously requires additional configuration server side, though.

gohrner avatar Dec 30 '22 22:12 gohrner

> dynamically runs as many images as possible during 5 minutes

This has been released since (at least) v3.4.0. Let me know if this fixes things. In v3.5.0 (out today) we've replaced the clustering algorithm, which may also improve the situation described in this issue.

marcelklehr avatar Feb 09 '23 13:02 marcelklehr

Hello, thank you. I reactivated face recognition on 1 processor, and it seems to work.
I was wondering: does the cron job run every 5 minutes even if my config.php has 'maintenance_window_start' => 1?
The admin panel of Recognize says "last classification 11 hours ago" (today at 17:36), which I find strange given that my cron jobs are:

*/5 * * * * php -f /var/www/nextcloud/cron.php
* 4 * * * php -f /var/www/nextcloud/occ preview:pre-generate

potagerGit avatar Feb 11 '23 16:02 potagerGit

If you have */5, cron.php will run every 5 minutes, as it should. If you have 'maintenance_window_start' => 1, cron.php will only run Recognize jobs during the 4-hour window you configured.
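As a small illustration of how such a window behaves, here is a Python sketch. It assumes the start value is an hour in UTC and that the window lasts 4 hours, as described above; the function name is hypothetical and this is not Nextcloud's actual code.

```python
from datetime import datetime, timezone

WINDOW_HOURS = 4  # assumed window length, taken from the comment above


def in_maintenance_window(now_utc, start_hour, window_hours=WINDOW_HOURS):
    """Return True if now_utc falls within window_hours after start_hour (UTC)."""
    # The modulo handles windows that wrap past midnight, e.g. a start at 22.
    offset = (now_utc.hour - start_hour) % 24
    return offset < window_hours
```

With a start value of 1, `in_maintenance_window(datetime.now(timezone.utc), 1)` is true only between 01:00 and 05:00 UTC, which would explain a "last classification" timestamp that is many hours old during the day.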

marcelklehr avatar Feb 11 '23 17:02 marcelklehr

Did anything change? Strangely, it seems to have gone back to using all CPUs, even though only 1 CPU is configured. I'm running v3.5.0, but I'm not sure whether I upgraded since my "it works" post 2 weeks ago. I'm also not sure what command to run to verify that it runs on only 1 CPU, but here are htop screenshots:

image

recognize CPU

potagerGit avatar Feb 23 '23 20:02 potagerGit

Hi, yes: I woke up yesterday with 3.6.2 to find that the entire set of CPU cores was in use, although I had limited it to 3 (of 4). I have now set it to only 1 CPU.

john-2000 avatar Mar 11 '23 10:03 john-2000

Yes, I can confirm, yesterday and today recognize rendered my server basically unusable until I killed all of the running node processes...

image

This way, the "system requirement min. 4 GB RAM" does not really hold...

However, I'm not sure if this closed bug report is still being monitored...

gohrner avatar Mar 11 '23 21:03 gohrner

I added bug #729 regarding the CPU limiter still not working.

gohrner avatar Mar 11 '23 21:03 gohrner

Thanks for reopening this. Just for my Linux-for-dummies knowledge: what command or utility do you use to check the number of cores a particular process uses? Is this somewhere within htop?

potagerGit avatar Mar 13 '23 21:03 potagerGit