Crashing with hundreds of processes

Open rgaufman opened this issue 4 years ago • 13 comments

I'm trying to monitor around 400 processes. eye jumps to 100% CPU on one core and then threads start crashing:

2020-03-08 20:19:12.603496 E [91602:70218487656520 logger.rb:53] eye -- [celluloid] thread crashed
Celluloid::TaskTerminated: task was terminated
	/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/task/fibered.rb:35:in `terminate'
	/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/actor.rb:323:in `block in cleanup'
	/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/actor.rb:321:in `each'
	/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/actor.rb:321:in `cleanup'
	/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/actor.rb:307:in `shutdown'
	/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/actor.rb:169:in `run'
	/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/actor.rb:131:in `block in start'
	/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-essentials-0.20.5/lib/celluloid/internals/thread_handle.rb:14:in `block in initialize'
	/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/actor/system.rb:78:in `block in get_thread'
	/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/group/spawner.rb:50:in `block in instantiate'
	(celluloid):0:in `remote procedure call'
	/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/call/sync.rb:45:in `value'
	/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/proxy/sync.rb:22:in `method_missing'
	/data/deployer/timeagent/eye/helpers.rb:195:in `process_started?'
	/data/deployer/timeagent/eye/helpers.rb:205:in `block in wait_for_process'
	/data/deployer/timeagent/vendor/cache/eye-7b8f32d1cdd4/lib/eye/trigger.rb:103:in `instance_exec'
	/data/deployer/timeagent/vendor/cache/eye-7b8f32d1cdd4/lib/eye/trigger.rb:103:in `exec_proc'
	/data/deployer/timeagent/vendor/cache/eye-7b8f32d1cdd4/lib/eye/trigger/starting_guard.rb:28:in `block in check_start'
	/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/calls.rb:28:in `public_send'
	/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/calls.rb:28:in `dispatch'
	/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/call/sync.rb:16:in `dispatch'
	/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/future.rb:18:in `block in new'
	/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-essentials-0.20.5/lib/celluloid/internals/thread_handle.rb:14:in `block in initialize'
	/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/actor/system.rb:78:in `block in get_thread'
	/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/group/spawner.rb:50:in `block in instantiate'
2020-03-08 20:19:12.603970 E [91602:70216306749380 logger.rb:53] eye -- [celluloid] thread crashed

I'm also seeing errors like this:

2020-03-08 10:25:15.579624 E [42676:70134578180800 logger.rb:53] eye -- [celluloid] Actor crashed!
Celluloid::DeadActorError: attempted to call a dead actor: proc_cpu
        /data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/proxy/sync.rb:9:in `method_missing'
        /data/deployer/timeagent/vendor/cache/eye-7b8f32d1cdd4/lib/eye/system_resources.rb:26:in `start_time'
        /data/deployer/timeagent/vendor/cache/eye-7b8f32d1cdd4/lib/eye/process/system.rb:44:in `compare_identity'
        /data/deployer/timeagent/vendor/cache/eye-7b8f32d1cdd4/lib/eye/process/monitor.rb:84:in `check_identity'
        /data/deployer/timeagent/vendor/cache/eye-7b8f32d1cdd4/lib/eye/process/watchers.rb:16:in `block in add_watchers'
        /data/deployer/timeagent/vendor/cache/eye-7b8f32d1cdd4/lib/eye/process/watchers.rb:48:in `block in add_watcher'
        /data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/actor.rb:339:in `block in task'
        /data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/task.rb:44:in `block in initialize'
        /data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/task/fibered.rb:14:in `block in create'

and:

2020-03-08 10:25:15.601780 E [42676:70134578814120 logger.rb:53] eye -- [celluloid] Actor crashed!
Celluloid::DeadActorError: attempted to call a dead actor: proc_cpu
        /data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/proxy/sync.rb:9:in `method_missing'
        /data/deployer/timeagent/vendor/cache/eye-7b8f32d1cdd4/lib/eye/system_resources.rb:26:in `start_time'
        /data/deployer/timeagent/vendor/cache/eye-7b8f32d1cdd4/lib/eye/process/system.rb:44:in `compare_identity'
        /data/deployer/timeagent/vendor/cache/eye-7b8f32d1cdd4/lib/eye/process/monitor.rb:84:in `check_identity'
        /data/deployer/timeagent/vendor/cache/eye-7b8f32d1cdd4/lib/eye/process/watchers.rb:16:in `block in add_watchers'
        /data/deployer/timeagent/vendor/cache/eye-7b8f32d1cdd4/lib/eye/process/watchers.rb:48:in `block in add_watcher'
        /data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/actor.rb:339:in `block in task'
        /data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/task.rb:44:in `block in initialize'
        /data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/task/fibered.rb:14:in `block in create'

This is an AMD EPYC 7451 24-Core CPU, so it can handle a lot more; with all processes running it's only 10% loaded. Are there any parameters I can tune in eye to handle this many processes?

rgaufman avatar Mar 08 '20 20:03 rgaufman

I haven't tested that many processes, but ~100 was OK. Ruby uses one core for its concurrency, so 100% CPU usage is possible, but it still should not happen. Does the error repeat if you restart eye? As a workaround, you can try running multiple eye instances, each with a group of processes, in different folders (local eye: leye).
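
A minimal sketch of the splitting idea, assuming the process list is available as an array of names. The `shard_processes` helper, the `eye_shard_N` folder names, and the group size of 100 (taken from the "~100 was OK" remark, not from any documented limit) are all hypothetical; each group would then get its own folder and config for a local eye instance.

```ruby
# Hypothetical helper: split a large process list into groups small enough
# for one eye instance each. The default of 100 is an assumption based on
# the "~100 was OK" remark, not a documented eye limit.
def shard_processes(names, per_instance: 100)
  names.each_slice(per_instance).each_with_index.map do |group, index|
    { dir: "eye_shard_#{index}", processes: group }
  end
end

# Illustrative process names only.
names = (1..400).map { |i| "recorder_#{i}" }
shards = shard_processes(names)
shards.each { |shard| puts "#{shard[:dir]}: #{shard[:processes].size} processes" }
```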

kostya avatar Mar 08 '20 21:03 kostya

It's just constantly hovering at 100% and not responding to bundle exec eye i. I can't really try multiple eye processes without re-architecting how the app works.

Is there something I can adjust to reduce how often each process is checked? Anything else you can think of to help reduce load with this many processes?

I temporarily reduced the number of processes to 300 and now it's taking 17 to 30% CPU - it seems it hits some kind of threshold and then everything stops working.

rgaufman avatar Mar 08 '20 21:03 rgaufman

To minimize load, you can remove the cpu and memory checks. Also disable the identity check (check_identity: false) and increase the check_alive period (check_alive_period: 30.seconds).

Also, you can try increasing the expiry of the cache that holds the CPU and memory info read from the OS:

Eye::SystemResources.cache.setup_expire(30)

kostya avatar Mar 08 '20 22:03 kostya

I tried all of those things and am still getting this:

2020-03-11 23:40:36.466493 E [79975:70006573067020 logger.rb:53] eye -- [recorder_5e65f1c2a79dfa42c5f0b87b:5e65f1c2a79dfa42c5f0b87b:live] check:cpu(<100%) Exception: attempted to call a dead actor: proc_cpu ["/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/proxy/sync.rb:9:in `method_missing'", "/data/deployer/timeagent/vendor/cache/eye-7b8f32d1cdd4/lib/eye/system_resources.rb:15:in `cpu'", "/data/deployer/timeagent/vendor/cache/eye-7b8f32d1cdd4/lib/eye/checker/cpu.rb:10:in `get_value'", "/data/deployer/timeagent/vendor/cache/eye-7b8f32d1cdd4/lib/eye/checker.rb:136:in `get_value_safe'", "/data/deployer/timeagent/vendor/cache/eye-7b8f32d1cdd4/lib/eye/checker.rb:108:in `check'", "/data/deployer/timeagent/vendor/cache/eye-7b8f32d1cdd4/lib/eye/process/watchers.rb:66:in `watcher_tick'", "/data/deployer/timeagent/vendor/cache/eye-7b8f32d1cdd4/lib/eye/process/watchers.rb:48:in `block in add_watcher'", "/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/actor.rb:339:in `block in task'", "/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/task.rb:44:in `block in initialize'", "/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/task/fibered.rb:14:in `block in create'"]
2020-03-11 23:40:36.545344 E [79975:70006572737640 logger.rb:53] eye -- [recorder_5e65f1cfa79dfa42c5f0b8aa:5e65f1cfa79dfa42c5f0b8aa:proxy] check:memory(<300Mb) Exception: undefined method `proc_mem' for nil:NilClass ["/data/deployer/timeagent/vendor/cache/eye-7b8f32d1cdd4/lib/eye/system_resources.rb:9:in `memory'", "/data/deployer/timeagent/vendor/cache/eye-7b8f32d1cdd4/lib/eye/checker/memory.rb:10:in `get_value'", "/data/deployer/timeagent/vendor/cache/eye-7b8f32d1cdd4/lib/eye/checker.rb:136:in `get_value_safe'", "/data/deployer/timeagent/vendor/cache/eye-7b8f32d1cdd4/lib/eye/checker.rb:108:in `check'", "/data/deployer/timeagent/vendor/cache/eye-7b8f32d1cdd4/lib/eye/process/watchers.rb:66:in `watcher_tick'", "/data/deployer/timeagent/vendor/cache/eye-7b8f32d1cdd4/lib/eye/process/watchers.rb:48:in `block in add_watcher'", "/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/actor.rb:339:in `block in task'", "/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/task.rb:44:in `block in initialize'", "/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/task/fibered.rb:14:in `block in create'"]
2020-03-11 23:40:36.655136 E [79975:70006577492940 logger.rb:53] eye -- [recorder_5e63325716a95a181c898f6c:5e63325716a95a181c898f6c:detector] process <17218> not found, it may have crashed (you should check the process logs ["/data/deployer/timeagent/log/recorder/detector-5e63325716a95a181c898f6c.log", "/data/deployer/timeagent/log/recorder/detector-5e63325716a95a181c898f6c.log"])
2020-03-11 23:40:36.655316 E [79975:70006577492940 logger.rb:53] eye -- [recorder_5e63325716a95a181c898f6c:5e63325716a95a181c898f6c:detector] process <17218> failed to start (:not_really_running)
2020-03-11 23:40:36.664231 E [79975:70006572911200 logger.rb:53] eye -- [recorder_5e65f1c7a79dfa42c5f0b889:5e65f1c7a79dfa42c5f0b889:detector] check:cpu(<100%) Exception: undefined method `proc_cpu' for nil:NilClass ["/data/deployer/timeagent/vendor/cache/eye-7b8f32d1cdd4/lib/eye/system_resources.rb:15:in `cpu'", "/data/deployer/timeagent/vendor/cache/eye-7b8f32d1cdd4/lib/eye/checker/cpu.rb:10:in `get_value'", "/data/deployer/timeagent/vendor/cache/eye-7b8f32d1cdd4/lib/eye/checker.rb:136:in `get_value_safe'", "/data/deployer/timeagent/vendor/cache/eye-7b8f32d1cdd4/lib/eye/checker.rb:108:in `check'", "/data/deployer/timeagent/vendor/cache/eye-7b8f32d1cdd4/lib/eye/process/watchers.rb:66:in `watcher_tick'", "/data/deployer/timeagent/vendor/cache/eye-7b8f32d1cdd4/lib/eye/process/watchers.rb:48:in `block in add_watcher'", "/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/actor.rb:339:in `block in task'", "/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/task.rb:44:in `block in initialize'", "/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/task/fibered.rb:14:in `block in create'"]
2020-03-11 23:40:36.668949 E [79975:70006573728220 logger.rb:53] eye -- [recorder_5e633a3a4caa415b13ddccef:5e633a3a4caa415b13ddccef:proxy] check:cpu(<30%) Exception: undefined method `proc_cpu' for nil:NilClass ["/data/deployer/timeagent/vendor/cache/eye-7b8f32d1cdd4/lib/eye/system_resources.rb:15:in `cpu'", "/data/deployer/timeagent/vendor/cache/eye-7b8f32d1cdd4/lib/eye/checker/cpu.rb:10:in `get_value'", "/data/deployer/timeagent/vendor/cache/eye-7b8f32d1cdd4/lib/eye/checker.rb:136:in `get_value_safe'", "/data/deployer/timeagent/vendor/cache/eye-7b8f32d1cdd4/lib/eye/checker.rb:108:in `check'", "/data/deployer/timeagent/vendor/cache/eye-7b8f32d1cdd4/lib/eye/process/watchers.rb:66:in `watcher_tick'", "/data/deployer/timeagent/vendor/cache/eye-7b8f32d1cdd4/lib/eye/process/watchers.rb:48:in `block in add_watcher'", "/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/actor.rb:339:in `block in task'", "/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/task.rb:44:in `block in initialize'", "/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/task/fibered.rb:14:in `block in create'"]
2020-03-11 23:40:36.673219 E [79975:70006573105240 logger.rb:53] eye -- [recorder_5e6349775f439f29fd773901:5e6349775f439f29fd773901:recorder] check:cpu(<100%) Exception: undefined method `proc_cpu' for nil:NilClass ["/data/deployer/timeagent/vendor/cache/eye-7b8f32d1cdd4/lib/eye/system_resources.rb:15:in `cpu'", "/data/deployer/timeagent/vendor/cache/eye-7b8f32d1cdd4/lib/eye/checker/cpu.rb:10:in `get_value'", "/data/deployer/timeagent/vendor/cache/eye-7b8f32d1cdd4/lib/eye/checker.rb:136:in `get_value_safe'", "/data/deployer/timeagent/vendor/cache/eye-7b8f32d1cdd4/lib/eye/checker.rb:108:in `check'", "/data/deployer/timeagent/vendor/cache/eye-7b8f32d1cdd4/lib/eye/process/watchers.rb:66:in `watcher_tick'", "/data/deployer/timeagent/vendor/cache/eye-7b8f32d1cdd4/lib/eye/process/watchers.rb:48:in `block in add_watcher'", "/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/actor.rb:339:in `block in task'", "/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/task.rb:44:in `block in initialize'", "/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/task/fibered.rb:14:in `block in create'"]
2020-03-11 23:40:36.722170 E [79975:70006578352780 logger.rb:53] eye -- [celluloid] thread crashed
Celluloid::TaskTerminated: task was terminated
	/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/task/fibered.rb:35:in `terminate'
	/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/actor.rb:323:in `block in cleanup'
	/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/actor.rb:321:in `each'
	/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/actor.rb:321:in `cleanup'
	/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/actor.rb:307:in `shutdown'
	/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/actor.rb:169:in `run'
	/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/actor.rb:131:in `block in start'
	/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-essentials-0.20.5/lib/celluloid/internals/thread_handle.rb:14:in `block in initialize'
	/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/actor/system.rb:78:in `block in get_thread'
	/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/group/spawner.rb:50:in `block in instantiate'
	(celluloid):0:in `remote procedure call'
	/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/call/sync.rb:45:in `value'
	/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/proxy/sync.rb:22:in `method_missing'
	/data/deployer/timeagent/eye/helpers.rb:195:in `process_started?'
	/data/deployer/timeagent/eye/helpers.rb:205:in `block in wait_for_process'
	/data/deployer/timeagent/vendor/cache/eye-7b8f32d1cdd4/lib/eye/trigger.rb:103:in `instance_exec'
	/data/deployer/timeagent/vendor/cache/eye-7b8f32d1cdd4/lib/eye/trigger.rb:103:in `exec_proc'
	/data/deployer/timeagent/vendor/cache/eye-7b8f32d1cdd4/lib/eye/trigger/starting_guard.rb:28:in `block in check_start'
	/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/calls.rb:28:in `public_send'
	/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/calls.rb:28:in `dispatch'
	/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/call/sync.rb:16:in `dispatch'
	/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/future.rb:18:in `block in new'
	/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-essentials-0.20.5/lib/celluloid/internals/thread_handle.rb:14:in `block in initialize'
	/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/actor/system.rb:78:in `block in get_thread'
	/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/group/spawner.rb:50:in `block in instantiate'
2020-03-11 23:40:36.722753 E [79975:70006578844600 logger.rb:53] eye -- [celluloid] thread crashed
Celluloid::TaskTerminated: task was terminated

This is what's in top

Tasks: 1609 total,   8 running, 1070 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.1 us,  6.1 sy, 12.3 ni, 80.6 id,  0.0 wa,  0.0 hi,  1.0 si,  0.0 st
KiB Mem : 65649456 total,  8356892 free, 15504224 used, 41788340 buff/cache
KiB Swap:  8388604 total,  8375792 free,    12812 used. 49448092 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
18519 deployer  35  15 45.117g 158928  12396 S 101.9  0.2   1:26.30 eye monitoring v0.10.1.pre [recorder_5e63325316a95a181c898f5d, recorder_5e63325416a95
13808 deployer  30  10 3291700 342060  14968 S  43.2  0.5   6:21.70 puma 4.3.1 (tcp://127.0.0.1:3001) [timeagent]
69395 deployer  35  15  171752  23540   5984 S  32.6  0.0   0:01.01 ruby2.6 /data/deployer/timeagent/vendor/bundle/ruby/2.6.0/bin/eye xinfo
69712 deployer  35  15  171756  23584   6008 S  31.6  0.0   0:00.98 ruby2.6 /data/deployer/timeagent/vendor/bundle/ruby/2.6.0/bin/eye i -j
42865 deployer  35  15 8825944 495948  15276 S  20.3  0.8   3:40.19 sidekiq 6.0.5 timeagent [2 of 8 busy]
92068 deployer  35  15 13.847g 228424  35692 S  19.7  0.3   0:25.49 python3 /data/deployer/timeagent/vendor/xanDetector/xanDetector.py --camera_id 5e633a384caa415b13ddcce6 --listen_port 10738 --brand Te+
96621 deployer  35  15 13.846g 231320  35640 S  19.4  0.4   0:24.70 python3 /data/deployer/timeagent/vendor/xanDetector/xanDetector.py --camera_id 5e636e34a79dfa1d11329522 --listen_port 10810 --brand Te+
96309 deployer  35  15 13.846g 228176  35408 S  18.7  0.3   0:24.56 python3 /data/deployer/timeagent/vendor/xanDetector/xanDetector.py --camera_id 5e636e33a79dfa1d1132951c --listen_port 10807 --brand Te+
95581 deployer  35  15 13.846g 226416  35760 S  17.7  0.3   0:27.16 python3 /data/deployer/timeagent/vendor/xanDetector/xanDetector.py --camera_id 5e6349755f439f29fd7738f7 --listen_port 10792 --brand Te+
91260 deployer  35  15 13.847g 230768  35684 S  17.4  0.4   0:25.22 python3 /data/deployer/timeagent/vendor/xanDetector/xanDetector.py --camera_id 5e633a364caa415b13ddccd9 --listen_port 10723 --brand Te+
93139 deployer  35  15 13.847g 224536  36116 S  17.4  0.3   0:24.98 python3 /data/deployer/timeagent/vendor/xanDetector/xanDetector.py --camera_id 5e6349715f439f29fd7738d6 --listen_port 10759 --brand Te+
96870 deployer  35  15 13.846g 230900  35404 S  15.5  0.4   0:24.11 python3 /data/deployer/timeagent/vendor/xanDetector/xanDetector.py --camera_id 5e636e37a79dfa1d11329529 --listen_port 10813 --brand Te+
92506 deployer  35  15 13.847g 235220  35480 S  14.8  0.4   0:23.46 python3 /data/deployer/timeagent/vendor/xanDetector/xanDetector.py --camera_id 5e633a3b4caa415b13ddccf6 --listen_port 10756 --brand Te+
93298 deployer  35  15 7508252 183024  35376 S  14.5  0.3   0:19.63 python3 /data/deployer/timeagent/vendor/xanDetector/xanDetector.py --camera_id 5e6349725f439f29fd7738df --listen_port 10765 --brand Te+
96120 deployer  35  15 13.847g 228884  35860 S  14.5  0.3   0:26.14 python3 /data/deployer/timeagent/vendor/xanDetector/xanDetector.py --camera_id 5e6349775f439f29fd773901 --listen_port 10801 --brand Te+
90826 deployer  35  15 13.847g 224616  35204 S  13.9  0.3   0:24.04 python3 /data/deployer/timeagent/vendor/xanDetector/xanDetector.py --camera_id 5e633a364caa415b13ddccd5 --listen_port 10717 --brand Te+
...

I've added

Eye::SystemResources.cache.setup_expire(30)
Eye.application "recorder_#{camera[:camera_id]}" do
  ...
  check_identity false
  check_alive_period 30.seconds
end

Any other ideas/suggestions?

rgaufman avatar Mar 11 '20 23:03 rgaufman

Hard to say; it looks like it may be a celluloid bug. The error undefined method 'proc_cpu' for nil:NilClass should be impossible, because proc_cpu is called on a global instance, which cannot be nil because it is created when eye starts. Maybe there was an earlier error in the log, after which undefined method 'proc_cpu' for nil:NilClass started to appear.

kostya avatar Mar 12 '20 00:03 kostya

I have been trying to play with this without success :(

I think I am going to have to come up with another strategy. Maybe to create a simple layer to systemd, where adding an application will create .service files and reload the systemd daemon. Kind of like the whenever gem does with cron.

From there, the process will just be a single loop, that queries systemd and checks CPU/Ram and performs the required systemd action. This way, each iteration will just take a bit longer depending on the number of processes, without adding additional load.
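
A rough sketch of what such a single loop might look like, assuming the units are queried with `systemctl show` and its `ActiveState` and `MemoryCurrent` properties. The helper names, the memory threshold, and the start/restart policy are hypothetical and only illustrate the idea; CPU could be handled the same way.

```ruby
require 'open3'

# Parse the KEY=VALUE lines printed by `systemctl show` into a hash.
def parse_show(output)
  output.each_line
        .map(&:chomp)
        .select { |line| line.include?('=') }
        .to_h { |line| line.split('=', 2) }
end

# Decide what to do with one unit (hypothetical policy):
# start it if it is not active, restart it if it uses too much memory.
def action_for(props, memory_limit_bytes:)
  return :start unless props['ActiveState'] == 'active'

  # MemoryCurrent may be "[not set]"; Integer(..., exception: false) yields nil then.
  memory = Integer(props['MemoryCurrent'], exception: false)
  return :restart if memory && memory > memory_limit_bytes

  :none
end

# One pass over all units; each pass takes longer with more units,
# but there is only ever a single checker running.
def check_units(units, memory_limit_bytes:)
  units.each do |unit|
    out, status = Open3.capture2('systemctl', 'show', unit,
                                 '--property=ActiveState,MemoryCurrent')
    next unless status.success?

    case action_for(parse_show(out), memory_limit_bytes: memory_limit_bytes)
    when :start   then system('sudo', 'systemctl', 'start', unit)
    when :restart then system('sudo', 'systemctl', 'restart', unit)
    end
  end
end
```

Calling `check_units` from a `loop` with a `sleep` between passes would give the single-loop behaviour described above.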

rgaufman avatar Mar 15 '20 12:03 rgaufman

Yeah, Ruby is just bad under high concurrency. Try splitting the processes across multiple eye instances or something else; I don't know what to fix here.

kostya avatar Mar 15 '20 13:03 kostya

Have you made progress on this by chance @kostya?

TeresaP avatar Jul 20 '21 20:07 TeresaP

No, try all the advice in this thread.

kostya avatar Jul 20 '21 21:07 kostya

I ended up switching to systemd with a simple ruby script to manage the processes. It was actually relatively easy, easier than I expected: just an erb template for the .service file, adding/removing the files as needed, and running systemctl enable and systemctl daemon-reload. It does most of the monitoring eye does but takes up virtually 0% CPU, versus the 100%+ eye was taking. It is also easy to add/remove dependencies, and systemd just handles it all for you.

The code is specific to my application, but some examples of how it works are:

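# Stop and disable the unit, delete its .service file, then reload systemd.
def remove(service_name, service_path)
  # The argument-list form of system avoids passing interpolated names through a shell.
  system('sudo', 'systemctl', 'stop', service_name)
  system('sudo', 'systemctl', 'disable', service_name)
  system('sudo', 'rm', service_path)
  system('sudo', 'systemctl', 'daemon-reload')
end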

It is similar for add, except you need to generate the .service file from an erb template. For checking status, you parse the output of systemctl status process-name. It is working like a dream, and starting/stopping/restarting is lightning fast.
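
For illustration, the add side might look roughly like the sketch below. The template fields, the `render_unit` and `add` helpers, and the default unit directory are assumptions made for the example, not the actual application code.

```ruby
require 'erb'

# Hypothetical unit template; the fields are illustrative only.
UNIT_TEMPLATE = ERB.new(<<~UNIT)
  [Unit]
  Description=<%= description %>

  [Service]
  ExecStart=<%= command %>
  Restart=always

  [Install]
  WantedBy=multi-user.target
UNIT

# Render the .service file text for one process.
def render_unit(description:, command:)
  UNIT_TEMPLATE.result_with_hash(description: description, command: command)
end

# Install the unit, reload systemd, then enable and start it.
# Assumes passwordless sudo, as in the remove example.
def add(service_name, unit_text, unit_dir: '/etc/systemd/system')
  path = File.join(unit_dir, "#{service_name}.service")
  # Write through `sudo tee`, since the unit directory is normally root-owned.
  IO.popen(['sudo', 'tee', path], 'w', out: File::NULL) { |io| io.write(unit_text) }
  system('sudo', 'systemctl', 'daemon-reload')
  system('sudo', 'systemctl', 'enable', '--now', service_name)
end
```

A call such as `add('recorder-demo', render_unit(description: 'demo', command: '/usr/bin/ruby /path/to/script.rb'))` would then mirror the remove method above; the service name and command here are placeholders.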

rgaufman avatar Jul 13 '22 01:07 rgaufman

@rgaufman With systemd, why use eye at all?

grimm26 avatar Jul 13 '22 03:07 grimm26

I removed eye from our application. It's fantastic for managing a smaller number of processes, but it seems celluloid didn't work well when it came to 100+ processes. Even with a smaller number of processes, we now just script systemd, as it's much more resource efficient and has other advantages such as better dependency handling.

rgaufman avatar Jul 13 '22 08:07 rgaufman

@rgaufman do you have anything to share on this on GitHub?

grimm26 avatar Jul 13 '22 13:07 grimm26