exmagick icon indicating copy to clipboard operation
exmagick copied to clipboard

Memory Leak

Open weixiyen opened this issue 7 years ago • 9 comments

I'm processing about 2000 images per minute across 8 nodes in production and noticed that system memory is going up.

:erlang.memory() is stable, but system memory goes up by about ~50mb per hour per node until the VM just crashes.

It seems that the NIF bindings from exmagick might be the issue here where memory is not being freed up somewhere.

The commands I'm using are init! image_load!, image_dump!, and size!

0.0.5

weixiyen avatar Apr 07 '18 22:04 weixiyen

looking at the code my guess is image_dump!. enif_make_resource_binary requires enif_release_resource which is missing. i'll make a test to verify and if that is the case i'll make a patch fixing.

thanks!

dgvncsz0f avatar Apr 10 '18 10:04 dgvncsz0f

@weixiyen could you test #10 and let me know if it fixes your problem?

dgvncsz0f avatar Apr 10 '18 22:04 dgvncsz0f

I added the config to my deps, works fine locally.

{:exmagick, git: "https://github.com/Xerpa/exmagick.git", ref: "b9162a1bd0908d0e3d5d501a07764c0e2936eaa2"},

In prod getting this crash on startup for this commit hash whereas hex.pm's version 0.0.5 works fine.

2018-04-11 03:36:29 crash_report
    initial_call: {supervisor,kernel,['Argument__1']}
    pid: <0.3408.0>
    registered_name: []
    error_info: {exit,{on_load_function_failed,'Elixir.ExMagick'},[{gen_server,init_it,6,[{file,"gen_server.erl"},{line,352}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,247}]}]}
    ancestors: [kernel_sup,<0.3383.0>]
    messages: []
    links: [<0.3384.0>]
    dictionary: []
    trap_exit: true
    status: running
    heap_size: 376
    stack_size: 27
    reductions: 117
2018-04-11 03:36:29 supervisor_report
    supervisor: {local,kernel_sup}
    errorContext: start_error
    reason: {on_load_function_failed,'Elixir.ExMagick'}
    offender: [{pid,undefined},{id,kernel_safe_sup},{mfargs,{supervisor,start_link,[{local,kernel_safe_sup},kernel,safe]}},{restart_type,permanent},{shutdown,infinity},{child_type,supervisor}]
2018-04-11 03:36:30 crash_report
    initial_call: {application_master,init,['Argument__1','Argument__2','Argument__3','Argument__4']}
    pid: <0.3382.0>
    registered_name: []
    error_info: {exit,{{shutdown,{failed_to_start_child,kernel_safe_sup,{on_load_function_failed,'Elixir.ExMagick'}}},{kernel,start,[normal,[]]}},[{application_master,init,4,[{file,"application_master.erl"},{line,134}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,247}]}]}
    ancestors: [<0.3381.0>]
    messages: [{'EXIT',<0.3383.0>,normal}]
    links: [<0.3381.0>,<0.3380.0>]
    dictionary: []
    trap_exit: true
    status: running
    heap_size: 376
    stack_size: 27
    reductions: 152
2018-04-11 03:36:30 std_info
    application: kernel
    exited: {{shutdown,{failed_to_start_child,kernel_safe_sup,{on_load_function_failed,'Elixir.ExMagick'}}},{kernel,start,[normal,[]]}}
    type: permanent
{"Kernel pid terminated",application_controller,"{application_start_failure,kernel,{{shutdown,{failed_to_start_child,kernel_safe_sup,{on_load_function_failed,'Elixir.ExMagick'}}},{kernel,start,[normal,[]]}}}"}
Kernel pid terminated (application_controller) ({application_start_failure,kernel,{{shutdown,{failed_to_start_child,kernel_safe_sup,{on_load_function_failed,'Elixir.ExMagick'}}},{kernel,start,[normal,

weixiyen avatar Apr 11 '18 03:04 weixiyen

the master of exmagick includes flags to enable dirty scheduler that requires a fairly recent erlang version, probably that is why. we've been testing this in production for a while now and is stable but we haven't made a release yet to avoid breaking compatibility. let me fix that and we test again.

dgvncsz0f avatar Apr 11 '18 10:04 dgvncsz0f

ok. i made dirty scheduler disabled by default [opt-in by using an env var]. could you try it again?

dgvncsz0f avatar Apr 11 '18 11:04 dgvncsz0f

Yep, trying it again.

I just updated my erlang distribution to erts-9.3 with the env flag turned on.

Will get back to you in about ~12 hours with some graphs to compare before / after.

Thank you!

weixiyen avatar Apr 11 '18 17:04 weixiyen

Hi @dgvncsz0f - just checking back in.

I think there is still a memory leak, and it's kind of hard to tell how much the fix affected anything.

After running in prod for roughly 24 hours, there is still a big difference between what Erlang thinks it is using versus what the Linux Kernel thinks.

For example, on one node, Erlang thinks it is using 300MB of memory while the Linux Kernel thinks Erlang is using 5GB of memory.

According to Fred Herbert, this type of symptom could be caused by a NIF: http://erlang.org/pipermail/erlang-questions/2013-September/075401.html

Although, I can't be 100% sure.

Is it possible that there are minor leaks related to either init!, image_load!, or size! functions?

I'll try to do some more testing on my end as well to see if I can get more helpful information

weixiyen avatar Apr 12 '18 18:04 weixiyen

@weixiyen thanks. if you can provide more information it will be valuable. later today i'll continue investigating looking for other possible leaks. i'll get back to you as soon as i have something up.

Thanks!

dgvncsz0f avatar Apr 12 '18 20:04 dgvncsz0f

sorry, accidentally closed when merged the pr...

dgvncsz0f avatar Apr 13 '18 01:04 dgvncsz0f