Dojo.jl

RL examples segfault when using more than one thread

Open GlenHenshaw opened this issue 2 years ago • 1 comments

Julia 1.7.2, macOS 12.3, Apple M1 architecture. The crash appears to happen only when Julia is started with more than one thread, e.g. "julia --threads 3".

    julia> include("halfcheetah_ars.jl")
      Activating project at `~/.julia/packages/Dojo/tpwPK/examples`
    ┌ Info: MeshCat server started. You can open the visualizer by visiting the following URL in your browser:
    └ http://127.0.0.1:8714
    ┌ Warning: Assignment to `env` in soft scope is ambiguous because a global variable by the same name exists: `env` will be treated as a new local. Disambiguate by using `local env` to suppress this warning or `global env` to assign to the existing global variable.
    └ @ ~/.julia/packages/Dojo/tpwPK/examples/reinforcement_learning/halfcheetah_ars.jl:27
    ┌ Warning: Assignment to `obs` in soft scope is ambiguous because a global variable by the same name exists: `obs` will be treated as a new local. Disambiguate by using `local obs` to suppress this warning or `global obs` to assign to the existing global variable.
    └ @ ~/.julia/packages/Dojo/tpwPK/examples/reinforcement_learning/halfcheetah_ars.jl:29
    ┌ Info: MeshCat server started. You can open the visualizer by visiting the following URL in your browser:
    └ http://127.0.0.1:8715
    Training linear policy with Augmented Random Search (ARS)
    
     4
    signal (11): Segmentation fault: 11
    in expression starting at /Users/glenhenshaw/.julia/packages/Dojo/tpwPK/examples/reinforcement_learning/halfcheetah_ars.jl:25
    
    signal (11): Segmentation fault: 11
    in expression starting at /Users/glenhenshaw/.julia/packages/Dojo/tpwPK/examples/reinforcement_learning/halfcheetah_ars.jl:25
    
    signal (6): Abort trap: 6
    in expression starting at /Users/glenhenshaw/.julia/packages/Dojo/tpwPK/examples/reinforcement_learning/halfcheetah_ars.jl:25
    zsh: abort      julia --threads 4

Here's another, different stack trace:

    julia> include("halfcheetah_ars.jl")
      Activating project at `~/.julia/packages/Dojo/tpwPK/examples`
    ┌ Info: MeshCat server started. You can open the visualizer by visiting the following URL in your browser:
    └ http://127.0.0.1:8713
    ┌ Warning: Assignment to `env` in soft scope is ambiguous because a global variable by the same name exists: `env` will be treated as a new local. Disambiguate by using `local env` to suppress this warning or `global env` to assign to the existing global variable.
    └ @ ~/.julia/packages/Dojo/tpwPK/examples/reinforcement_learning/halfcheetah_ars.jl:27
    ┌ Warning: Assignment to `obs` in soft scope is ambiguous because a global variable by the same name exists: `obs` will be treated as a new local. Disambiguate by using `local obs` to suppress this warning or `global obs` to assign to the existing global variable.
    └ @ ~/.julia/packages/Dojo/tpwPK/examples/reinforcement_learning/halfcheetah_ars.jl:29
    ┌ Info: MeshCat server started. You can open the visualizer by visiting the following URL in your browser:
    └ http://127.0.0.1:8714
    Training linear policy with Augmented Random Search (ARS)
    
     3
    signal (11): Segmentation fault: 11
    in expression starting at /Users/glenhenshaw/.julia/packages/Dojo/tpwPK/examples/reinforcement_learning/halfcheetah_ars.jl:25
    loadtriplet! at /Users/sabae/src/julia/usr/share/julia/stdlib/v1.7/Base64/src/encode.jl:0
    unsafe_write at /Users/sabae/src/julia/usr/share/julia/stdlib/v1.7/Base64/src/encode.jl:53
    unsafe_write at ./io.jl:648 [inlined]
    write at ./io.jl:671
    unknown function (ip: 0x10c0a5837)
    jl_apply_generic at /Applications/Julia-1.7.app/Contents/Resources/julia/lib/julia/libjulia-internal.1.7.dylib (unknown line)
    #base64encode#5 at /Users/sabae/src/julia/usr/share/julia/stdlib/v1.7/Base64/src/encode.jl:209
    base64encode##kw at /Users/sabae/src/julia/usr/share/julia/stdlib/v1.7/Base64/src/encode.jl:206 [inlined]
    #base64encode#6 at /Users/sabae/src/julia/usr/share/julia/stdlib/v1.7/Base64/src/encode.jl:216 [inlined]
    base64encode at /Users/sabae/src/julia/usr/share/julia/stdlib/v1.7/Base64/src/encode.jl:216 [inlined]
    generate_websocket_key at /Users/glenhenshaw/.julia/packages/WebSockets/QcswW/src/WebSockets.jl:548 [inlined]
    upgrade at /Users/glenhenshaw/.julia/packages/WebSockets/QcswW/src/HTTP.jl:189
    unknown function (ip: 0x10c0a338b)
    jl_apply_generic at /Applications/Julia-1.7.app/Contents/Resources/julia/lib/julia/libjulia-internal.1.7.dylib (unknown line)
    _servercoroutine at /Users/glenhenshaw/.julia/packages/WebSockets/QcswW/src/HTTP.jl:370
    macro expansion at /Users/glenhenshaw/.julia/packages/HTTP/aTjcj/src/Servers.jl:415 [inlined]
    #13 at ./task.jl:423
    unknown function (ip: 0x10bf86bd7)
    jl_apply_generic at /Applications/Julia-1.7.app/Contents/Resources/julia/lib/julia/libjulia-internal.1.7.dylib (unknown line)
    start_task at /Applications/Julia-1.7.app/Contents/Resources/julia/lib/julia/libjulia-internal.1.7.dylib (unknown line)
    Allocations: 524622903 (Pool: 523434761; Big: 1188142); GC: 176
    
    signal (11): Segmentation fault: 11
    in expression starting at /Users/glenhenshaw/.julia/packages/Dojo/tpwPK/examples/reinforcement_learning/halfcheetah_ars.jl:25
    loadtriplet! at /Users/sabae/src/julia/usr/share/julia/stdlib/v1.7/Base64/src/encode.jl:0
    close at /Users/sabae/src/julia/usr/share/julia/stdlib/v1.7/Base64/src/encode.jl:111
    #3 at /Users/sabae/src/julia/usr/share/julia/stdlib/v1.7/Base64/src/encode.jl:42
    unknown function (ip: 0x10c0aad5b)
    jl_apply_generic at /Applications/Julia-1.7.app/Contents/Resources/julia/lib/julia/libjulia-internal.1.7.dylib (unknown line)
    run_finalizer at /Applications/Julia-1.7.app/Contents/Resources/julia/lib/julia/libjulia-internal.1.7.dylib (unknown line)
    jl_gc_run_finalizers_in_list at /Applications/Julia-1.7.app/Contents/Resources/julia/lib/julia/libjulia-internal.1.7.dylib (unknown line)
    jl_gc_run_all_finalizers at /Applications/Julia-1.7.app/Contents/Resources/julia/lib/julia/libjulia-internal.1.7.dylib (unknown line)
    jl_atexit_hook at /Applications/Julia-1.7.app/Contents/Resources/julia/lib/julia/libjulia-internal.1.7.dylib (unknown line)
    jl_exit at /Applications/Julia-1.7.app/Contents/Resources/julia/lib/julia/libjulia-internal.1.7.dylib (unknown line)
    jl_exit_thread0_cb at /Applications/Julia-1.7.app/Contents/Resources/julia/lib/julia/libjulia-internal.1.7.dylib (unknown line)
    Allocations: 524622903 (Pool: 523434761; Big: 1188142); GC: 176

GlenHenshaw, Mar 23 '22 14:03

Julia 1.7 is not that stable on Apple M1 (threading in particular). I would recommend running it under Rosetta or trying Julia v1.8/nightly.

rejuvyesh, Mar 23 '22 16:03

The halfcheetah example has been removed, but feel free to reopen this issue if the stability problems persist.

janbruedigam, Apr 12 '23 07:04