mlagents_envs.exception.UnityEnvironmentException: Environment shut down with return code -6 (SIGABRT).

Open Pimool opened this issue 2 years ago • 8 comments

Hi, I am trying to run Reinforcement Learning on a GPU runbox.

With your code, I could train the model on Colab and on Saturn Cloud, which is similar to Colab.

However, when I tried to run it on my personal GPU runbox, an error occurred.

mlagents-learn -h showed the options, so I think the problem is with the environment.

How can I handle this error?

~$ mlagents-learn config.yaml --run-id=test --env=ML-Agents-with-Google-Colab/headless_build/3DBall_example/3dball.x86_64

Version information:
  ml-agents: 0.31.0.dev0,
  ml-agents-envs: 0.31.0.dev0,
  Communicator API: 1.5.0,
  PyTorch: 1.11.0+cu102
[INFO] Learning was interrupted. Please wait while the graph is generated.
Traceback (most recent call last):
  File "/home/desktop/venv/bin/mlagents-learn", line 33, in <module>
    sys.exit(load_entry_point('mlagents', 'console_scripts', 'mlagents-learn')())
  File "/home/desktop/ml-agents/ml-agents/mlagents/trainers/learn.py", line 264, in main
    run_cli(parse_command_line())
  File "/home/desktop/ml-agents/ml-agents/mlagents/trainers/learn.py", line 260, in run_cli
    run_training(run_seed, options, num_areas)
  File "/home/desktop/ml-agents/ml-agents/mlagents/trainers/learn.py", line 136, in run_training
    tc.start_learning(env_manager)
  File "/home/desktop/ml-agents/ml-agents-envs/mlagents_envs/timers.py", line 305, in wrapped
    return func(*args, **kwargs)
  File "/home/desktop/ml-agents/ml-agents/mlagents/trainers/trainer_controller.py", line 197, in start_learning
    raise ex
  File "/home/desktop/ml-agents/ml-agents/mlagents/trainers/trainer_controller.py", line 172, in start_learning
    self._reset_env(env_manager)
  File "/home/desktop/ml-agents/ml-agents-envs/mlagents_envs/timers.py", line 305, in wrapped
    return func(*args, **kwargs)
  File "/home/desktop/ml-agents/ml-agents/mlagents/trainers/trainer_controller.py", line 105, in _reset_env
    env_manager.reset(config=new_config)
  File "/home/desktop/ml-agents/ml-agents/mlagents/trainers/env_manager.py", line 68, in reset
    self.first_step_infos = self._reset_env(config)
  File "/home/desktop/ml-agents/ml-agents/mlagents/trainers/subprocess_env_manager.py", line 446, in _reset_env
    ew.previous_step = EnvironmentStep(ew.recv().payload, ew.worker_id, {}, {})
  File "/home/desktop/ml-agents/ml-agents/mlagents/trainers/subprocess_env_manager.py", line 101, in recv
    raise env_exception
mlagents_envs.exception.UnityEnvironmentException: Environment shut down with return code -6 (SIGABRT).

Pimool avatar Aug 07 '23 02:08 Pimool

Hi @Pimool, check the ml-agents version (the environment given in this repo was built for release_1). Also, it seems your training exited with a critical error, as the SIGABRT error code suggests.

dhyeythumar avatar Aug 07 '23 15:08 dhyeythumar
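For readers unfamiliar with the "return code -6 (SIGABRT)" wording: on POSIX systems a negative subprocess return code means the process was killed by the signal with that number. The sketch below shows one way to decode such a code with Python's standard `signal` module. The helper name `describe_return_code` is hypothetical and is not part of ML-Agents.

```python
import signal


def describe_return_code(return_code: int) -> str:
    """Turn a subprocess-style return code into a readable description.

    Hypothetical helper for illustration; not part of ML-Agents.
    """
    if return_code < 0:
        # A negative code means "terminated by signal number -return_code".
        signo = -return_code
        try:
            name = signal.Signals(signo).name
        except ValueError:
            name = "unknown signal"
        return f"killed by signal {signo} ({name})"
    return f"exited with status {return_code}"


# The code reported in the traceback above.
print(describe_return_code(-6))
```

The same lookup applies to a signal number printed inside a player log (for example a `signo:` field), which can help tell an abort apart from a segmentation fault.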

Hi @dhyeythumar, I used ml-agents release 20 (the most recent one). However, your environment worked on Colab and Saturn Cloud with release 20, so I don't think the release is the problem. I don't know why the SIGABRT error occurs only on my personal GPU server.

Pimool avatar Aug 08 '23 01:08 Pimool
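Since the same build behaves differently across machines, it may help to print the installed package versions on each one and compare them side by side. This is a minimal sketch using only the standard library. The distribution names `mlagents`, `mlagents-envs`, and `torch` are assumptions based on the version banner in the original post, and `installed_versions` is a hypothetical helper.

```python
from importlib import metadata
from typing import Dict, Optional, Sequence


def installed_versions(
    packages: Sequence[str] = ("mlagents", "mlagents-envs", "torch"),
) -> Dict[str, Optional[str]]:
    """Return the installed version of each package, or None if it is absent.

    The default package names are assumptions; adjust them to your setup.
    """
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions().items():
    print(f"{name}: {version or 'not installed'}")
```

Running this on Colab and on the local server gives two lists that can be compared directly. Matching versions would point away from the Python packages and toward the system the executable runs on.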

Then most probably the environment is exiting with an error. It's possible that the Linux executable is not supported on GPU. If I remember correctly, on Colab this env works on the CPU instance; I haven't tried it on GPU (try this and see whether the GPU instance on Colab works).

dhyeythumar avatar Aug 08 '23 04:08 dhyeythumar

It works on Colab with a T4 GPU. Saturn Cloud was on GPU, too. Below is the Colab notebook. https://colab.research.google.com/drive/1sFY_V-uirL9pCPBlHkme8zBMfp3e1cJQ?usp=sharing

Pimool avatar Aug 08 '23 04:08 Pimool

Below is the Player-0.log file from when I try to start training with the command above. I have no idea what the errors mean or why the handler cannot load these files. Any help is greatly appreciated.

'''
Mono path[0] = '/home/desktop/ML-Agents-with-Google-Colab/headless_build/3DBall_example/3dball_Data/Managed'
Mono config path = '/home/desktop/ML-Agents-with-Google-Colab/headless_build/3DBall_example/3dball_Data/MonoBleedingEdge/etc'
Preloaded 'lib_burst_generated.so'
Preloaded 'libgrpc_csharp_ext.x64.so'
Initialize engine version: 2019.3.15f1 (59ff3e03856d)
[Subsystems] Discovering subsystems at path /home/desktop/ML-Agents-with-Google-Colab/headless_build/3DBall_example/3dball_Data/UnitySubsystems
Forcing GfxDevice: Null
GfxDevice: creating device client; threaded=0
NullGfxDevice:
    Version: NULL 1.0 [1.0]
    Renderer: Null Device
    Vendor: Unity Technologies
Begin MonoManager ReloadAssembly
Completed reload, in 0.142 seconds
WARNING: Shader Unsupported: 'Autodesk Interactive' - All passes removed
WARNING: Shader Did you use #pragma only_renderers and omit this platform?
UnloadTime: 1.141076 ms
Fallback handler could not load library /home/desktop/ML-Agents-with-Google-Colab/headless_build/3DBall_example/3dball_Data/Mono/libcoreclr.so
Fallback handler could not load library /home/desktop/ML-Agents-with-Google-Colab/headless_build/3DBall_example/3dball_Data/Mono/libcoreclr.so
Fallback handler could not load library /home/desktop/ML-Agents-with-Google-Colab/headless_build/3DBall_example/3dball_Data/Mono/libcoreclr.so
Fallback handler could not load library /home/desktop/ML-Agents-with-Google-Colab/headless_build/3DBall_example/3dball_Data/Mono/libdl.so
Fallback handler could not load library /home/desktop/ML-Agents-with-Google-Colab/headless_build/3DBall_example/3dball_Data/Mono/libdl.so
Fallback handler could not load library /home/desktop/ML-Agents-with-Google-Colab/headless_build/3DBall_example/3dball_Data/Mono/libdl.so
Fallback handler could not load library /home/desktop/ML-Agents-with-Google-Colab/headless_build/3DBall_example/3dball_Data/Mono/libSystem.dylib
Fallback handler could not load library /home/desktop/ML-Agents-with-Google-Colab/headless_build/3DBall_example/3dball_Data/Mono/libSystem.dylib.so
Fallback handler could not load library /home/desktop/ML-Agents-with-Google-Colab/headless_build/3DBall_example/3dball_Data/Mono/libSystem.dylib
Fallback handler could not load library /home/desktop/ML-Agents-with-Google-Colab/headless_build/3DBall_example/3dball_Data/Mono/libcoreclr.so
Fallback handler could not load library /home/desktop/ML-Agents-with-Google-Colab/headless_build/3DBall_example/3dball_Data/Mono/libcoreclr.so
Fallback handler could not load library /home/desktop/ML-Agents-with-Google-Colab/headless_build/3DBall_example/3dball_Data/Mono/libcoreclr.so
Fallback handler could not load library /home/desktop/ML-Agents-with-Google-Colab/headless_build/3DBall_example/3dball_Data/Mono/libdl.so
Fallback handler could not load library /home/desktop/ML-Agents-with-Google-Colab/headless_build/3DBall_example/3dball_Data/Mono/libdl.so
Fallback handler could not load library /home/desktop/ML-Agents-with-Google-Colab/headless_build/3DBall_example/3dball_Data/Mono/libdl.so
Fallback handler could not load library /home/desktop/ML-Agents-with-Google-Colab/headless_build/3DBall_example/3dball_Data/Mono/libSystem.dylib
Fallback handler could not load library /home/desktop/ML-Agents-with-Google-Colab/headless_build/3DBall_example/3dball_Data/Mono/libSystem.dylib.so
Fallback handler could not load library /home/desktop/ML-Agents-with-Google-Colab/headless_build/3DBall_example/3dball_Data/Mono/libSystem.dylib
Caught fatal signal - signo:11 code:1 errno:0 addr:0x561114101530
Obtained 4 stack frames.
0 0x007f8bd7a1a520 in __sigaction
1 0x007f8bd66696b5 in grpc_completion_queue_create_internal(grpc_cq_completion_type, grpc_cq_polling_type)
2 0x007f8bd666abf0 in grpc_completion_queue_create_for_next
3 0x000000405aa870 in (wrapper managed-to-native) object:wrapper_native_0x7f8bd6657df0 ()
'''

Pimool avatar Aug 17 '23 07:08 Pimool
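A log like the one above is easier to read once the repeated lines are collapsed. The sketch below counts the "Fallback handler could not load library" messages per library and extracts the signal number from the "Caught fatal signal" line. The function name `summarize_player_log` is hypothetical, and the message formats are taken only from this one log, so other Unity versions may word them differently.

```python
import re
from collections import Counter
from pathlib import PurePosixPath
from typing import Optional, Tuple

# Patterns copied from the messages in the Player-0.log posted above.
FALLBACK = re.compile(r"Fallback handler could not load library (\S+)")
FATAL = re.compile(r"Caught fatal signal - signo:(\d+)")


def summarize_player_log(text: str) -> Tuple[Counter, Optional[int]]:
    """Count failed library loads by file name and find the fatal signal number."""
    missing = Counter(PurePosixPath(path).name for path in FALLBACK.findall(text))
    match = FATAL.search(text)
    signo = int(match.group(1)) if match else None
    return missing, signo


# Short excerpt in the same format as the log above.
excerpt = (
    "Fallback handler could not load library "
    "/home/desktop/ML-Agents-with-Google-Colab/headless_build/3DBall_example/3dball_Data/Mono/libcoreclr.so\n"
    "Fallback handler could not load library "
    "/home/desktop/ML-Agents-with-Google-Colab/headless_build/3DBall_example/3dball_Data/Mono/libdl.so\n"
    "Caught fatal signal - signo:11 code:1 errno:0 addr:0x561114101530\n"
)
missing, signo = summarize_player_log(excerpt)
print(dict(missing), signo)
```

In the posted log, the stack frames printed after the fatal signal are in `grpc_completion_queue_create_internal` and `grpc_completion_queue_create_for_next`, so the frames around the crash may be a more useful starting point than the fallback-handler messages.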

Hi @Pimool, try this command:

!mlagents-learn config.yaml --run-id=$run_id --env=$env_name --no-graphics

I guess on your server it's trying to render the environment.

dhyeythumar avatar Aug 18 '23 03:08 dhyeythumar

Hi @dhyeythumar,

Thanks for your advice, but it raises the same error.

Pimool avatar Aug 18 '23 04:08 Pimool

Yeah, we have the same problem, and we couldn't find a solution.

OmarVector avatar Sep 15 '23 08:09 OmarVector