carla-roach

Error with training RL expert

Open thoithoi58 opened this issue 2 years ago • 1 comments

Hi, I'm running run/train_rl.sh and keep receiving this error:

[2022-05-15 08:09:58,133][utils.server_utils][INFO] - Kill Carla Servers!
CarlaUE4-Linux: no process found
[2022-05-15 08:09:59,167][utils.server_utils][INFO] - Kill Carla Servers!
[2022-05-15 08:09:59,168][utils.server_utils][INFO] - CUDA_VISIBLE_DEVICES=0 bash /home/thoaican/carla/CarlaUE4.sh -fps=10 -quality-level=Epic -carla-rpc-port=2000
4.24.3-0+++UE4+Release-4.24 518 0
Disabling core dumps.
Traceback (most recent call last):
  File "train_rl.py", line 40, in main
    agent = AgentClass('config_agent.yaml')
  File "/home/thoaican/carla-roach/agents/rl_birdview/rl_birdview_agent.py", line 15, in __init__
    self.setup(path_to_conf_file)
  File "/home/thoaican/carla-roach/agents/rl_birdview/rl_birdview_agent.py", line 27, in setup
    f = max(all_ckpts, key=lambda x: int(x.name.split('_')[1].split('.')[0]))
ValueError: max() arg is an empty sequence

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
[2022-05-15 08:10:05,478][wandb.sdk.internal.internal][INFO] - Internal process exited

I've browsed the issues page and found the same error reported by another person here. The suggested solution was to delete outputs/checkpoint.txt, but that did not help in my case.

thoithoi58, May 15 '22 01:05

@thoithoi58, after you delete outputs/checkpoint.txt, run train_rl.sh again. Let training run long enough for something to sync with wandb, then copy the run_id from outputs/checkpoint.txt into rl_birdview_agent.py.
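For context, the `ValueError` in the traceback is raised because `max()` is called on an empty `all_ckpts`, presumably because no checkpoint had been synced yet. Below is a minimal sketch of a guard around that lookup. It is not the repository's code: the helper names `checkpoint_step` and `latest_checkpoint` and the file names `ckpt_1000.pth` / `ckpt_25000.pth` are made up for illustration, and `PurePath` objects stand in for whatever objects `all_ckpts` really holds (the sketch only assumes they have a `.name` attribute, as the traceback line does).

```python
from pathlib import PurePath
from typing import Any, Iterable, Optional


def checkpoint_step(name: str) -> int:
    """Parse the step number using the same split logic as the traceback line."""
    return int(name.split('_')[1].split('.')[0])


def latest_checkpoint(all_ckpts: Iterable[Any]) -> Optional[Any]:
    """Return the checkpoint with the highest step, or None if there are none."""
    # default=None keeps max() from raising ValueError on an empty sequence.
    return max(all_ckpts, key=lambda x: checkpoint_step(x.name), default=None)


# Hypothetical stand-ins: any object with a .name attribute works here.
example = [PurePath('ckpt_1000.pth'), PurePath('ckpt_25000.pth')]

newest = latest_checkpoint(example)
missing = latest_checkpoint([])

if missing is None:
    print('No checkpoints found; let training run longer before resuming.')
```

With a guard like this, the caller can decide whether to start from scratch or stop with a clearer message instead of failing inside `max()`.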

Here's my code (go to timestamps (1) 6/19/2022 5:44 PM and (2) 6/20/2022 11:50 AM): https://github.com/neilsambhu/carla-roach/blob/NeilBranch0/timeline/README3.md

Here is my rl_birdview_agent.py: https://github.com/neilsambhu/carla-roach/blob/NeilBranch0/agents/rl_birdview/rl_birdview_agent.py#L107

As of writing this message, I'm still waiting for my initial training to finish so I can see whether my code that reads the run_id from outputs/checkpoint.txt works.
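Reading the run_id from the file instead of copying it by hand could look roughly like the sketch below. This is not the code from the linked branch. It assumes that outputs/checkpoint.txt holds a single string identifying the run; the exact file format is not shown in this thread, and the helper name `read_run_id` is hypothetical.

```python
from pathlib import Path
from typing import Optional


def read_run_id(checkpoint_file: str = 'outputs/checkpoint.txt') -> Optional[str]:
    """Return the stripped contents of the checkpoint file, or None if unavailable.

    Assumption: the file contains only the run identifier.
    """
    path = Path(checkpoint_file)
    if not path.is_file():
        # No file yet, e.g. it was deleted and training has not written a new one.
        return None
    text = path.read_text().strip()
    # Treat an empty file the same as a missing one.
    return text or None


run_id = read_run_id()
if run_id is None:
    print('No run_id available; start a fresh training run first.')
```

Returning `None` for a missing or empty file lets the agent setup distinguish "nothing to resume" from a real run identifier.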

neilsambhu, Jun 20 '22 19:06