
train_locally.py error

Open kaixin96 opened this issue 4 years ago • 6 comments

When running train_locally.py, the script prints "{'state': 'ERROR', 'score': {}, 'instances': [], 'reason': 'You started more instances (2) then allowed limit (1).'}". Is this intended behavior?

kaixin96 avatar Sep 07 '19 09:09 kaixin96

I got the same problem. This message also appeared: touch: cannot touch 'shared/training_exited': No such file or directory

danperazzo avatar Jul 20 '21 16:07 danperazzo

Hi @danperazzo,

Can you share the following details?

  1. Operating System
  2. Command which you ran?
  3. More output logs if possible?

Why this happens: the starter kit assumes you will run a maximum of 2 instances in parallel (the default).


Quick Fix:

  1. You can control the limit via the MAX_ALLOWED_INSTANCES environment variable, e.g. export MAX_ALLOWED_INSTANCES=10.
  2. If you want to disable these errors completely, you can also set raise_on_error to False.
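
The first option might look like the sketch below in a bash session. It assumes the variable is read from the environment when the local scripts start, and that you launch from the repository root; the value 10 is only the example figure from the list above, and the launch line should be swapped for whatever command you actually run.

```shell
# Raise the instance limit for this shell session (10 is an arbitrary example value).
export MAX_ALLOWED_INSTANCES=10

# Confirm the value that child processes will inherit.
echo "MAX_ALLOWED_INSTANCES=${MAX_ALLOWED_INSTANCES}"

# Launch local training only if the script is present (assumed path; adjust as needed).
if [ -f utility/train_locally.sh ]; then
    bash utility/train_locally.sh --verbose
fi
```

Because the variable is exported, it applies only to this shell and the processes started from it; a new terminal needs the export again.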

skbly7 avatar Jul 20 '21 16:07 skbly7

  1. Operating System: Ubuntu 20.04
  2. Command which I ran: bash utility/train_locally.sh --verbose
  3. Output log:

/home/daniel/anaconda3/envs/minerl/lib/python3.7/site-packages/gym/logger.py:30: UserWarning: WARN: Box bound precision lowered by casting to float32
  warnings.warn(colorize('%s: %s'%('WARN', msg % args), 'yellow'))
Verifying (and downloading) MineRL dataset..
If you do not want to use the data: run the local evaluation scripts with --no-data
If you want to use your existing download of the data: make sure your MINERL_DATA_ROOT is set.
Data directory is data
Data verified! A+!
Not starting broadcast server for localhost.
NS running on localhost:9090 (127.0.0.1)
Warning: HMAC key not set. Anyone can connect to this server!
URI = PYRO:Pyro.NameServer@localhost:9090
/home/daniel/anaconda3/envs/minerl/lib/python3.7/site-packages/gym/logger.py:30: UserWarning: WARN: Box bound precision lowered by casting to float32
  warnings.warn(colorize('%s: %s'%('WARN', msg % args), 'yellow'))
Removing the performance directory!
autoproxy? True
Object <class 'minerl.env.malmo.InstanceManager'>:
    uri = PYRO:obj_c55d44e7f7304d42972e6b59bee1c590@localhost:34461
    name = minerl.instance_manager
Pyro daemon running.
RUNNING TRAINING!
2021-07-20 14:40:35 daniel-800G5H-800G5S minerl.env.malmo[46459] DEBUG Recieved keep-alive callback from client 46490. Starting thread.
2021-07-20 14:40:35 daniel-800G5H-800G5S minerl.env.malmo[46459] DEBUG Client keep-alive connection monitor started for 46490.
/home/daniel/anaconda3/envs/minerl/lib/python3.7/site-packages/gym/logger.py:30: UserWarning: WARN: Box bound precision lowered by casting to float32
  warnings.warn(colorize('%s: %s'%('WARN', msg % args), 'yellow'))
2021-07-20 14:40:36 daniel-800G5H-800G5S root[46490] INFO Training Start...
2021-07-20 14:40:36 daniel-800G5H-800G5S crowdai_api.events[46490] DEBUG Registering crowdAI API Event : CROWDAI_EVENT_INFO training_started {'event_type': 'minerl_challenge:training_started'} # with_oracle? : False
2021-07-20 14:40:36 daniel-800G5H-800G5S minerl.data.data_pipeline[46490] DEBUG Loading from file data/MineRLObtainIronPickaxeVectorObf-v0/v3_neighboring_arugula_genii-4_155262-158776
100%|███████████████████████████████████| 3452/3452 [00:00<00:00, 164819.54it/s]
2021-07-20 14:40:37 daniel-800G5H-800G5S minerl.data.data_pipeline[46490] DEBUG Loading from file data/MineRLObtainIronPickaxeVectorObf-v0/v3_flustered_tuber_doppelganger-1_7345-13432
100%|███████████████████████████████████| 6046/6046 [00:00<00:00, 162758.57it/s]
2021-07-20 14:40:44 daniel-800G5H-800G5S root[46490] INFO Training End...
2021-07-20 14:40:44 daniel-800G5H-800G5S root[46490] INFO Progress : 1.0
2021-07-20 14:40:44 daniel-800G5H-800G5S crowdai_api.events[46490] DEBUG Registering crowdAI API Event : CROWDAI_EVENT_INFO register_progress {'event_type': 'minerl_challenge:register_progress', 'training_progress': 1.0} # with_oracle? : False
2021-07-20 14:40:44 daniel-800G5H-800G5S crowdai_api.events[46490] DEBUG Registering crowdAI API Event : CROWDAI_EVENT_INFO training_ended {'event_type': 'minerl_challenge:training_ended'} # with_oracle? : False
touch: cannot touch 'shared/training_exited': No such file or directory
touch: cannot touch 'shared/training_exited': No such file or directory

Note: I managed to download the dataset.

danperazzo avatar Jul 20 '21 17:07 danperazzo

Hey, looks like things work as expected. The errors/warnings at the end come from things that do not work outside AICrowd. As long as there are no Python errors or "STATUS: ERROR" printouts, you are fine :)

(PS: You do not need to test train_locally.sh for Round 1 of the competition; evaluation_locally.sh is enough.)

Miffyli avatar Jul 20 '21 17:07 Miffyli

Yeah, looks like everything went well in your run as @Miffyli mentioned. touch shared/training_exited is used only during evaluation and shouldn't do any harm locally.

It is still a bit odd that the touch failed, though: we keep a shared/.gitignore file in the repository so that the folder exists for this and similar things. I would recommend checking whether that folder exists; if it doesn't, or was deleted by mistake, try running mkdir shared/ in your repository.
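
A minimal sketch of that check, assuming it is run from the root of your repository: it creates shared/ only when it is missing, then repeats the touch command that failed in the log.

```shell
# Create the shared/ folder only if it is missing.
if [ ! -d shared ]; then
    echo "shared/ is missing, creating it"
    mkdir -p shared
fi

# Repeat the command that failed in the log; with the folder present it should succeed silently.
touch shared/training_exited

# List the folder contents to confirm the marker file was created.
ls -l shared/
```

If the touch still fails after this, the script is probably being launched from a different working directory than the repository root.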

skbly7 avatar Jul 20 '21 18:07 skbly7

As @Miffyli said, just doing sudo apt install xvfb worked for me

danperazzo avatar Jul 20 '21 18:07 danperazzo