random-network-distillation

KeyError: 'RCALL_NUM_GPU' ?

Open lhlong opened this issue 6 years ago • 6 comments

You need to open mpi_util.py and change line 59 to:

if 'RCALL_NUM_GPU' in os.environ:
        n_gpus = int(os.environ['RCALL_NUM_GPU'])
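The same guard can also be written as a lookup with a fallback, so that a missing variable never raises KeyError. This is only a minimal sketch: `read_gpu_count` and its `default` parameter are hypothetical names for illustration and are not part of mpi_util.py.

```python
import os


def read_gpu_count(env=os.environ, default=0):
    """Return the GPU count from RCALL_NUM_GPU, or `default` if it is unset.

    Hypothetical helper for illustration; not part of mpi_util.py.
    """
    raw = env.get('RCALL_NUM_GPU')  # .get returns None instead of raising KeyError
    if raw is None:
        return default
    return int(raw)
```

Passing a plain dict as `env` makes the behaviour easy to check without touching the real environment.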

lhlong Nov 01 '18 15:11

Having the same problem

Sungtae-Lee Nov 03 '18 04:11

I have the same problem and also ran into it in the large-scale-curiosity example. It appears to be an MPI problem. I guess it is because the GPU driver path is listed only for Linux, so it won't work on Windows.

Edit: I stand corrected. For RND, setting the GPU count to 1 works!

In mpi_util.py change

line 60 to available_gpus = 1

and

line 70 to os.environ['CUDA_VISIBLE_DEVICES'] = str(1)

Seems to work for me, it started to train!
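One detail that may be worth checking: CUDA_VISIBLE_DEVICES holds a comma-separated list of device indices, so str(1) exposes the device with index 1 rather than "one GPU". On a machine with a single GPU the only index is 0. A minimal sketch that sets the variable without editing hard-coded line numbers, assuming it runs before any CUDA-using library is initialised (`pin_first_gpu` is a hypothetical name, not something from the repository):

```python
import os


def pin_first_gpu(env=os.environ):
    """Expose only device index 0 unless the user already chose devices.

    Hypothetical helper for illustration. setdefault leaves an existing
    CUDA_VISIBLE_DEVICES value untouched and returns the value in effect.
    """
    return env.setdefault('CUDA_VISIBLE_DEVICES', '0')
```

Using setdefault rather than plain assignment means a value exported in the shell still takes precedence.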

michael20at Nov 20 '18 23:11

It's not necessary to change the code. Just set the environment variable CUDA_VISIBLE_DEVICES in the shell.

cuspymd Dec 11 '18 09:12

@cuspymd Does this require that you have an nVidia GPU?

Ploppz Dec 22 '18 10:12

@Ploppz

It's an environment variable you can define/set; you can set it even if you don't have an NVIDIA GPU.

So you can change line 59 as above and then run export CUDA_VISIBLE_DEVICES=0 from the command line, and you should be good to go.
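This presumably works without an NVIDIA GPU because a lookup of this kind only reads a string from the environment and never queries the driver. A minimal sketch of such parsing, assuming the value contains integer indices only (`parse_visible_devices` is a hypothetical name and this is not the actual mpi_util.py code):

```python
def parse_visible_devices(value):
    """Turn a CUDA_VISIBLE_DEVICES string such as '0,1' into [0, 1].

    Hypothetical helper for illustration. Assumes integer indices only;
    an empty string yields an empty list.
    """
    return [int(part) for part in value.split(',') if part.strip()]
```

With export CUDA_VISIBLE_DEVICES=0 this yields a single device id, 0, whether or not a physical GPU is present.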

lucaslingle Feb 11 '19 09:02

@lucaslingle @cuspymd I have a similar problem:

Traceback (most recent call last):
  File "ParaRetrieval.py", line 18, in <module>
    arrayid = int(os.environ['SLURM_ARRAY_TASK_ID'])  # corresponds to -t in the sh file; used to control the parallel computation chain; the server's management system is SLURM
  File "/global/software/sl-7.x86_64/modules/langs/python/3.6/lib/python3.6/os.py", line 669, in __getitem__
    raise KeyError(key) from None
KeyError: 'SLURM_ARRAY_TASK_ID'

Here I used 'SLURM_ARRAY_TASK_ID' to run the array task, and I see that the server system is SLURM with NHC. Could you tell me how I can fix this problem? Thank you very much!
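For context, Slurm sets SLURM_ARRAY_TASK_ID only for jobs submitted as job arrays (sbatch --array, short form -a), while -t in sbatch is the time limit, so the submission script may be worth checking. A minimal sketch that replaces the bare KeyError with a clearer message (`require_array_task_id` is a hypothetical name for illustration):

```python
import os


def require_array_task_id(env=os.environ):
    """Return the Slurm array task id as an int, or fail with a clear message.

    Hypothetical helper for illustration.
    """
    raw = env.get('SLURM_ARRAY_TASK_ID')
    if raw is None:
        raise RuntimeError(
            "SLURM_ARRAY_TASK_ID is not set; "
            "was the job submitted as an array with sbatch --array?"
        )
    return int(raw)
```

If the script also has to run outside an array job, returning a default index instead of raising would be the alternative design.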

youwasha Mar 23 '19 00:03