Problem running train_fsns.py (FSNS)
There is a problem with train_fsns.py for FSNS.
First: NCCL is already installed in my new environment. I installed it with these steps:
1. Download the NCCL repository package from https://developer.nvidia.com/nccl
2. sudo dpkg -i nccl-repo-ubuntu1604-2.2.12-ga-cuda8.0_1-1_amd64.deb
3. sudo apt update
4. sudo apt-get install libnccl2=2.2.12-1+cuda8.0 libnccl-dev=2.2.12-1+cuda8.0
5. sudo cp /usr/lib/x86_64-linux-gnu/libnccl.so.2 /usr/local/cuda/lib64
6. sudo cp /usr/include/nccl.h /usr/local/cuda/include/
7. chmod a+r /usr/local/cuda/include/nccl.h /usr/local/cuda/lib64/libnccl.so.2
Second: when I try to execute python train_fsns.py, the following problem occurs.
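Installing NCCL system-wide does not by itself prove that the Python side can use it. A quick check is to try importing CuPy's NCCL bindings from inside the same environment. The helper name `nccl_available` below is hypothetical, and the sketch assumes that `cupy.cuda.nccl` only imports successfully when CuPy was built with NCCL support.

```python
import importlib


def nccl_available():
    """Return True if CuPy's NCCL bindings can be imported.

    Assumption: cupy.cuda.nccl is only importable when CuPy was built
    with NCCL support. A missing or broken CuPy install also yields False.
    """
    try:
        importlib.import_module("cupy.cuda.nccl")
    except Exception:  # ImportError, or errors raised while CuPy initializes
        return False
    return True


if __name__ == "__main__":
    if nccl_available():
        print("cupy.cuda.nccl imported: NCCL support looks enabled")
    else:
        print("cupy.cuda.nccl not importable: NCCL support appears to be missing")
```

Run it with the same interpreter that runs train_fsns.py (here the SEE conda environment), otherwise the result says nothing about that environment.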
(SEE) mayongjuan@visionGroup:/home/code/mayongjuan/see/chainer$ python train_fsns.py /home/data/fsns/image/curriculum.json /home/code/mayongjuan/see/fsns-model --blank-label 0 --char-map ../datasets/fsns/fsns_char_map.json -b 50
Traceback (most recent call last):
File "train_fsns.py", line 169, in
I can't figure out why this problem occurs. I'm looking forward to your answer. Thanks very much!
Did you do this: "Please reinstall chainer after you install NCCL."?
@Bartzi Yes. I uninstalled the previously installed Chainer 3.2.0 and installed chainer==6.0.0b3 from https://github.com/chainer/chainer. But when I execute the command "python train_fsns.py /home/data/fsns/image/curriculum.json /home/code/mayongjuan/see-master/fsns-model --char-map ../datasets/fsns/fsns_char_map.json --blank-label 0 -b 20 -g 0 3", I run into a problem again:
--------------------------------------------------------------------------------
CuPy (cupy) version 2.2.0 may not be compatible with this version of Chainer.
Please consider installing the supported version by running:
  $ pip install 'cupy==6.0.0b3'
See the following page for more details:
  https://docs-cupy.chainer.org/en/latest/install.html
So I ran "pip uninstall cupy==2.2.0" and then installed "cupy-cuda80==6.0.0b3".
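The warning above is a version-mismatch check: Chainer 6.0.0b3 was paired with CuPy 2.2.0, and the warning itself suggests CuPy 6.0.0b3. A rough sketch of the same idea, assuming (as the suggested `pip install 'cupy==6.0.0b3'` implies) that a matching major version is the key requirement, is the hypothetical `same_major_version` helper below. This is a simplification, not Chainer's actual check, which may be stricter.

```python
def major_version(version):
    """Return the major version number from a string such as '6.0.0b3'."""
    return int(version.split(".", 1)[0])


def same_major_version(chainer_version, cupy_version):
    """Hypothetical check: True if Chainer and CuPy share a major version.

    Simplifying assumption: only the major version is compared here.
    """
    return major_version(chainer_version) == major_version(cupy_version)


if __name__ == "__main__":
    # Combination from this thread before CuPy was replaced
    print(same_major_version("6.0.0b3", "2.2.0"))
    # Combination after installing cupy-cuda80==6.0.0b3
    print(same_major_version("6.0.0b3", "6.0.0b3"))
```

With both packages importable, the installed versions can be passed in directly as `same_major_version(chainer.__version__, cupy.__version__)`.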
Next, I executed the command "python train_fsns.py /home/data/fsns/image/curriculum.json /home/code/mayongjuan/see-master/fsns-model --char-map ../datasets/fsns/fsns_char_map.json --blank-label 0 -b 20 -g 0 3" again, but the following problem occurs:
/home/code/mayongjuan/anaconda3/envs/SEE/lib/python3.6/site-packages/chainer/training/updaters/multiprocess_parallel_updater.py:151: UserWarning: optimizer.eps is changed to 2e-08 by MultiprocessParallelUpdater for new batch size.
  format(optimizer.eps))
Segmentation fault (core dumped)
Could you tell me why?
Are you sure you are using CUDA 8.0 on your machine?
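One way to answer this from inside the SEE environment is to ask CuPy which CUDA runtime it is using and compare that with `nvcc --version` on the system. The sketch below uses `cupy.cuda.runtime.runtimeGetVersion()`, which returns CUDA's integer encoding (1000 * major + 10 * minor, e.g. 8000 for CUDA 8.0); the helper names `decode_cuda_version` and `report_cuda_version` are hypothetical.

```python
def decode_cuda_version(version):
    """Convert CUDA's integer encoding (1000 * major + 10 * minor) to (major, minor)."""
    return version // 1000, (version % 1000) // 10


def report_cuda_version():
    """Print the CUDA runtime version reported by CuPy, if CuPy can be imported."""
    try:
        import cupy
    except Exception as exc:  # CuPy missing or failing to initialize
        print("could not import cupy:", exc)
        return None
    major, minor = decode_cuda_version(cupy.cuda.runtime.runtimeGetVersion())
    print("CuPy CUDA runtime version: {}.{}".format(major, minor))
    return major, minor


if __name__ == "__main__":
    report_cuda_version()
```

For the cupy-cuda80 wheel installed in this thread, the expectation would be 8.0; a different value would point to a mismatched install.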
Yeah, I am sure.
Well, then I don't know... I have never used CuPy version 6 myself, so that might be the issue. Did you try using the Docker container?
No, I haven't used the Docker container before; maybe I'll try it. Thanks very much, @Bartzi!