see: The problem of fsns train_fsns.py

There is a problem with train_fsns.py for fsns.

First: NCCL is already installed in my new environment. I followed these steps:

1. https://developer.nvidia.com/nccl
2. sudo dpkg -i nccl-repo-ubuntu1604-2.2.12-ga-cuda8.0_1-1_amd64.deb
3. sudo apt update
4. sudo apt-get install libnccl2=2.2.12-1+cuda8.0 libnccl-dev=2.2.12-1+cuda8.0
5. sudo cp /usr/lib/x86_64-linux-gnu/libnccl.so.2 /usr/local/cuda/lib64
6. sudo cp /usr/include/nccl.h /usr/local/cuda/include/
7. chmod a+r /usr/local/cuda/include/nccl.h /usr/local/cuda/lib64/libnccl.so.2

Second: when I try to execute python train_fsns.py, the problem below occurs.

(SEE) mayongjuan@visionGroup:/home/code/mayongjuan/see/chainer$ python train_fsns.py /home/data/fsns/image/curriculum.json /home/code/mayongjuan/see/fsns-model --blank-label 0 --char-map ../datasets/fsns/fsns_char_map.json -b 50
Traceback (most recent call last):
  File "train_fsns.py", line 169, in <module>
    updater = MultiprocessParallelUpdater(train_iterators, optimizer, devices=args.gpus)
  File "/home/code/mayongjuan/anaconda3/envs/SEE/lib/python3.6/site-packages/chainer/training/updaters/multiprocess_parallel_updater.py", line 116, in __init__
    'NCCL is not enabled. MultiprocessParallelUpdater '
Exception: NCCL is not enabled. MultiprocessParallelUpdater requires NCCL.
Please reinstall chainer after you install NCCL.
(see https://github.com/chainer/chainer#installation).
Exception ignored in: <bound method MultiprocessIterator.__del__ of <chainer.iterators.multiprocess_iterator.MultiprocessIterator object at 0x7f431e1987f0>>
Traceback (most recent call last):
  File "/home/code/mayongjuan/anaconda3/envs/SEE/lib/python3.6/site-packages/chainer/iterators/multiprocess_iterator.py", line 117, in __del__
  File "/home/code/mayongjuan/anaconda3/envs/SEE/lib/python3.6/site-packages/chainer/iterators/multiprocess_iterator.py", line 244, in terminate
  File "/home/code/mayongjuan/anaconda3/envs/SEE/lib/python3.6/threading.py", line 347, in notify
TypeError: 'NoneType' object is not callable
Exception ignored in: <bound method MultiprocessIterator.__del__ of <chainer.iterators.multiprocess_iterator.MultiprocessIterator object at 0x7f431e1987b8>>
Traceback (most recent call last):
  File "/home/code/mayongjuan/anaconda3/envs/SEE/lib/python3.6/site-packages/chainer/iterators/multiprocess_iterator.py", line 117, in __del__
  File "/home/code/mayongjuan/anaconda3/envs/SEE/lib/python3.6/site-packages/chainer/iterators/multiprocess_iterator.py", line 244, in terminate
  File "/home/code/mayongjuan/anaconda3/envs/SEE/lib/python3.6/threading.py", line 347, in notify
TypeError: 'NoneType' object is not callable

I can't figure out why this problem occurs. I'm looking forward to your answer. Thank you very much.

Mar 31 '19 03:03 MS-MA

Did you do what the error message says: "Please reinstall chainer after you install NCCL."?
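One way to check whether a reinstall actually picked up NCCL is to try the import that the updater depends on. The following is only a minimal sketch. It assumes that NCCL support is exposed through the `cupy.cuda.nccl` module, which would mean that CuPy (and not only Chainer) has to be built or installed after NCCL is in place. `nccl_enabled` is a hypothetical helper name, not part of Chainer or CuPy.

```python
import importlib


def nccl_enabled():
    """Return True if CuPy's NCCL bindings can be imported."""
    try:
        # Assumption: NCCL support lives in cupy.cuda.nccl, and the import
        # fails when CuPy is missing or was built without NCCL.
        importlib.import_module("cupy.cuda.nccl")
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    print("NCCL enabled:", nccl_enabled())
```

If this prints False inside the SEE environment, MultiprocessParallelUpdater will most likely keep raising the "NCCL is not enabled" exception.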

Apr 03 '19 11:04 Bartzi

@Bartzi Yes, I uninstalled the previously installed chainer 3.2.0 and installed chainer==6.0.0b3 from https://github.com/chainer/chainer. But when I execute the command "python train_fsns.py /home/data/fsns /image/curriculum.json /home/code/mayongjuan/see-master/fsns-model --char-map ../datasets/fsns/fsns_char_map.json --blank-label 0 -b 20 -g 0 3", I run into another problem:

--------------------------------------------------------------------------------
CuPy (cupy) version 2.2.0 may not be compatible with this version of Chainer.
Please consider installing the supported version by running:
  $ pip install 'cupy==6.0.0b3'

See the following page for more details:
  https://docs-cupy.chainer.org/en/latest/install.html

So I ran "pip uninstall cupy==2.2.0" and then installed "cupy-cuda80==6.0.0b3".
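After swapping packages like this, it can help to confirm which distributions are really installed in the environment, since a leftover `cupy` next to `cupy-cuda80` is a possible source of conflicts. The sketch below is an illustration under assumptions: it uses `importlib.metadata`, which needs Python 3.8 or newer (the environment in this thread is Python 3.6), and `installed_versions` is a hypothetical helper name.

```python
from importlib import metadata  # assumption: Python 3.8+


def installed_versions(names):
    """Map each distribution name to its installed version, or None."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Distribution names taken from the thread.
    found = installed_versions(["chainer", "cupy", "cupy-cuda80"])
    for name, version in found.items():
        print(name, version or "not installed")
```

Ideally only one of `cupy` and `cupy-cuda80` shows a version, and that version matches the installed Chainer release.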

Next I executed the command "python train_fsns.py /home/data/fsns /image/curriculum.json /home/code/mayongjuan/see-master/fsns-model --char-map ../datasets/fsns/fsns_char_map.json --blank-label 0 -b 20 -g 0 3" again, but now the following problem occurs:

/home/code/mayongjuan/anaconda3/envs/SEE/lib/python3.6/site-packages/chainer/training/updaters/multiprocess_parallel_updater.py:151: UserWarning: optimizer.eps is changed to 2e-08 by MultiprocessParallelUpdater for new batch size.
  format(optimizer.eps))
Segmentation fault (core dumped)

Could you tell me why?

Apr 04 '19 01:04 MS-MA

Are you sure you are using CUDA 8.0 on your machine?
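A quick way to double-check this from inside the Python environment is to ask CuPy which CUDA runtime it reports. This is a rough sketch and not a definitive diagnostic: it assumes CuPy is importable and uses `cupy.cuda.runtime.runtimeGetVersion()`, which returns CUDA's integer encoding (1000 * major + 10 * minor). Both helper names are hypothetical.

```python
def decode_cuda_version(raw):
    """Split CUDA's integer version (1000 * major + 10 * minor) into a tuple."""
    return raw // 1000, (raw % 1000) // 10


def cuda_runtime_version():
    """Return (major, minor) of the CUDA runtime CuPy reports, or None."""
    try:
        import cupy  # assumption: CuPy is installed in this environment
        return decode_cuda_version(cupy.cuda.runtime.runtimeGetVersion())
    except Exception:
        # CuPy is missing, or CUDA could not be loaded.
        return None


if __name__ == "__main__":
    print("CUDA runtime seen by CuPy:", cuda_runtime_version())
```

For a cupy-cuda80 install, the expected result would be (8, 0). A different value, or None, would suggest a mismatch between the installed CuPy package and the CUDA toolkit on the machine.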

Apr 04 '19 09:04 Bartzi

Yes, I am sure.

Apr 04 '19 15:04 MS-MA

Well, then I don't know... I have not used CuPy version 6 yet, so this might be the issue. Did you try to use the Docker container?

Apr 04 '19 15:04 Bartzi

No, I haven't used the Docker container before, but maybe I can try it. Thank you very much, @Bartzi.

Apr 08 '19 03:04 MS-MA
