driving-in-the-matrix
guidance for training without docker?
Hi All,
I would like to get your training pipeline for benchmarking up and running. However, I'm facing the following issue while building the MXNet RCNN container:
`You are in 'detached HEAD' state. You can look around, make experimental changes and commit them, and you can discard any commits you make in this state without impacting any branches by performing another checkout.
If you want to create a new branch to retain commits you create, you may do so (now or later) by using -b with the checkout command again. Example:
git checkout -b <new-branch-name>
HEAD is now at 5568641... fixes loading of voc evaluations for empty classes (#4840)
Makefile:27: mshadow/make/mshadow.mk: No such file or directory
Makefile:28: /root/mxnet/dmlc-core/make/dmlc.mk: No such file or directory
Makefile:126: /root/mxnet/ps-lite/make/ps.mk: No such file or directory
make: *** No rule to make target '/root/mxnet/ps-lite/make/ps.mk'. Stop.
The command '/bin/sh -c cd /root && git clone --recursive https://github.com/dmlc/mxnet && cd mxnet && git checkout 5568641d99c7d7dac2aaab53d35f0a70c15b3a7f && cp make/config.mk config.mk && sed -i 's/USE_BLAS = atlas/USE_BLAS = openblas/g' config.mk && sed -i 's/USE_CUDA = 0/USE_CUDA = 1/g' config.mk && sed -i 's/USE_CUDA_PATH = NONE/USE_CUDA_PATH = /usr/local/cuda/g' config.mk && sed -i 's/USE_CUDNN = 0/USE_CUDNN = 1/g' config.mk && sed -i 's/EXTRA_OPERATORS =/EXTRA_OPERATORS = example/rcnn/operator/g' config.mk && sed -i 's/-gencode arch=compute_50,code=compute_50/-gencode arch=compute_50,code=compute_50 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_62,code=sm_62/g' config.mk && make -j"$(nproc)"' returned a non-zero code: 2
`
Do you have any ideas or suggestions on how to fix this? I'm getting the same error on two different Linux machines.
Apart from that, I would like to run your pipeline outside of Docker. Is there any guidance for this somewhere? Looking at the Docker instructions, I cannot figure out how to run the training. It is unclear to me where the train_end2end.py script comes from: I don't see it in the Faster R-CNN example in MXNet under incubator-mxnet/example/rcnn/, and the existing train.py script does not run on your data for some reason.
Thanks in advance for your help! Any suggestions are highly appreciated!
Best, Alexey
For train_end2end.py, just check an older version of the MXNet repository. It used to be there: https://github.com/apache/incubator-mxnet/tree/0768c0e97c5aa1d142ff0b3b8d37b1c736a42b83/example/rcnn
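In case it helps, here is a rough sketch of how one might fetch that older revision locally. The repository URL and commit hash are taken from the link above; the helper names (`checkout_commands`, `expected_script`, `fetch_old_rcnn_example`) and the destination directory are made up for illustration. The submodule step is an assumption on my part: the missing mshadow, dmlc-core and ps-lite makefiles in the log above look consistent with submodules not being populated for the checked-out commit, but that has not been confirmed in this thread.

```python
import subprocess
from pathlib import Path

# Both values come from the link in this thread.
REPO_URL = "https://github.com/apache/incubator-mxnet"
COMMIT = "0768c0e97c5aa1d142ff0b3b8d37b1c736a42b83"


def checkout_commands(dest):
    """Return the git commands (as argument lists) to get the old revision into `dest`."""
    return [
        ["git", "clone", REPO_URL, dest],
        ["git", "-C", dest, "checkout", COMMIT],
        # Assumption: re-sync submodules to match the checked-out commit.
        # Only relevant if you also want to build MXNet from this revision.
        ["git", "-C", dest, "submodule", "update", "--init", "--recursive"],
    ]


def expected_script(dest):
    """Path where train_end2end.py should be, according to the link above."""
    return Path(dest) / "example" / "rcnn" / "train_end2end.py"


def fetch_old_rcnn_example(dest="incubator-mxnet-old"):
    """Run the git commands (needs git and network access) and return the script path."""
    for cmd in checkout_commands(dest):
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return expected_script(dest)
```

Calling `fetch_old_rcnn_example()` should leave a checkout in `incubator-mxnet-old`; checking `.exists()` on the returned path tells you whether the script is really present at that commit. The same three git commands can of course be typed directly in a shell instead.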