ImageNet
ImageNet copied to clipboard
Trial on kaggle imagenet object localization by yolo v3 in google cloud
Trial on kaggle imagenet object localization by yolov3 in google cloud
project:https://www.kaggle.com/c/imagenet-object-localization-challengeyolo:https://pjreddie.com/darknet/yolo/
nice yolo explanation:https://medium.com/@jonathan_hui/real-time-object-detection-with-yolo-yolov2-28b1b93e2088
My operation system is Mac OS, although it doesn't matter much, because we do most steps on google cloud.
Let's do it step by step:
1.Basic environment preparation:
I.Apply a google cloud account.
ps:Google provide $300 credit trial time for first time sign up account.II.Create a ubuntu 16.04 compute engine instance on google cloud, with 400G SSD disk, 4 cores' cpu and, 15G memories.
we will change it a little bit later coz GPU/CPU extention or other reasons, but now it's enough.III.Apply quotas increasing on Nvidia tesla K80/P100/V100, coz we don't have permission to use gpu default.
GPUs cost credit so fast, so we can choose it by needed. For me, I just increase 1 K80s for test, 4 P100s for training our model, haven't tried on V100 yet.IV.SSH connection by RSA keys.
#:ssh-keygen -t rsa -f ~/.ssh/gc_rsa -C anynamehereNo pass word is easy for login.
#:cd ~/.ssh
#:vi gc_rsa.pub
then go to google cloud, copy everything in gc_rsa.pub to ubuntu instance SSH key part.
#:chmod 400 gc_rsa
#:ssh -i gc_rsa anynamehere@your google cloud external ip
we can also connect by 'FileZilla', no more words here.
V.pip installation
#:sudo apt update#:sudo apt upgrade
#:sudo apt-get -y install python-pip
#:sudo apt-get -y install python3-pip
VI.kaggle-cli installation
#:pip install kaggle-cli2.Dataset download
#:kg download -u <your kaggle username> -p <your kaggle password> -c imagenet-object-localization-challenge// dataset is about 160G, so it will cost about 1 hour if your instance download speed is around 42.9 MiB/s.
// let's open another ssh connection to do next step when it's doing the download process.
3.Opencv-3.4.0 installation(we will turn on opencv option in yolo project later for better image processing)
execute all the steps in the following url.http://www.python36.com/how-to-install-opencv340-on-ubuntu1604/
4.Cuda 9.0 with cudnn 7.0 installation
we can use the fallowing bash script, download it and execute it in instance.https://gist.github.com/ashokpant/5c4e9481615f54af4025ab2085f85869#file-cuda_9-0_cudnn_7-0-sh
5.Cudnn library configuration
go to https://developer.nvidia.com/rdp/cudnn-download to download cuDNN v7.0.5 Library for Linux CUDA 9.0it's name should be cudnn-9.0-linux-x64-v7.tgz, we use scp command or filezilla to move this package from local machine to remote instance.
#:scp -i ~/.ssh/gc_rsa Downloads/cudnn-9.0-linux-x64-v7.tgz anynamehere@your google cloud external ip:~/
// come to instance window
#:tar zxvf cudnn-9.0-linux-x64-v7.tgz
#:cd cuda
#:sudo cp include/* /usr/local/cuda-9.0/include/
#:sudo cp lib64/* /usr/local/cuda-9.0/lib64/
#:echo 'export PATH=/usr/local/cuda-9.0/bin:$PATH' >> ~/.bashrc
#:echo 'export LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64/:$LD_LIBRARY_PATH' >> ~/.bashrc
#:source ~/.bashrc
6.Shutdown instance, add 1 piece of K80 GPU, then boot instance again.
ps:For frugality, we can revise number of cpu cores from 4 to 2// view GPUs detailed info
#: nvidia-smi
// view the number of CPUs
#:nproc
7.X11 installatoin both of instance and our local machine, so that we can see our predicted image remotely.
#:sudo apt-get install xorg openbox// what I need on my mac is XQuartz.
// install feh, so that we can see any picture remotely.
#:sudo apt install feh
// logout from instance, connect it with additional parameter, then test it
#:ssh -Y -i ~/.ssh/gc_rsa anynamehere@your google cloud external ip
#:feh darknet/data/dog.jpg
8.Yolo installation
#:git clone https://github.com/pjreddie/darknet#:cd darknet
#:make
9.Test yolov3
// Actually we've done a good job until now, but we still can't see expected result if we won't change Makefile a little bit,// I haven't figured out the reason, although let's just change it now.
#:cd darknet
#:sed -i 's/GPU=./GPU=1/' Makefile
#:sed -i 's/CUDNN=./CUDNN=0/' Makefile
#:sed -i 's/OPENCV=./OPENCV=1/' Makefile
#:sed -i 's/OPENMP=./OPENMP=1/' Makefile
#:sed -i 's/DEBUG=./DEBUG=0/' Makefile
#:make
#:wget https://pjreddie.com/media/files/yolov3.weights
#:./darknet detector test cfg/coco.data cfg/yolov3.cfg yolov3.weights data/dog.jpg
10.Now let's train it.
I.training data preprocessing.
#:cd ~#:tar zxvf imagenet_object_localization.tar.gz
// delete package so that we'll have enough disk space.
#:rm imagenet_object_localization.tar.gz
// view disk space info.
#: df -h
// Data preparation
#:unzip LOC_synset_mapping.txt.zip
#:mkdir ILSVRC/Data/CLS-LOC/train/images
#:mv ILSVRC/Data/CLS-LOC/train/n* ILSVRC/Data/CLS-LOC/train/images/
#:mv ILSVRC/Data/CLS-LOC/val/ ILSVRC/Data/CLS-LOC/images
#:mkdir ILSVRC/Data/CLS-LOC/val/
#:mv ILSVRC/Data/CLS-LOC/images ILSVRC/Data/CLS-LOC/val/images
#:git clone https://github.com/mingweihe/ImageNet
#:pip3 install pandas
#:pip3 install pathlib
#:cd ImageNet
// generating all training formatted label files costs about 20 minutes
#:python3 generate_labels.py ../LOC_synset_mapping.txt ../ILSVRC/Annotations/CLS-LOC/train ../ILSVRC/Data/CLS-LOC/train/labels 1
// generating all validation formatted label files
#:python3 generate_labels.py ../LOC_synset_mapping.txt ../ILSVRC/Annotations/CLS-LOC/val ../ILSVRC/Data/CLS-LOC/val/labels 0
#:cd ~
#:find `pwd`/ILSVRC/Data/CLS-LOC/train/labels/ -name \*.txt > darknet/data/inet.train.list
#:sed -i 's/\.txt/\.JPEG/g' darknet/data/inet.train.list
#:sed -i 's/labels/images/g' darknet/data/inet.train.list
#:find `pwd`/ILSVRC/Data/CLS-LOC/val/labels/ -name \*.txt > darknet/data/inet.val.list
#:sed -i 's/\.txt/\.JPEG/g' darknet/data/inet.val.list
#:sed -i 's/labels/images/g' darknet/data/inet.val.list
II.pretrained weights preparation.
#:cd darknet#:wget https://pjreddie.com/media/files/darknet53.conv.74
III.Traning
#:./darknet detector train ~/ImageNet/ILSVRC.data ~/ImageNet/yolov3-ILSVRC.cfg darknet53.conv.74// we can also restart training from a checkpoint:
#:./darknet detector train ~/ImageNet/ILSVRC.data ~/ImageNet/yolov3-ILSVRC.cfg backup/yolov3-ILSVRC.backup
IV.Training with multiple GPUs
// shutdown instance, increase number of GPUs from 1 piece's K80 to 4 pieces' P100, with 6 CPUs.// boot instance, start training using following command
#:./darknet detector train ~/ImageNet/ILSVRC.data ~/ImageNet/yolov3-ILSVRC.cfg backup/yolov3-ILSVRC.backup -gpus 0,1,2,3
// continue from checkpoints we can replace darknet53.conv.74 with backup file.
V.Training without ssh connection
#:screen#:./darknet detector train ~/ImageNet/ILSVRC.data ~/ImageNet/yolov3-ILSVRC.cfg backup/yolov3-ILSVRC.backup -gpus 0,1,2,3
// press Keys of <Ctrl+a>
// then press <d> key
// now we have our task done detached from our local machine
// If we wanna put task back, we can connect ssh, then:
#:screen -r
// For more detailed instruction, just google "linux screen detach".
11.Prediction and transfer predictions into CSV file.
#:unzip LOC_sample_submission.csv.zip#:mkdir ~/submissions
#:python3 ~/ImageNet/predict.py
12.Submit our predictions.
#kg submit <submission-file> -u <your kaggle username> -p <your kaggle password> -c imagenet-object-localization-challenge -m "my submission"(optional way is submitting it on kaggle website by using any web browser.)