VPGNet

Code issues

Open SeokjuLee opened this issue 7 years ago • 55 comments

Please post installation, training, and testing issues in this thread.

SeokjuLee avatar Dec 20 '17 05:12 SeokjuLee

Hello. Something always goes wrong when I download "Lee_VPGNet_Vanishing_Point_ICCV_2017_supplemental.pdf" from the internet. Could you send me a new PDF? Thank you very much!

CSUMIT avatar Dec 20 '17 07:12 CSUMIT

@CSUMIT Oh, our supplemental file is not in PDF format. It's just a video clip, and here is the link :) https://www.youtube.com/watch?v=jnewRlt6UbI

SeokjuLee avatar Dec 20 '17 07:12 SeokjuLee

Oh, it is the supplemental file that I found at http://openaccess.thecvf.com/ICCV2017.py, listed as the supplemental material of your paper.

CSUMIT avatar Dec 20 '17 07:12 CSUMIT

@CSUMIT I uploaded a video file but the organizer seems to have changed the file to pdf format. You may ignore that link.

SeokjuLee avatar Dec 20 '17 08:12 SeokjuLee

OK, thank you very much.

CSUMIT avatar Dec 20 '17 08:12 CSUMIT

@SeokjuLee, when I run train.sh, it fails at iteration 2500:

I1220 14:47:21.014974 402 solver.cpp:449] Snapshotting to binary proto file ./snapshots/split_iter_2500.caffemodel
F1220 14:47:23.816285 402 io.cpp:67] Check failed: proto.SerializeToOstream(&output)
*** Check failure stack trace: ***
@ 0x7f4bd6f6b5cd google::LogMessage::Fail()
@ 0x7f4bd6f6d433 google::LogMessage::SendToLog()
@ 0x7f4bd6f6b15b google::LogMessage::Flush()
@ 0x7f4bd6f6de1e google::LogMessageFatal::~LogMessageFatal()
@ 0x7f4bd759d7a5 caffe::WriteProtoToBinaryFile()
@ 0x7f4bd7571f46 caffe::Solver<>::SnapshotToBinaryProto()
@ 0x7f4bd7572049 caffe::Solver<>::Snapshot()
@ 0x7f4bd7572f90 caffe::Solver<>::Step()
@ 0x7f4bd757392a caffe::Solver<>::Solve()
@ 0x40c0cb train()
@ 0x408830 main
@ 0x7f4bd5c23830 __libc_start_main
@ 0x408f89 _start
@ (nil) (unknown)

Should I make the dir ./snapshots myself? Or use sudo train.sh?

daixiaogang avatar Dec 20 '17 10:12 daixiaogang

@daixiaogang Yes, please try again after creating ./snapshots. I have updated the code. Thanks!

SeokjuLee avatar Dec 20 '17 11:12 SeokjuLee
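
The fix above amounts to creating the output directories before the solver first tries to write a snapshot. A minimal sketch in Python, assuming the relative paths that appear in this thread (./snapshots for the snapshot files and ./output for output.log); the helper name ensure_dir is made up for illustration and is not part of the repository:

```python
import os


def ensure_dir(path):
    """Create `path` (and any missing parents); do nothing if it already exists."""
    os.makedirs(path, exist_ok=True)
    return os.path.isdir(path)


# Typical use, run from the model directory before starting train.sh:
#   ensure_dir("./snapshots")   # where split_iter_*.caffemodel is written
#   ensure_dir("./output")      # where train.sh appends output.log
```

Running this as the same user that runs the training avoids the need for sudo, as long as that user can write to the model directory.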

@SeokjuLee, I have trained your network on the Caltech lane dataset without errors. After about 14 hours it reached 30,000 iterations, but when I plot the test and training loss from the log, it looks strange. The training loss converges over the iterations, but the test loss first increases and then stays unchanged. I did not change any of your configuration. Should I make some changes, as the dataset is small? (For training, I used cordova1, cordova2, and washington1 as the training list and washington2 as the test list.)

daixiaogang avatar Dec 21 '17 02:12 daixiaogang

@daixiaogang It is normal for the validation accuracy to rise sharply in the beginning and not change significantly afterwards. This is because the size of the object to be inferred is much smaller than the background area. If you expand the validation curve, you can see that it is increasing slightly.

SeokjuLee avatar Dec 21 '17 06:12 SeokjuLee
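
To look at these curves more closely, the solver log can be split into per-iteration series and plotted on separate axes. The following is a minimal sketch, assuming the usual Caffe log lines of the form "Iteration N, loss = X", "Iteration N, Testing net (#0)", and "Test net output #k: name = value"; the function name parse_caffe_log is made up for illustration, and the exact log format may differ in this fork:

```python
import re

# Patterns for the log lines this sketch assumes the solver prints.
TRAIN_RE = re.compile(r"Iteration (\d+), loss = ([-+0-9.eE]+)")
TEST_ITER_RE = re.compile(r"Iteration (\d+), Testing net")
TEST_OUT_RE = re.compile(r"Test net output #\d+: (\S+) = ([-+0-9.eE]+)")


def parse_caffe_log(lines):
    """Return (train, test): train is [(iter, loss)], test maps output name to [(iter, value)]."""
    train = []
    test = {}
    current_iter = None  # iteration of the most recent test pass
    for line in lines:
        m = TRAIN_RE.search(line)
        if m:
            train.append((int(m.group(1)), float(m.group(2))))
            continue
        m = TEST_ITER_RE.search(line)
        if m:
            current_iter = int(m.group(1))
            continue
        m = TEST_OUT_RE.search(line)
        if m and current_iter is not None:
            test.setdefault(m.group(1), []).append((current_iter, float(m.group(2))))
    return train, test
```

With the series separated by name (for example pixel-acc or type-loss, as listed in the network log further down the thread), a curve that looks flat next to the training loss can be plotted alone with its own y-axis range.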

Is there a trained model that can be used for testing?

cfzd avatar Dec 21 '17 07:12 cfzd

@SeokjuLee, I want to convert my labels, which look like (x1,y1,x1',y1'), (x2,y2,x2',y2'). My images are 1280x1024, which differs from your 640x480. Can you give me some advice on how to convert the labels? Or should I just change the width and height parameters in your vpg_annot_v1.m?

daixiaogang avatar Dec 21 '17 13:12 daixiaogang

@SeokjuLee, because my labels have only two points (x1,y1,x1',y1'), I cannot call the function ccvEvalBezSpline() in vpg_annot_v1.m, so it fails. Must we fit a spline first in order to make the bounding boxes? Can you give me some advice?

daixiaogang avatar Dec 21 '17 14:12 daixiaogang

@daixiaogang First, you should decide which input size to use, 640x480 or 1280x1024. If you want to use the latter, please check the intermediate activation sizes after the branches. Basically the network is fully convolutional, so various sizes are applicable, but some parameter tuning might be needed. Second, do your labels contain only two end points that represent one straight line for each lane? The label doesn't always need to be a spline curve. First draw each straight line between its two points on the image, then annotate the grid cells through which the line passes.

SeokjuLee avatar Dec 22 '17 05:12 SeokjuLee
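
The two steps described above (rescale the end points to the chosen input size, then mark the grid cells a straight line passes through) can be sketched as follows. This is only an illustration and not the logic of vpg_annot_v1.m: the function names are made up, the line is sampled at sub-pixel steps rather than traversed exactly, and the thickness parameter is not modeled.

```python
def scale_point(x, y, src_size, dst_size):
    """Map a point from a (width, height) source image to a (width, height) target image."""
    sx = dst_size[0] / src_size[0]
    sy = dst_size[1] / src_size[1]
    return x * sx, y * sy


def cells_on_segment(p0, p1, grid=8, step=0.5):
    """Return sorted (col, row) grid cells touched by the segment p0 -> p1.

    The segment is sampled every `step` pixels along its longer axis, so a cell
    that is only clipped at a corner may be missed.
    """
    (x0, y0), (x1, y1) = p0, p1
    length = max(abs(x1 - x0), abs(y1 - y0))
    n = max(1, int(length / step))
    cells = set()
    for i in range(n + 1):
        t = i / n
        x = x0 + t * (x1 - x0)
        y = y0 + t * (y1 - y0)
        cells.add((int(x // grid), int(y // grid)))
    return sorted(cells)
```

For a 1280x1024 label used with a 640x480 input, each end point would first go through scale_point(x, y, (1280, 1024), (640, 480)), and the resulting pair would then be passed to cells_on_segment with grid=8.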

@SeokjuLee, thanks for your explanation. I have made labels like yours. I want to know more about the parameters you use for annotation, such as the grid size (8) and thickness (2). Should these parameters be changed to fit my images?

daixiaogang avatar Dec 22 '17 06:12 daixiaogang

@SeokjuLee, I have trained your net on the Caltech lane dataset. I want to know how to test my results or output the coordinates (x1,y1,x2,y2...) of the lanes. Can you give me some advice?

daixiaogang avatar Dec 22 '17 06:12 daixiaogang

@daixiaogang Well, you had better not change the grid size (8), because that parameter depends on the rescaling factor between the input size (640x480) and the output size (multi-label: 80x60). The thickness depends on the lane width. If the grid annotation covers the lane markings well enough, I don't think you need to change it. For the demos and tests, use deploy.prototxt and load the models you've trained. You can visualize the result through the multi-label and binary mask outputs.

SeokjuLee avatar Dec 22 '17 07:12 SeokjuLee
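
Since the grid size of 8 is the ratio between the 640x480 input and the 80x60 grid output, one simple way to visualize a grid-level mask is to repeat every cell 8 times in both directions so that it lines up with the input image. A minimal sketch on plain nested lists (the name upsample_grid is made up for illustration; a NumPy array can be passed in via its tolist() method):

```python
def upsample_grid(mask, grid=8):
    """Repeat each cell of a 2D mask `grid` times along both axes."""
    out = []
    for row in mask:
        # Stretch the row horizontally, then emit `grid` copies of it.
        wide = [value for value in row for _ in range(grid)]
        out.extend(list(wide) for _ in range(grid))
    return out
```

A 60-row by 80-column mask becomes 480 rows by 640 columns, which can then be overlaid on the input image.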

@SeokjuLee, thanks for your guidance. But I still have some questions about the demos and tests. I use the command "./build/tools/caffe test -model models/vpgnet-novp/deploy.prototxt -weights models/vpgnet-novp/snapshots/split_iter_82500.caffemodel -iterations 1 >>output.log 2>&1". I want to know how to input a picture and get the coordinates of the lanes, because deploy.prototxt does not specify the input. Can you give me some advice? When I use the following Python code to run deploy.prototxt, it fails with: "[libprotobuf ERROR google/protobuf/text_format.cc:274] Error parsing text-format caffe.NetParameter: 452:16: Message type "caffe.LayerParameter" has no field named "tiling_param"."
---------------------------------------------------code---------------------------------------------------------------------
import sys
import numpy
# Make the pycaffe build from the VPGNet tree importable.
sys.path.append('/home/swjtu/daixiaogang/VPGNet/caffe/python')
import caffe

WEIGHTS_FILE = './snapshots/split_iter_82500.caffemodel'
DEPLOY_FILE = 'deploy.prototxt'
IMAGE_SIZE = (480, 640)
MEAN_VALUE = 128

caffe.set_mode_cpu()
net = caffe.Net(DEPLOY_FILE, WEIGHTS_FILE, caffe.TEST)
net.blobs['data'].reshape(1, 1, *IMAGE_SIZE)

# Preprocessing: HWC -> CHW, scale to [0, 255], subtract the mean.
transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
transformer.set_transpose('data', (2,0,1))
transformer.set_mean('data', numpy.array([MEAN_VALUE]))
transformer.set_raw_scale('data', 255)

# Text file with one image path per line.
image_list = sys.argv[1]

with open(image_list, 'r') as f:
    for line in f.readlines():
        filename = line.strip()
        image = caffe.io.load_image(filename, False)
        transformed_image = transformer.preprocess('data', image)
        net.blobs['data'].data[...] = transformed_image
        output = net.forward()
        #score = output['pred'][0][0]
        print(output)
------------------------------------------end--------------------------------------------------------------------------

daixiaogang avatar Dec 25 '17 10:12 daixiaogang

@daixiaogang Try net.forward_all(data=np.array(transformed_image)); binary_mask = net.blobs['binary-mask'].data[0];

SeokjuLee avatar Dec 26 '17 07:12 SeokjuLee

@SeokjuLee, it fails with the following message.
---------------------------------------------------------message-----------------------------------------------------------
WARNING: Logging before InitGoogleLogging() is written to STDERR
W1226 15:32:11.351035 8293 _caffe.cpp:122] DEPRECATION WARNING - deprecated use of Python interface
W1226 15:32:11.351058 8293 _caffe.cpp:123] Use this instead (with the named "weights" parameter):
W1226 15:32:11.351063 8293 _caffe.cpp:125] Net('deploy.prototxt', 1, weights='./snapshots/split_iter_82500.caffemodel')
[libprotobuf ERROR google/protobuf/text_format.cc:274] Error parsing text-format caffe.NetParameter: 452:16: Message type "caffe.LayerParameter" has no field named "tiling_param".
F1226 15:32:11.352567 8293 upgrade_proto.cpp:88] Check failed: ReadProtoFromTextFile(param_file, param) Failed to parse NetParameter file: deploy.prototxt
*** Check failure stack trace: ***
------------------------------------------------end---------------------------------------------------------------------
It reports that "caffe.LayerParameter" has no field named "tiling_param", but I have found this parameter in caffe.proto. It is very strange. Can you write a demo that uses "deploy.prototxt"?

daixiaogang avatar Dec 26 '17 07:12 daixiaogang

@daixiaogang Our pycaffe loading lines are almost the same as yours. Could you check the pycaffe path? "tiling_param" is a customized Caffe parameter. Check this link: https://github.com/alexgkendall/SegNet-Tutorial/issues/4

SeokjuLee avatar Dec 26 '17 07:12 SeokjuLee
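
One way to check the pycaffe path is to ask Python where it would load a module from before importing it. If the reported location is a stock Caffe installation rather than the VPGNet tree, the customized "tiling_param" field will be unknown to it. A minimal sketch using only the standard library (the helper name module_origin is made up for illustration):

```python
import importlib.util
import sys


def module_origin(name, extra_path=None):
    """Return the file a top-level module would be loaded from, or None if not found.

    If `extra_path` is given, it is put at the front of sys.path first, so it
    takes priority over any other installed copy.
    """
    if extra_path is not None and extra_path not in sys.path:
        sys.path.insert(0, extra_path)
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# Example: module_origin("caffe", extra_path="/path/to/VPGNet/caffe/python")
# should point inside the VPGNet checkout (the path here is a placeholder).
```

Note that sys.path.append, as used in the script above, puts the directory at the end of the search path, so an already installed caffe package would still be found first; inserting at the front avoids that.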

@SeokjuLee, thanks for the reminder; I have solved the path problem. But a new problem has turned up:
File "outpt.py", line 30, in <module>
binary_mask = net.blobs['binary-mask'].data[0];
KeyError: 'binary-mask'
There is no 'binary-mask' in deploy.prototxt. I tried binary_mask = net.blobs['bb-output-tiled'].data[0]; instead, and it runs correctly (maybe). The output looks like:
[[[ 0.92812693 -2.9992063 0.93505991 ..., -2.99982905 0.95043111 -2.99812078]
[ 0.94416535 -2.99826837 0.95043862 ..., -3.00105286 0.96007824 -2.99992943]
[ 0.90508956 -3.00013518 0.94121945 ..., -2.99899626 0.95983672 -2.99970555]
...,

Is this the binary_mask? Should I add the mean value? What is the size of the mean value?

daixiaogang avatar Dec 26 '17 08:12 daixiaogang

@daixiaogang Please use the updated model. The deploy model has been updated. "binary-mask" and "multi-label" are softmax outputs.

SeokjuLee avatar Dec 26 '17 09:12 SeokjuLee
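
Because these blobs are softmax outputs, each grid position holds one score per class, and a class map is obtained by taking the index of the largest score at each position. A minimal sketch on nested lists shaped [class][row][column] (the name argmax_map is made up for illustration, and the channel-first layout is an assumption based on Caffe's usual blob order):

```python
def argmax_map(scores):
    """Return a [row][column] map holding the index of the highest-scoring class."""
    channels = len(scores)
    height = len(scores[0])
    width = len(scores[0][0])
    return [
        [max(range(channels), key=lambda c: scores[c][y][x]) for x in range(width)]
        for y in range(height)
    ]
```

On a NumPy array with the same layout, np.argmax(scores, axis=0) gives the same result. For a two-class "binary-mask" blob, the map is 1 wherever the second channel has the larger score.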

@SeokjuLee, can you explain how to visualize the output from "binary-mask" and "multi-label"? I am reading your code (drive_data_layer.cpp and convert_driving_data.cpp), and I am wondering whether you are regressing the grid box (x,y,w,h) as in object detection. So is your output the bounding box?

daixiaogang avatar Dec 26 '17 14:12 daixiaogang

@daixiaogang Could you please refer to our paper? We elaborate on the post-processing in Section 4.4 :)

SeokjuLee avatar Dec 26 '17 14:12 SeokjuLee

@SeokjuLee Hi, I have a problem running train.sh. After following the four steps on the home page, I used sh train.sh to run it, but it gave no response to tell me how it was going. It looked like this:
root@root:~/dl/VPGNet/caffe/models/vpgnet-novp$ sh train.sh
| (here is a stationary cursor)

I used an NVIDIA GeForce GTX 1080 to train it, but I got stuck here. I don't know whether this is normal training time or whether I have a problem with the code. I have waited for nearly an hour. Could you share your training time or give me some advice on fixing it? Thank you very much!

BTW: I can't run it by typing train.sh because it displays "command not found", so I use sh and bash. Does it matter?

wsyzzz avatar Jan 03 '18 12:01 wsyzzz

@wsyzzz, just try running it with ./train.sh.

daixiaogang avatar Jan 03 '18 14:01 daixiaogang

@daixiaogang, thanks for your advice. However, it doesn't seem to make a difference. It still gets stuck and gives no response. I will wait an hour to see what happens. Besides, the terminal itself is not frozen: I can still press Enter and type text. I also used nvidia-smi to view the GPU processes. It shows:
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
| 0 1129 C ../../build/tools/caffe 7149MiB |

wsyzzz avatar Jan 04 '18 02:01 wsyzzz

@wsyzzz Usually it works by just typing './train.sh' if the script file has executable permission. Have you tried typing the command lines from inside the script directly? If it doesn't respond, try running 'python debug_seokju.py', because we need to see at least one line of the errors.

SeokjuLee avatar Jan 04 '18 10:01 SeokjuLee
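
Typing train.sh alone gives "command not found" because the current directory is normally not on the shell's search path, and ./train.sh only works once the file has its executable bit set. A minimal sketch of that permission check and fix in Python, roughly what chmod +x does (the helper name make_executable is made up for illustration):

```python
import os
import stat


def make_executable(path):
    """Add execute permission for user, group, and others; return True if now executable."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return os.access(path, os.X_OK)


# Example: make_executable("train.sh"), then run ./train.sh from the same directory.
```

Running the script through sh or bash, as done above, does not need the executable bit, so either way of starting it should behave the same.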

@SeokjuLee Thanks for your response. What do you mean by 'typing the command lines inside the script'? I added an output statement at the top of train.sh. It shows that train.sh is executed, because my output is displayed, but it still gets stuck at ../../build/tools/caffe train --solver=./solver.prototxt >> ./output/output.log 2>&1. Besides, this process can be found among the GPU processes with nvidia-smi, just as in my previous comment.

I tried running 'python debug_seokju.py'. The output looked like this (too long to show in full):
…………………………
I0105 03:38:08.007841 29620 net.cpp:432] bb-num-pixel-normalization -> bb-masked-output-sn-nn
I0105 03:38:08.007849 29620 net.cpp:155] Setting up bb-num-pixel-normalization
I0105 03:38:08.007856 29620 net.cpp:163] Top shape: 10 4 120 160 (768000)
I0105 03:38:08.007860 29620 layer_factory.hpp:76] Creating layer bb-loss
I0105 03:38:08.007867 29620 net.cpp:110] Creating Layer bb-loss
I0105 03:38:08.007872 29620 net.cpp:476] bb-loss <- bb-masked-output-sn-nn
I0105 03:38:08.007879 29620 net.cpp:476] bb-loss <- bb-label-sn-nn
I0105 03:38:08.007884 29620 net.cpp:432] bb-loss -> bb-loss
I0105 03:38:08.007939 29620 net.cpp:155] Setting up bb-loss
I0105 03:38:08.007946 29620 net.cpp:163] Top shape: (1)
I0105 03:38:08.007951 29620 net.cpp:168] with loss weight 3
I0105 03:38:08.007958 29620 net.cpp:236] bb-loss needs backward computation.
I0105 03:38:08.007963 29620 net.cpp:236] bb-num-pixel-normalization needs backward computation.
I0105 03:38:08.007969 29620 net.cpp:236] bb-size-normalization needs backward computation.
I0105 03:38:08.007975 29620 net.cpp:236] bb-prob-mask needs backward computation.
I0105 03:38:08.007980 29620 net.cpp:240] type-acc does not need backward computation.
I0105 03:38:08.007985 29620 net.cpp:236] type-loss needs backward computation.
I0105 03:38:08.007992 29620 net.cpp:240] pixel-acc does not need backward computation.
I0105 03:38:08.007997 29620 net.cpp:236] pixel-loss needs backward computation.
I0105 03:38:08.008003 29620 net.cpp:236] type-conv-tiled_type-tile_0_split needs backward computation.
I0105 03:38:08.008008 29620 net.cpp:236] type-tile needs backward computation.
I0105 03:38:08.008013 29620 net.cpp:236] bb-tile needs backward computation.
I0105 03:38:08.008018 29620 net.cpp:236] pixel-conv-tiled_pixel-tile_0_split needs backward computation.
I0105 03:38:08.008023 29620 net.cpp:236] pixel-tile needs backward computation.
I0105 03:38:08.008028 29620 net.cpp:236] type-conv needs backward computation.
I0105 03:38:08.008031 29620 net.cpp:236] pixel-conv needs backward computation.
I0105 03:38:08.008036 29620 net.cpp:236] bb-output needs backward computation.
I0105 03:38:08.008041 29620 net.cpp:236] drop7c needs backward computation.
I0105 03:38:08.008046 29620 net.cpp:236] relu7c needs backward computation.
I0105 03:38:08.008050 29620 net.cpp:236] L6c needs backward computation.
I0105 03:38:08.008055 29620 net.cpp:236] drop7b needs backward computation.
I0105 03:38:08.008059 29620 net.cpp:236] relu7b needs backward computation.
I0105 03:38:08.008064 29620 net.cpp:236] L6b needs backward computation.
I0105 03:38:08.008067 29620 net.cpp:236] drop7a needs backward computation.
I0105 03:38:08.008071 29620 net.cpp:236] relu7a needs backward computation.
I0105 03:38:08.008075 29620 net.cpp:236] L6a needs backward computation.
I0105 03:38:08.008080 29620 net.cpp:236] L5_drop6_0_split needs backward computation.
I0105 03:38:08.008085 29620 net.cpp:236] drop6 needs backward computation.
I0105 03:38:08.008090 29620 net.cpp:236] relu6 needs backward computation.
I0105 03:38:08.008093 29620 net.cpp:236] L5 needs backward computation.
I0105 03:38:08.008098 29620 net.cpp:236] pool5 needs backward computation.
I0105 03:38:08.008102 29620 net.cpp:236] relu5 needs backward computation.
I0105 03:38:08.008106 29620 net.cpp:236] L4 needs backward computation.
I0105 03:38:08.008111 29620 net.cpp:236] relu4 needs backward computation.
I0105 03:38:08.008116 29620 net.cpp:236] L3 needs backward computation.
I0105 03:38:08.008121 29620 net.cpp:236] relu3 needs backward computation.
I0105 03:38:08.008124 29620 net.cpp:236] L2 needs backward computation.
I0105 03:38:08.008128 29620 net.cpp:236] pool2 needs backward computation.
I0105 03:38:08.008133 29620 net.cpp:236] norm2 needs backward computation.
I0105 03:38:08.008137 29620 net.cpp:236] relu2 needs backward computation.
I0105 03:38:08.008142 29620 net.cpp:236] L1 needs backward computation.
I0105 03:38:08.008147 29620 net.cpp:236] pool1 needs backward computation.
I0105 03:38:08.008152 29620 net.cpp:236] norm1 needs backward computation.
I0105 03:38:08.008155 29620 net.cpp:236] relu1 needs backward computation.
I0105 03:38:08.008159 29620 net.cpp:236] L0 needs backward computation.
I0105 03:38:08.008164 29620 net.cpp:240] bb-label-num-pixel-normalization does not need backward computation.
I0105 03:38:08.008170 29620 net.cpp:240] bb-label-size-normalization does not need backward computation.
I0105 03:38:08.008177 29620 net.cpp:240] norm-block_norm-block_0_split does not need backward computation.
I0105 03:38:08.008182 29620 net.cpp:240] norm-block does not need backward computation.
I0105 03:38:08.008189 29620 net.cpp:240] size-block_size-block_0_split does not need backward computation.
I0105 03:38:08.008194 29620 net.cpp:240] size-block does not need backward computation.
I0105 03:38:08.008200 29620 net.cpp:240] pixel-block does not need backward computation.
I0105 03:38:08.008208 29620 net.cpp:240] norm-label_slice-label_3_split does not need backward computation.
I0105 03:38:08.008213 29620 net.cpp:240] size-label_slice-label_2_split does not need backward computation.
I0105 03:38:08.008219 29620 net.cpp:240] pixel-label_slice-label_0_split does not need backward computation.
I0105 03:38:08.008225 29620 net.cpp:240] slice-label does not need backward computation.
I0105 03:38:08.008230 29620 net.cpp:240] type_data_2_split does not need backward computation.
I0105 03:38:08.008236 29620 net.cpp:240] data does not need backward computation.
I0105 03:38:08.008240 29620 net.cpp:283] This network produces output bb-loss
I0105 03:38:08.008244 29620 net.cpp:283] This network produces output pixel-acc
I0105 03:38:08.008249 29620 net.cpp:283] This network produces output pixel-loss
I0105 03:38:08.008255 29620 net.cpp:283] This network produces output type-acc
I0105 03:38:08.008258 29620 net.cpp:283] This network produces output type-loss
I0105 03:38:08.008304 29620 net.cpp:297] Network initialization done.
I0105 03:38:08.008309 29620 net.cpp:298] Memory required for data: 1391827220
I0105 03:38:08.008486 29620 solver.cpp:65] Solver scaffolding done.
> /home/dl/VPGNet/caffe/models/vpgnet-novp/debug_seokju.py(39)<module>()
-> for _ in range(100):
(Pdb)

wsyzzz avatar Jan 04 '18 11:01 wsyzzz

@SeokjuLee When I run train.sh, output.log prints:

F0112 09:20:56.458614 3586 syncedmem.cpp:58] Check failed: error == cudaSuccess (2 vs. 0) out of memory
*** Check failure stack trace: ***
@ 0x7fc16fc365cd google::LogMessage::Fail()
@ 0x7fc16fc38433 google::LogMessage::SendToLog()
@ 0x7fc16fc3615b google::LogMessage::Flush()
@ 0x7fc16fc38e1e google::LogMessageFatal::~LogMessageFatal()
@ 0x7fc17034b840 caffe::SyncedMemory::to_gpu()
@ 0x7fc17034a829 caffe::SyncedMemory::mutable_gpu_data()
@ 0x7fc170362af2 caffe::Blob<>::mutable_gpu_data()
@ 0x7fc1703aaa98 caffe::PoolingLayer<>::Forward_gpu()
@ 0x7fc170372d22 caffe::Net<>::ForwardFromTo()
@ 0x7fc170372e47 caffe::Net<>::ForwardPrefilled()
@ 0x7fc1703989dd caffe::Solver<>::Step()
@ 0x7fc17039956a caffe::Solver<>::Solve()
@ 0x40bf6b train()
@ 0x408688 main
@ 0x7fc16f2b3830 __libc_start_main
@ 0x408e29 _start
@ (nil) (unknown)

What does it mean? It also happened when I trained with only one training picture and one test picture.

My computer has:
RAM 32 GB, GTX 1060, CUDA 8.0, cuDNN not used

I solved my problem. It was the batch_size: I changed it from 64 to 5.

But I don't know how to use deploy.prototxt (I am a beginner in Caffe), so please help me. I searched Google and other sites, and I wrote a command like "build/examples/cpp_classification/classification.bin models/vpgnet-novp/deploy.prototxt models/vpgnet-novp/snapshots/split_iter_500.caffemodel models/vpgnet-novp/driving_mean_train.binaryproto ./f00000.jpg"

but nothing happened. Please help me.

ddori avatar Jan 12 '18 00:01 ddori
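
The batch_size fix is consistent with how activation memory grows with the batch. As a rough sketch only: the debug log earlier in the thread reports "Top shape: 10 4 120 160 (768000)" and "Memory required for data: 1391827220" for a batch of 10, and the estimate below assumes 4-byte floats, the same network definition, and memory that scales linearly with batch size. It ignores weights, gradients, and any workspace, and the function names are made up for illustration.

```python
def blob_bytes(shape, bytes_per_value=4):
    """Bytes needed for one blob of the given shape, assuming 4-byte floats."""
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_value


def scaled_data_memory(bytes_at_batch, batch, new_batch):
    """Linearly rescale a reported data-memory figure to a different batch size."""
    return bytes_at_batch * new_batch // batch


# Figures taken from the log above (batch of 10):
#   blob_bytes((10, 4, 120, 160))            -> one bb blob
#   scaled_data_memory(1391827220, 10, 64)   -> rough estimate for batch 64
#   scaled_data_memory(1391827220, 10, 5)    -> rough estimate for batch 5
```

Under these assumptions, a batch of 64 needs several times the data memory of a batch of 10, which makes an out-of-memory failure on a smaller GPU plausible.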