SegNet-Tutorial
I want to train 6 classes, so I changed the number of outputs, but I got an error. Please help!
Hi, I want to train on six classes, so I changed the annotations (0~6) and changed only num_output: 6 and ignore_label: 6.
This is the end of my segnet_train.prototxt.
I only changed num_output and ignore_label.
layer {
bottom: "conv1_2_D"
top: "conv1_1_D"
name: "conv1_1_D"
type: "Convolution"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
weight_filler {
type: "msra"
}
bias_filler {
type: "constant"
}
num_output: 6
pad: 1
kernel_size: 3
}
}
layer {
name: "loss"
type: "SoftmaxWithLoss"
bottom: "conv1_1_D"
bottom: "label"
top: "loss"
softmax_param {engine: CAFFE}
loss_param: {
weight_by_label_freqs: true
ignore_label: 6
class_weighting: 0.9886
class_weighting: 0.6415
class_weighting: 16.0338
class_weighting: 0.4978
class_weighting: 1.0000
class_weighting: 1.3441
}
}
layer {
name: "accuracy"
type: "Accuracy"
bottom: "conv1_1_D"
bottom: "label"
top: "accuracy"
top: "per_class_accuracy"
}
I1026 13:19:37.835263 2388 solver.cpp:266] Learning Rate Policy: step
*** Error in `./caffe-segnet-multi-gpu/build/tools/caffe': malloc(): memory corruption (fast): 0x0000000008213fa0 ***
*** Aborted at 1477455578 (unix time) try "date -d @1477455578" if you are using GNU date ***
PC: @ 0x7fdc1b426c37 (unknown)
*** SIGABRT (@0x3e800000954) received by PID 2388 (TID 0x7fdc1d584780) from PID 2388; stack trace: ***
@ 0x7fdc1b426cb0 (unknown)
@ 0x7fdc1b426c37 (unknown)
@ 0x7fdc1b42a028 (unknown)
@ 0x7fdc1b4632a4 (unknown)
@ 0x7fdc1b46dff7 (unknown)
@ 0x7fdc1b470cf4 (unknown)
@ 0x7fdc1b4726c0 (unknown)
@ 0x7fdc1c059dad (unknown)
@ 0x7fdc1ce006fd std::vector<>::_M_insert_aux()
@ 0x7fdc1ce028ac caffe::AccuracyLayer<>::Forward_cpu()
@ 0x7fdc1cd46a51 caffe::Net<>::ForwardFromTo()
@ 0x7fdc1cd46dc7 caffe::Net<>::ForwardPrefilled()
@ 0x7fdc1cd6bf19 caffe::Solver<>::Step()
@ 0x7fdc1cd6c743 caffe::Solver<>::Solve()
@ 0x408ebb train()
@ 0x4069b1 main
@ 0x7fdc1b411f45 (unknown)
@ 0x40710c (unknown)
@ 0x0 (unknown)
I got this error and couldn't find its cause. I used the caffe-segnet-multi-gpu version, but I got the same error with the original caffe-segnet. Please help me train 6 classes.
And this is the log from debug mode.
I1026 13:33:03.905791 9536 solver.cpp:265] Solving VGG_ILSVRC_16_layer
I1026 13:33:03.905797 9536 solver.cpp:266] Learning Rate Policy: step
I1026 13:33:03.972322 9539 dense_image_data_layer.cpp:234] Prefetch batch: 62 ms.
I1026 13:33:03.972370 9539 dense_image_data_layer.cpp:235] Read time: 48.002 ms.
I1026 13:33:03.972384 9539 dense_image_data_layer.cpp:236] Transform time: 14.173 ms.
F1026 13:33:04.347821 9536 accuracy_layer.cpp:72] Check failed: label_value < num_labels (6 vs. 6)
*** Check failure stack trace: ***
@ 0x7f6d79d30daa (unknown)
@ 0x7f6d79d30ce4 (unknown)
@ 0x7f6d79d306e6 (unknown)
@ 0x7f6d79d33687 (unknown)
@ 0x7f6d7a594bcd caffe::AccuracyLayer<>::Forward_cpu()
@ 0x7f6d7a55a90d caffe::Layer<>::Forward_gpu()
@ 0x41a686 caffe::Layer<>::Forward()
@ 0x7f6d7a4be75d caffe::Net<>::ForwardFromTo()
@ 0x7f6d7a4be525 caffe::Net<>::ForwardPrefilled()
@ 0x7f6d7a4be8f0 caffe::Net<>::Forward()
@ 0x7f6d7a4bf2d5 caffe::Net<>::ForwardBackward()
@ 0x7f6d7a4eaa2b caffe::Solver<>::Step()
@ 0x7f6d7a4ea4b7 caffe::Solver<>::Solve()
@ 0x4154cb train()
@ 0x4175fa main
@ 0x7f6d78e32f45 (unknown)
@ 0x414369 (unknown)
@ (nil) (unknown)
Aborted (core dumped)
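The debug log above fails at `label_value < num_labels (6 vs. 6)`, which suggests the Accuracy layer saw a pixel labelled 6 while the network has only 6 outputs (valid indices 0 to 5). In the posted prototxt, `ignore_label: 6` appears only in the loss layer's `loss_param`, so it may not apply to the Accuracy layer. A minimal Python sketch for checking which label values fall outside the valid range, assuming the annotations can be loaded as NumPy integer arrays (the function name `find_invalid_labels` and the toy array are hypothetical, for illustration only):

```python
import numpy as np

def find_invalid_labels(label_map, num_classes, ignore_label=None):
    """Return {label value: pixel count} for labels outside 0..num_classes-1.

    A value equal to ignore_label (if given) is treated as allowed.
    Hypothetical helper, not part of Caffe or SegNet.
    """
    values, counts = np.unique(np.asarray(label_map), return_counts=True)
    invalid = {}
    for value, count in zip(values.tolist(), counts.tolist()):
        in_range = 0 <= value < num_classes
        ignored = ignore_label is not None and value == ignore_label
        if not (in_range or ignored):
            invalid[value] = count
    return invalid

# Toy 2x4 annotation using values 0..6, as described in the post.
toy = np.array([[0, 1, 2, 3], [4, 5, 6, 6]], dtype=np.uint8)

# View of a layer that skips label 6 (like the loss layer with ignore_label: 6).
print(find_invalid_labels(toy, num_classes=6, ignore_label=6))

# View of a layer with no ignore label configured.
print(find_invalid_labels(toy, num_classes=6))
```

On the toy array the first call reports nothing, while the second reports the two pixels labelled 6. Running the same check over the real annotation images (loaded with any image library) would show whether label 6 is reaching a layer that does not ignore it.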
I also get the same error periodically (other times training starts as normal). The stack trace points to a possible problem with the Accuracy layer at the end of the model file. I commented that layer out and this particular problem went away.
The strange thing is that I also get a CUBLAS error periodically as well:
F1104 13:25:18.790753 8281 math_functions.cu:123] Check failed: status == CUBLAS_STATUS_SUCCESS (11 vs. 0) CUBLAS_STATUS_MAPPING_ERROR
*** Check failure stack trace: ***
@ 0x7fe05da2fdaa (unknown)
@ 0x7fe05da2fce4 (unknown)
@ 0x7fe05da2f6e6 (unknown)
@ 0x7fe05da32687 (unknown)
@ 0x7fe05de93e7b caffe::caffe_gpu_asum<>()
@ 0x7fe05de90e5f caffe::SoftmaxWithLossLayer<>::Backward_gpu()
@ 0x7fe05dd4002c caffe::Net<>::BackwardFromTo()
@ 0x7fe05dd40271 caffe::Net<>::Backward()
@ 0x7fe05de49e5d caffe::Solver<>::Step()
@ 0x7fe05de4a77f caffe::Solver<>::Solve()
@ 0x4086c8 train()
@ 0x406c61 main
@ 0x7fe05cf41ec5 (unknown)
@ 0x40720d (unknown)
@ (nil) (unknown)
Other times I am able to start training without any problems.
I am training with 9 classes on an Ubuntu 14.04 machine with a Titan X GPU.
@beejisbrigit Hi
I am facing the same error. Can you please tell me exactly what you did to clear it?
I am also facing the same issue @alexgkendall