training error: Data layer prefetch queue empty

Open Jimjipeng opened this issue 6 years ago • 14 comments

I0316 09:00:59.149740 2416 blocking_queue.cpp:50] Data layer prefetch queue empty

Jimjipeng avatar Mar 16 '18 01:03 Jimjipeng

It is not a problem, just a warning. It means data loading is slower than the net's forward pass, so the net has to wait for data I/O to prepare the next batch.
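To illustrate the waiting described above, here is a minimal sketch of a blocking prefetch queue. `PrefetchQueue`, `Push`, `Pop`, and `Size` are hypothetical names for this example; it is a simplified stand-in, not Caffe's actual `BlockingQueue` implementation.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstdio>
#include <mutex>
#include <queue>
#include <utility>

// Simplified stand-in for a prefetch queue; NOT Caffe's actual BlockingQueue.
template <typename T>
class PrefetchQueue {
 public:
  // Producer side: the data-loading thread pushes a prepared batch.
  void Push(T batch) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      queue_.push(std::move(batch));
    }
    ready_.notify_one();
  }

  // Consumer side: the training loop takes the next batch. If none is ready,
  // it logs the warning once and then blocks until the producer pushes one.
  T Pop() {
    std::unique_lock<std::mutex> lock(mutex_);
    if (queue_.empty()) {
      std::fprintf(stderr, "Data layer prefetch queue empty\n");
    }
    ready_.wait(lock, [this] { return !queue_.empty(); });
    T batch = std::move(queue_.front());
    queue_.pop();
    return batch;
  }

  // Number of batches currently waiting in the queue.
  std::size_t Size() {
    std::lock_guard<std::mutex> lock(mutex_);
    return queue_.size();
  }

 private:
  std::mutex mutex_;
  std::condition_variable ready_;
  std::queue<T> queue_;
};
```

In this sketch the message is harmless as long as the producer eventually calls `Push`. If the producer thread stalls or dies, `Pop` never returns, and from the outside that would look like training that prints the warning and then stops making progress.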

ujsyehao avatar Mar 16 '18 03:03 ujsyehao

After "Data layer prefetch queue empty" is logged, the program hangs forever. @ujsyehao

Jimjipeng avatar Mar 19 '18 06:03 Jimjipeng

I ran into the same problem. Have you solved it?

jmt330 avatar Mar 21 '18 06:03 jmt330

The GPU was underpowered; switching to another device solved it.

Jimjipeng avatar Apr 27 '18 02:04 Jimjipeng

I've come across the same problem. I'm using a Titan V and CUDA 9.0.

alberto139 avatar Jul 06 '18 21:07 alberto139

I ran into the same problem today. I'm using a 1080 Ti and CUDA 9.2 on Windows 10. Can anyone tell me the solution? Thanks.

FlourishingLN avatar Jul 10 '18 15:07 FlourishingLN

@Jimjipeng I don't think it's a GPU power problem. I switched between two devices and both show the same behaviour: training logs this message and then hangs forever, with CPU usage at 100%.

However, my other YOLOv3 code, which also uses Caffe, does not have this problem.

lucasjinreal avatar Feb 18 '19 10:02 lucasjinreal

I hit the same problem with a 1080 Ti and CUDA 10.0, but there is no problem with a GTX 1060 and CUDA 8.0. I don't think it's a GPU power problem either. The problem I found may be in the annotated_data_layer.

Pinnh avatar Mar 05 '19 01:03 Pinnh

I got it resolved. The problem is in src/caffe/util/sampler.cpp:

    caffe_rng_uniform(1, 0.f, 1 - bbox_width, &w_off);
    caffe_rng_uniform(1, 0.f, 1 - bbox_height, &h_off);

caffe_rng_uniform blocks when bbox_width or bbox_height is close to 1.0, because (1 - bbox_width) can then be less than 0.f.

I changed the SampleBBox function to clamp both values to 1.0, and training now succeeds:

    void SampleBBox(const Sampler& sampler, NormalizedBBox* sampled_bbox) {
      // Get random scale.
      CHECK_GE(sampler.max_scale(), sampler.min_scale());
      CHECK_GT(sampler.min_scale(), 0.);
      CHECK_LE(sampler.max_scale(), 1.);
      float scale;
      caffe_rng_uniform(1, sampler.min_scale(), sampler.max_scale(), &scale);

      // Get random aspect ratio.
      CHECK_GE(sampler.max_aspect_ratio(), sampler.min_aspect_ratio());
      CHECK_GT(sampler.min_aspect_ratio(), 0.);
      CHECK_LT(sampler.max_aspect_ratio(), FLT_MAX);
      float aspect_ratio;
      caffe_rng_uniform(1, sampler.min_aspect_ratio(), sampler.max_aspect_ratio(),
                        &aspect_ratio);

      // std::pow(scale, 2.) returns double, so the float template argument
      // is spelled out to keep std::max / std::min unambiguous.
      aspect_ratio = std::max<float>(aspect_ratio, std::pow(scale, 2.));
      aspect_ratio = std::min<float>(aspect_ratio, 1 / std::pow(scale, 2.));

      // Figure out bbox dimension.
      float bbox_width = scale * sqrt(aspect_ratio);
      float bbox_height = scale / sqrt(aspect_ratio);

      // Clamp to 1.0 so that (1.0f - bbox_width) and (1.0f - bbox_height)
      // below can never be negative.
      if (bbox_width >= 1.0) {
        bbox_width = 1.0;
      }
      if (bbox_height >= 1.0) {
        bbox_height = 1.0;
      }

      // Figure out top left coordinates.
      float w_off, h_off;
      caffe_rng_uniform(1, 0.f, 1.0f - bbox_width, &w_off);
      caffe_rng_uniform(1, 0.f, 1.0f - bbox_height, &h_off);

      sampled_bbox->set_xmin(w_off);
      sampled_bbox->set_ymin(h_off);
      sampled_bbox->set_xmax(w_off + bbox_width);
      sampled_bbox->set_ymax(h_off + bbox_height);
    }
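The clamping idea can be tried in isolation with the standard library. This is a minimal sketch that does not use Caffe: `SampleOffset` is a hypothetical helper, and `std::uniform_real_distribution` stands in for `caffe_rng_uniform`. Because the aspect ratio is capped at 1 / scale², the product scale * sqrt(aspect_ratio) can sit right at 1.0, and float rounding can presumably push it slightly above.

```cpp
#include <algorithm>
#include <cassert>
#include <random>

// Hypothetical helper, not part of Caffe: draws an offset in [0, 1 - extent]
// after clamping extent to [0, 1], so the upper bound is never negative.
float SampleOffset(float extent, std::mt19937& rng) {
  const float clamped = std::min(std::max(extent, 0.0f), 1.0f);
  const float upper = 1.0f - clamped;  // always >= 0 after clamping
  if (upper == 0.0f) {
    return 0.0f;  // the box spans the whole axis, so the only valid offset is 0
  }
  // Stand-in for caffe_rng_uniform; samples between 0 and upper.
  std::uniform_real_distribution<float> dist(0.0f, upper);
  return dist(rng);
}
```

For example, `SampleOffset(1.0000001f, rng)` returns 0 instead of passing a negative upper bound to the sampler. Returning 0 for a full-width box is a design choice of this sketch; it matches what the clamp in the modified SampleBBox yields when the range collapses to [0, 0].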

Pinnh avatar Mar 05 '19 04:03 Pinnh

After tracing this error I finally narrowed it down. It is caused by a zero-dimension image (zero width, zero height, or both; some of these come from casting float to int too early). Three methods in Caffe trigger it: DataTransformer::CropImage, DataTransformer::ExpandImage, and SampleBBox in sampler.cpp. After I fixed them, training works fine on Nvidia TX2 hardware.
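The comment does not say how those methods were fixed, so the following is only one possible guard, sketched under assumptions. `ToPixelExtent` is a hypothetical helper, not a Caffe function; it shows how a float-to-int cast truncates a small normalized size to zero pixels and how clamping avoids that.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper, not part of Caffe: converts a normalized extent
// (a fraction of the image size) to a pixel count in [1, image_extent].
int ToPixelExtent(float normalized_extent, int image_extent) {
  assert(image_extent > 0);
  // The cast truncates toward zero, so 0.001f * 300 becomes 0 pixels.
  const int pixels = static_cast<int>(normalized_extent * image_extent);
  // Clamp so a crop or expanded image is never zero-sized or larger than
  // the source image.
  return std::min(std::max(pixels, 1), image_extent);
}
```

An alternative to clamping is to reject the sample and draw a new one; which is appropriate depends on how the crop is used downstream.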

kuriel07 avatar Mar 13 '19 13:03 kuriel07

@Pinnh's fix solved my problem, thanks.

huangmin9966 avatar May 17 '19 06:05 huangmin9966

@Pinnh's fix solved my problem too, thanks.

RuaYahya avatar Nov 16 '19 23:11 RuaYahya

I got this problem too. In my case it happened because I was running two different training jobs with Caffe: the first ran normally, and the second got stuck at "Data layer prefetch queue empty". I solved it by using a separately compiled Caffe build for the second job, and it works. This may be a special case and I haven't figured out the reason; I'm posting it in case somebody hits the same problem.

NarcissusInMirror avatar Nov 23 '19 09:11 NarcissusInMirror

I applied @Pinnh's SampleBBox change to src/caffe/util/sampler.cpp (quoted in full above), but my issue is still not fixed.

ustcben avatar Apr 07 '20 12:04 ustcben