object-detection.torch

Problem in main (refactoring branch)

Open: hdmetor opened this issue on Nov 18 '15 • 10 comments

th main.lua -algo RCNN -backend cudnn gives me the following error on the refactoring branch:

nn.CrossEntropyCriterion
==> Converting model to CUDA
/home/ubuntu/torch/install/bin/luajit: /home/ubuntu/object-detection.torch/data.lua:28: attempt to index field 'algo' (a nil value)
stack traceback:
    /home/ubuntu/object-detection.torch/data.lua:28: in main chunk
    [C]: in function 'dofile'
    main.lua:33: in main chunk
    [C]: in function 'dofile'
    ...untu/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:131: in main chunk
    [C]: at 0x00406670

hdmetor, Nov 18 '15

Hi, thanks for pointing this out. For now, if you want to train/test using RCNN, check examples/train_test_rcnn.lua. I still need to provide a model for it, though (or a way to load pre-trained models). I wasn't paying much attention to the main.lua script because I was thinking of having only one file per framework in the examples folder, but I'll reconsider. I'll leave this open until I come up with a solution (either removing main.lua and pointing to the examples, or fixing it).

fmassa, Nov 18 '15

th train_test_rcnn.lua gives me the error:

Using GPU mode on device 1
Using fixed seed: 1
/usr/bin/luajit: /home/ekanshv/models/zeiler.lua:30: attempt to call local 'spatialconv' (a nil value)
stack traceback:
    /home/ekanshv/models/zeiler.lua:30: in function 'createModel'
    train_test_rcnn.lua:53: in main chunk
    [C]: in function 'dofile'
    /usr/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
    [C]: at 0x00406670

Ethiral, Apr 13 '16

@Ethiral I've added a quick fix for your problem in commit https://github.com/fmassa/object-detection.torch/commit/a0a4a51a6983b09f55d71cb48c66410dcf406fb0. For now, you need to pass the path to the pre-trained model as an argument to the script.

I will revamp this repo soon, adding simple examples and packaging it properly. I also have some improvements to the code that I still need to push. Hopefully I'll do it in the coming weeks.

fmassa, Apr 13 '16

Hi, th main.lua gives me this error:

==> Preparing BatchProvider for validation
/home/wulong/torch/install/bin/luajit: ./DataSetPascal.lua:222: Need to specify the bounding boxes file
stack traceback:
    [C]: in function 'assert'
    ./DataSetPascal.lua:222: in function 'loadROIDB'
    ./DataSetPascal.lua:315: in function 'attachProposals'
    ./BatchProvider.lua:73: in function 'setupData'
    /home/wulong/object-detection.torch/data.lua:96: in main chunk
    [C]: in function 'dofile'
    main.lua:33: in main chunk
    [C]: in function 'dofile'
    ...long/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
    [C]: at 0x00406670

So how can I get the bounding boxes file, and where should I put it? Thank you.

longwoo, Aug 02 '16

@longwoo you can use whatever region proposal algorithm you want. For example, you can use Selective Search; the link for downloading the proposals can be found here. This code assumes the same bounding box format as the files at that link. You can put them anywhere, but the default location used by main.lua in the current master branch is data/selective_search_data, as can be seen in this part of the code.

fmassa, Aug 02 '16
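As a rough sanity check that a downloaded proposal file is readable, here is a minimal Python sketch. It assumes the file follows the layout of the Fast R-CNN selective_search_data release: a .mat file whose 'boxes' cell array holds one N x 4 matrix per image, with rows stored as [y1, x1, y2, x2] in 1-based pixel coordinates. It also assumes SciPy is installed. The helper names and the example file name voc_2007_test.mat are illustrative assumptions, not part of object-detection.torch.

```python
"""Minimal sketch for sanity-checking a downloaded Selective Search proposal file.

Assumption: Fast R-CNN `selective_search_data` layout, i.e. a .mat file whose
'boxes' entry is a cell array with one (N_i x 4) matrix per image, each row
stored as [y1, x1, y2, x2] in 1-based pixel coordinates.
"""
from typing import List, Sequence, Tuple


def yxyx_to_xyxy(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Reorder [y1, x1, y2, x2] rows to [x1, y1, x2, y2], keeping 1-based values."""
    return [[r[1], r[0], r[3], r[2]] for r in rows]


def summarize(boxes_per_image: Sequence[Sequence[Sequence[float]]]) -> Tuple[int, int]:
    """Return (number of images, total number of proposals)."""
    return len(boxes_per_image), sum(len(b) for b in boxes_per_image)


def load_boxes(path: str) -> List[List[List[float]]]:
    """Read the (assumed) 'boxes' cell array from a .mat file; requires SciPy."""
    from scipy.io import loadmat  # lazy import so the helpers above work without SciPy

    raw = loadmat(path)["boxes"].ravel()  # one numpy array per image
    return [yxyx_to_xyxy(b.tolist()) for b in raw]


if __name__ == "__main__":
    import sys

    # Hypothetical example path: data/selective_search_data/voc_2007_test.mat
    boxes = load_boxes(sys.argv[1])
    n_images, n_boxes = summarize(boxes)
    first = boxes[0][0] if boxes and boxes[0] else None
    print(f"{n_images} images, {n_boxes} proposals; first box (x1, y1, x2, y2): {first}")
```

The column reorder only makes the printed box easier to read; whether DataSetPascal.lua reorders the columns the same way is an assumption worth verifying against that file.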

Thanks. But after doing that (downloading the Selective Search files you mentioned and putting them in the right place), running th main.lua -algo SPP -gpu 2 -seed 1 gives me this error:

=> Creating model from file: models/zeiler.lua  
=> Criterion    
==> Converting model to CUDA    
Loading train metadata from cache   
Loading test metadata from cache    
Preparing conv5 features for VOC2007 trainval   
 [========== 5011/5011 =========>]  Tot: 2s104ms | Step: 0ms     
Preparing conv5 features for VOC2007 test   
Iteration: 1/300    
==> Preparing Batch Data    
 [========== 3476/3476 ====>]  Tot: 1m11s | Step: 21ms      
==> Training zeiler,seed=1  
 [========== 500/500 ===========>]  Tot: 21s830ms | Step: 42ms   
==> Training Error: 0.64886541676521    
ConfusionMatrix:
 + average row correct: 36.375306085462% 
 + average rowUcol correct (VOC measure): 29.776913725904% 
 + global correct: 83.2015625%
/home/wulong/torch/install/bin/luajit: bad argument #1 to '?' (must be strictly positive at /home/wulong/torch/pkg/torch/lib/TH/generic/THTensorMath.c:1420)
stack traceback:
    [C]: at 0x7f7340d95d20
    [C]: in function 'randperm'
    ./BatchProvider.lua:114: in function 'permuteIdx'
    ./BatchProvider.lua:245: in function 'getBatch'
    ./Tester.lua:33: in function 'validate'
    main.lua:76: in main chunk
    [C]: in function 'dofile'
    ...long/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
    [C]: at 0x00406670

It seems that I am still missing something.

longwoo, Aug 03 '16

@longwoo It seems that you have not provided a test dataset. If you download the Pascal VOC test dataset and put it in the datasets/VOCdevkit folder, it should work.

fmassa, Aug 03 '16
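The randperm traceback above fits this diagnosis: Torch's randperm rejects a size of 0 ("must be strictly positive"), which is presumably what happens when the validation batch provider finds no test images. Below is a minimal Python sketch that checks the splits before launching main.lua. It assumes the standard Pascal VOC devkit layout (VOC2007/ImageSets/Main/<split>.txt) under datasets/VOCdevkit; count_split_ids is a hypothetical helper, and whether main.lua reads exactly these list files is an assumption.

```python
"""Sketch: count image IDs in Pascal VOC splits before launching training."""
import os
import sys


def count_split_ids(devkit: str, year: str = "2007", split: str = "test") -> int:
    """Return the number of image IDs listed for a VOC split, or 0 if the list is missing.

    Uses the standard devkit layout VOC<year>/ImageSets/Main/<split>.txt.
    """
    path = os.path.join(devkit, f"VOC{year}", "ImageSets", "Main", f"{split}.txt")
    if not os.path.isfile(path):
        return 0
    with open(path) as f:
        # One image ID per non-empty line.
        return sum(1 for line in f if line.strip())


if __name__ == "__main__":
    devkit = sys.argv[1] if len(sys.argv) > 1 else "datasets/VOCdevkit"
    for split in ("trainval", "test"):
        n = count_split_ids(devkit, split=split)
        status = "OK" if n > 0 else "MISSING or empty (randperm(0) would fail)"
        print(f"VOC2007 {split}: {n} images -> {status}")
```

A zero count for test points at the same problem as the traceback, before spending time on feature extraction and a full training iteration.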

On the refactoring branch, running th train_test_rcnn.lua gives:

wulong@PVG-Dsk-004:~/object-detection.torch-refactoring$ th train_test_rcnn.lua 
-- ignore option gpu    
-- ignore option name   
-- ignore option modelpath  
-- ignore option numthreads 
[program started on Mon Aug  8 15:58:59 2016]   
[command line arguments]    
gpu 2   
seed    1   
name    rcnn-example    
save_step   100 
lr  0.001   
modelpath   /home/wulong/object-detection.torch-refactoring/data/models/frcnn_alexnet.t7    
numthreads  6   
num_iter    400 
disp_iter   1   
lr_step 300 
[----------------------]    
Using GPU mode on device 2  
Using fixed seed: 1 
/home/wulong/torch/install/bin/luajit: train_test_rcnn.lua:76: attempt to call method 'type' (a nil value)
stack traceback:
    train_test_rcnn.lua:76: in main chunk
    [C]: in function 'dofile'
    ...long/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
    [C]: at 0x00406670

Now I am using "frcnn_alexnet.t7" as the pre-trained model, and I also tried "Zeiler_imagenet_weights.mat". Neither seems right. So where can I get the correct pre-trained model? And if I want to train Fast R-CNN, should I use a different one? Thank you.

longwoo, Aug 08 '16

@longwoo if you want to train Fast R-CNN, you should use a model similar to frcnn_alexnet.t7, which is already fine-tuned for Pascal. About the error you are seeing, I couldn't find anything in the current code that corresponds to line 76 of your script. Are you sure the model was loaded properly?

Also, I'll soon be pushing a new repo that builds on this one; it will hopefully be a good starting point for using this code.

fmassa, Aug 09 '16

I'm looking forward to it. ^.^

longwoo, Aug 09 '16