ALFNet

Error when running train.py

Open zhenyezi opened this issue 6 years ago • 24 comments

When I run train.py, I run into the following errors:

  File "/ghome/zhenye/ALFNet-master/keras_alfnet/data_generators.py", line 7, in <module>
    from .utils.cython_bbox import bbox_overlaps
ImportError: No module named cython_bbox

and also:

  File "/ghome/zhenye/ALFNet-master/keras_alfnet/data_generators.py", line 8, in <module>
    from .utils.bbox import box_op
ImportError: No module named bbox

I want to ask the author: are these two modules missing from the repository, or did I miss an important setup step?

zhenyezi avatar Sep 06 '18 08:09 zhenyezi

I have the same problem.

zhangxydlut avatar Sep 21 '18 12:09 zhangxydlut

@zhenyezi I also have a similar problem:

num of training samples: 2112
Using TensorFlow backend.
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcurand.so locally
Traceback (most recent call last):
  File "train.py", line 35, in <module>
    from keras_alfnet.model.model_1step import Model_1step
  File "/var/docker/share/madono/summer/ALFNet/keras_alfnet/model/model_1step.py", line 1, in <module>
    from .base_model import Base_model
  File "/var/docker/share/madono/summer/ALFNet/keras_alfnet/model/base_model.py", line 2, in <module>
    from keras_alfnet import data_generators
  File "/var/docker/share/madono/summer/ALFNet/keras_alfnet/data_generators.py", line 7, in <module>
    from .utils.cython_bbox import bbox_overlaps
ImportError: No module named cython_bbox

MADONOKOUKI avatar Sep 23 '18 07:09 MADONOKOUKI

This works for me:

1. git clone --recursive https://github.com/rbgirshick/py-faster-rcnn.git
2. cd py-faster-rcnn/lib and run make
3. Copy the contents of the utils folder from py-faster-rcnn into the utils folder of ALFNet.
4. uncomment all "from .utils.bbox import box_op" and change "box_op" to "bbox_overlaps"
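After copying the compiled files, one way to confirm that Python can actually locate them is a small check like the sketch below. The helper name `missing_modules` is hypothetical, and the dotted module path is only inferred from the traceback earlier in this thread, so adjust it to your checkout. Run it from the repository root so that `keras_alfnet` is importable.

```python
import importlib.util


def missing_modules(names):
    """Return the dotted module names that Python cannot locate."""
    missing = []
    for name in names:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            # Raised when a parent package is itself missing or malformed.
            spec = None
        if spec is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Module path inferred from the traceback above (an assumption).
    wanted = ["keras_alfnet.utils.cython_bbox"]
    print("not found:", missing_modules(wanted))
```

An empty list only means the file was found. The import itself can still fail if the module was built for a different Python or CUDA version.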

pnnnnnnn avatar Sep 25 '18 08:09 pnnnnnnn

@pnnnnnnn, are the training results correct?

yongqiangzhang1 avatar Sep 27 '18 09:09 yongqiangzhang1

> @pnnnnnnn, are the training results correct?

Still training. So far I've trained for 70 epochs, and the total loss has dropped from 0.66 to 0.19.

pnnnnnnn avatar Sep 27 '18 09:09 pnnnnnnn

@pnnnnnnn, what do you mean by "uncomment all "from .utils.bbox import box_op" and change "box_op" to "bbox_overlaps""? Comment or uncomment?

yongqiangzhang1 avatar Sep 27 '18 09:09 yongqiangzhang1

> @pnnnnnnn, what do you mean by "uncomment all "from .utils.bbox import box_op" and change "box_op" to "bbox_overlaps""? Comment or uncomment?

Oh, sorry, I meant "comment": comment out all "from .utils.bbox import box_op" lines, then change the remaining "box_op" to "bbox_overlaps".
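The corrected edit (comment out the import, then rename the remaining uses) could be scripted roughly as follows. This is a sketch: `patch_source` is a hypothetical helper, it works on text you pass in rather than touching files, and it does not establish that `bbox_overlaps` takes the same arguments as `box_op`.

```python
import re

IMPORT_LINE = "from .utils.bbox import box_op"


def patch_source(text):
    """Comment out the box_op import, then rename remaining box_op uses."""
    patched = []
    for line in text.splitlines():
        if line.strip() == IMPORT_LINE:
            # Keep the original line visible, but disabled.
            indent = line[: len(line) - len(line.lstrip())]
            patched.append(indent + "# " + line.strip())
        else:
            # \b leaves longer identifiers such as my_box_op2 untouched.
            patched.append(re.sub(r"\bbox_op\b", "bbox_overlaps", line))
    trailing_newline = "\n" if text.endswith("\n") else ""
    return "\n".join(patched) + trailing_newline
```

Review the result before overwriting any file, for example by printing `patch_source(open(path).read())` for each affected module.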

pnnnnnnn avatar Sep 27 '18 09:09 pnnnnnnn

@pnnnnnnn did you check that box_op and bbox_overlaps do the same thing?

yongqiangzhang1 avatar Sep 27 '18 09:09 yongqiangzhang1

> @pnnnnnnn did you check that box_op and bbox_overlaps do the same thing?

There is no box_op function.
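For anyone who wants to sanity-check the substitution, a plain-Python reference for pairwise IoU can be compared against whatever the compiled `bbox_overlaps` returns. This sketch assumes boxes are `(x1, y1, x2, y2)` with inclusive pixel coordinates (the `+ 1` convention that py-faster-rcnn uses, as far as I recall). `iou_matrix` is a hypothetical name, and the convention should be verified against the Cython source.

```python
def iou_matrix(boxes, query_boxes):
    """Pairwise IoU for (x1, y1, x2, y2) boxes with inclusive coordinates.

    Returns a list of rows: result[n][k] is the IoU of boxes[n] and
    query_boxes[k].
    """
    result = []
    for bx1, by1, bx2, by2 in boxes:
        b_area = (bx2 - bx1 + 1) * (by2 - by1 + 1)
        row = []
        for qx1, qy1, qx2, qy2 in query_boxes:
            # Width and height of the intersection rectangle.
            iw = min(bx2, qx2) - max(bx1, qx1) + 1
            ih = min(by2, qy2) - max(by1, qy1) + 1
            if iw > 0 and ih > 0:
                q_area = (qx2 - qx1 + 1) * (qy2 - qy1 + 1)
                inter = iw * ih
                row.append(inter / float(b_area + q_area - inter))
            else:
                row.append(0.0)
        result.append(row)
    return result
```

Comparing a handful of boxes with known overlap is usually enough to notice a mismatch in argument order or coordinate convention.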

pnnnnnnn avatar Sep 27 '18 09:09 pnnnnnnn

@yongqiangzhang1 @zhangxydlut @MADONOKOUKI @pnnnnnnn Please try this compiled package: utils.zip

VideoObjectSearch avatar Sep 27 '18 14:09 VideoObjectSearch

Your compiled utils.zip solves "No module named cython_bbox" and "No module named bbox". But there is a new error:

    from nms.gpu_nms import gpu_nms
ImportError: No module named gpu_nms

Could you compile nms as well and upload the compiled files? Thanks.

yongqiangzhang1 avatar Sep 27 '18 14:09 yongqiangzhang1

@yongqiangzhang1 You can try this: nms.zip

VideoObjectSearch avatar Sep 28 '18 00:09 VideoObjectSearch

nms works. Thank you very much.

yongqiangzhang1 avatar Sep 28 '18 03:09 yongqiangzhang1

>> @pnnnnnnn, are the training results correct?

> Still training. So far I've trained for 70 epochs, and the total loss has dropped from 0.66 to 0.19.

Did you get the same MR as the paper?

Chen94yue avatar Oct 10 '18 06:10 Chen94yue

>>> @pnnnnnnn, are the training results correct?

>> Still training. So far I've trained for 70 epochs, and the total loss has dropped from 0.66 to 0.19.

> Did you get the same MR as the paper?

Not yet(?). I've trained for 200 epochs (2k iterations per epoch, batch size 4, GPU: 1050 Ti) and got 16.53 with the best model. Now I'm decreasing the learning rate from 1e-4 to 1e-5 for 100 epochs.
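The schedule described here (1e-4 for the first 200 epochs, then 1e-5) amounts to a single step decay. A minimal sketch, with hypothetical function names and the numbers taken from this post; how it would be wired into train.py depends on the training loop, which is not shown in this thread:

```python
def learning_rate(epoch, switch_epoch=200, base_lr=1e-4, low_lr=1e-5):
    """Step schedule: base_lr before switch_epoch, low_lr from then on."""
    return base_lr if epoch < switch_epoch else low_lr


def images_seen(epochs, iterations_per_epoch, batch_size):
    """Total number of training images processed (with repetition)."""
    return epochs * iterations_per_epoch * batch_size


if __name__ == "__main__":
    # Values quoted in the post: 200 epochs, 2k iterations, batch size 4.
    print(images_seen(200, 2000, 4))
    print(learning_rate(199), learning_rate(200))
```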

pnnnnnnn avatar Oct 17 '18 03:10 pnnnnnnn

>>>> @pnnnnnnn, are the training results correct?

>>> Still training. So far I've trained for 70 epochs, and the total loss has dropped from 0.66 to 0.19.

>> Did you get the same MR as the paper?

> Not yet(?). I've trained for 200 epochs (2k iterations per epoch, batch size 4, GPU: 1050 Ti) and got 16.53 with the best model. Now I'm decreasing the learning rate from 1e-4 to 1e-5 for 100 epochs.

Hi, same question again: did you get the same MR as the paper? The best score I have got is 16.33. A BIG GAP.

youtang1993 avatar Nov 03 '18 13:11 youtang1993

>>>>> @pnnnnnnn, are the training results correct?

>>>> Still training. So far I've trained for 70 epochs, and the total loss has dropped from 0.66 to 0.19.

>>> Did you get the same MR as the paper?

>> Not yet(?). I've trained for 200 epochs (2k iterations per epoch, batch size 4, GPU: 1050 Ti) and got 16.53 with the best model. Now I'm decreasing the learning rate from 1e-4 to 1e-5 for 100 epochs.

> Hi, same question again: did you get the same MR as the paper? The best score I have got is 16.33. A BIG GAP.

The best I've got is 13.18. Maybe I can't reach 12.01 because of my small batch size (only 4).

pnnnnnnn avatar Nov 05 '18 01:11 pnnnnnnn

Hi @VideoObjectSearch, I also have the same problem when I run test.py. I use Python 3.5.

Traceback (most recent call last):
  File "test.py", line 32, in <module>
    from keras_alfnet.model.model_1step import Model_1step
  File "/home/ou/workplace/ALFNet/keras_alfnet/model/model_1step.py", line 1, in <module>
    from .base_model import Base_model
  File "/home/ou/workplace/ALFNet/keras_alfnet/model/base_model.py", line 2, in <module>
    from keras_alfnet import data_generators
  File "/home/ou/workplace/ALFNet/keras_alfnet/data_generators.py", line 7, in <module>
    from .utils.cython_bbox import bbox_overlaps
ImportError: /home/ou/workplace/ALFNet/keras_alfnet/utils/cython_bbox.so: undefined symbol: _Py_ZeroStruct
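A note on this symbol: `_Py_ZeroStruct` is the object behind `Py_False` in the Python 2 C API (Python 3 uses `_Py_FalseStruct`), so the error suggests that cython_bbox.so was compiled for Python 2 and is being loaded by Python 3.5. Rebuilding the extension with the same interpreter that runs test.py is the likely fix, although that is an inference and not something confirmed in this thread. The sketch below encodes the hint; the symbol tables are a small hand-picked sample and the function names are hypothetical.

```python
import sys

# C-API symbols that, to my knowledge, exist in only one major CPython line.
PY2_ONLY_SYMBOLS = {"_Py_ZeroStruct", "Py_InitModule4_64",
                    "PyString_FromString", "PyInt_FromLong"}
PY3_ONLY_SYMBOLS = {"_Py_FalseStruct", "PyModule_Create2"}


def built_for_major(symbol):
    """Guess the Python major version an extension was compiled for.

    Returns 2, 3, or None when the symbol is in neither table.
    """
    if symbol in PY2_ONLY_SYMBOLS:
        return 2
    if symbol in PY3_ONLY_SYMBOLS:
        return 3
    return None


def describe(symbol, running_major=sys.version_info[0]):
    """Turn an undefined-symbol name into a one-line hint."""
    built = built_for_major(symbol)
    if built is None:
        return "%s: no version hint" % symbol
    if built == running_major:
        return "%s: matches the running Python %d" % (symbol, running_major)
    return "%s: built for Python %d, running Python %d" % (
        symbol, built, running_major)
```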

ou525 avatar Dec 08 '18 08:12 ou525

> @yongqiangzhang1 You can try this: nms.zip

Hi @VideoObjectSearch, when I use nms.zip I get this error:

ImportError: libcudart.so.8.0: cannot open shared object file: No such file or directory

I use CUDA 9.0. How can I compile it to make it work?

m1nt07 avatar Jan 02 '19 12:01 m1nt07

> @yongqiangzhang1 You can try this: nms.zip

Hi, using nms.zip solved "ImportError: No module named gpu_nms", but now a new problem appears:

Traceback (most recent call last):
  File "train.py", line 40, in <module>
    from keras_alfnet.model.model_2step import Model_2step
  File "/home/by/ma/ALFNet-master/keras_alfnet/model/model_2step.py", line 7, in <module>
    from keras_alfnet import bbox_process
  File "/home/by/ma/ALFNet-master/keras_alfnet/bbox_process.py", line 7, in <module>
    from nms_wrapper import nms
  File "/home/by/ma/ALFNet-master/keras_alfnet/nms_wrapper.py", line 9, in <module>
    from nms.cpu_nms import cpu_nms
ImportError: /home/by/ma/ALFNet-master/keras_alfnet/nms/cpu_nms.so: undefined symbol: PyFPE_jbuf

How can I solve it? Thank you.
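The ImportError messages in this thread fall into three shapes: a module that is not found, a shared library that cannot be opened, and an undefined symbol inside a prebuilt .so. The last two usually point to a binary built in a different environment (another CUDA version or another Python build) from the one loading it, in which case recompiling locally tends to be more reliable than swapping zip files. That reading is a general observation, not something the repository authors state. A small sketch for sorting messages into these shapes, with hypothetical names:

```python
import re

# (kind, pattern) pairs, checked in order; group 1 is the detail.
PATTERNS = [
    ("undefined_symbol", re.compile(r"undefined symbol: (\w+)")),
    ("missing_shared_library",
     re.compile(r"(\S+\.so[\d.]*): cannot open shared object file")),
    ("missing_module", re.compile(r"No module named '?([\w.]+)'?")),
]


def classify_import_error(message):
    """Return (kind, detail) for an ImportError message string."""
    for kind, pattern in PATTERNS:
        match = pattern.search(message)
        if match:
            return kind, match.group(1)
    return "unknown", message


if __name__ == "__main__":
    # Messages copied from posts in this thread.
    for msg in [
        "ImportError: No module named gpu_nms",
        "ImportError: libcudart.so.8.0: cannot open shared object file: "
        "No such file or directory",
        "ImportError: /home/by/ma/ALFNet-master/keras_alfnet/nms/cpu_nms.so: "
        "undefined symbol: PyFPE_jbuf",
    ]:
        print(classify_import_error(msg))
```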

xiaoshang123 avatar Jan 23 '19 07:01 xiaoshang123

>> @yongqiangzhang1 You can try this: nms.zip

> Hi @VideoObjectSearch, when I use nms.zip I get this error:
> ImportError: libcudart.so.8.0: cannot open shared object file: No such file or directory
>
> I use CUDA 9.0. How can I compile it to make it work?

I have the same problem. Did you find a CUDA 9 version of nms?

weizheliu avatar Feb 06 '19 15:02 weizheliu

>>> @yongqiangzhang1 You can try this: nms.zip

>> Hi @VideoObjectSearch, when I use nms.zip I get this error:
>> ImportError: libcudart.so.8.0: cannot open shared object file: No such file or directory
>> I use CUDA 9.0. How can I compile it to make it work?

> I have the same problem. Did you find a CUDA 9 version of nms?

You can try this: nms.zip

whitenightwu avatar Mar 19 '19 10:03 whitenightwu

> Hi @VideoObjectSearch, I also have the same problem when I run test.py. I use Python 3.5.
>
> Traceback (most recent call last):
>   File "test.py", line 32, in <module>
>     from keras_alfnet.model.model_1step import Model_1step
>   File "/home/ou/workplace/ALFNet/keras_alfnet/model/model_1step.py", line 1, in <module>
>     from .base_model import Base_model
>   File "/home/ou/workplace/ALFNet/keras_alfnet/model/base_model.py", line 2, in <module>
>     from keras_alfnet import data_generators
>   File "/home/ou/workplace/ALFNet/keras_alfnet/data_generators.py", line 7, in <module>
>     from .utils.cython_bbox import bbox_overlaps
> ImportError: /home/ou/workplace/ALFNet/keras_alfnet/utils/cython_bbox.so: undefined symbol: _Py_ZeroStruct

Hi, I have the same problem. Have you solved it?

xiefeiwhu avatar Apr 15 '19 09:04 xiefeiwhu

@yongqiangzhang1 @pnnnnnnn Following the code, I can train for 150 epochs, but when I run test.py with the trained weights resnet_e3_l1.15433712553.hdf5, I get no test results: val_det.txt is empty. Why?

nankeermeng avatar Jun 09 '19 06:06 nankeermeng