Error while training with coco dataset: h5py unable to open file

Open PardoAlejo opened this issue 7 years ago • 3 comments

I'm trying to run: python experiments/fcis/fcis_end2end_train_test.py --cfg experiments/fcis/cfgs/resnet_v1_101_coco_fcis_end2end_ohem.yaml

It trains until batch [1000] and then I get the following error:

Epoch[0] Batch [1000] Speed: 2.92 samples/sec Train-RPNAcc=0.873100, RPNLogLoss=0.307023, RPNL1Loss=0.167374, FCISAcc=0.716729, FCISAccFG=0.000708, FCISLogLoss=2.082165, FCISL1Loss=0.089456, FCISMaskLoss=0.632843,
Exception in thread Thread-71:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "experiments/fcis/../../fcis/../lib/utils/PrefetchingIter.py", line 60, in prefetch_func
    self.next_batch[i] = self.iters[i].next()
  File "experiments/fcis/../../fcis/core/loader.py", line 99, in next
    self.get_batch_parallel()
  File "experiments/fcis/../../fcis/core/loader.py", line 161, in get_batch_parallel
    rst = self.parfetch(roidb)
  File "experiments/fcis/../../fcis/core/loader.py", line 183, in parfetch
    gt_masks = get_gt_masks(roidb[0]['cache_seg_inst'], data['im_info'][0,:2].astype('int'))
  File "experiments/fcis/../../fcis/../lib/mask/mask_transform.py", line 25, in get_gt_masks
    gt_masks = hkl.load(gt_mask_file)
  File "/usr/local/lib/python2.7/dist-packages/hickle.py", line 616, in load
    h5f = file_opener(fileobj)
  File "/usr/local/lib/python2.7/dist-packages/hickle.py", line 154, in file_opener
    h5f = h5.File(filename, mode)
  File "/usr/local/lib/python2.7/dist-packages/h5py/_hl/files.py", line 272, in __init__
    fid = make_fid(name, mode, userblock_size, fapl, swmr=swmr)
  File "/usr/local/lib/python2.7/dist-packages/h5py/_hl/files.py", line 92, in make_fid
    fid = h5f.open(name, flags, fapl=fapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper (/tmp/pip-4rPeHA-build/h5py/_objects.c:2684)
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper (/tmp/pip-4rPeHA-build/h5py/_objects.c:2642)
  File "h5py/h5f.pyx", line 76, in h5py.h5f.open (/tmp/pip-4rPeHA-build/h5py/h5f.c:1930)
IOError: Unable to open file (File signature not found)

Can anyone help me with this?

PardoAlejo, Sep 14 '17 22:09

Maybe you can try printing the path before you read the file, to see whether the file that triggers this error has a problem. It seems that the program fails to read this file, which may be because the file doesn't exist or you don't have permission to access it. Setting a fixed random seed in TrainDataLoader may help you locate that file.
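
A minimal sketch of the path-printing idea, not the repository's code: a wrapper that calls whatever loader it is given and, if the call fails, reports the path, whether the file exists, and its size before re-raising. The name load_logged is hypothetical; the loader would be hkl.load, as seen in the traceback.

```python
import os
import sys


def load_logged(path, loader):
    """Call loader(path); on an I/O error, report the path and re-raise.

    load_logged is a hypothetical helper name, not part of FCIS or hickle.
    """
    try:
        return loader(path)
    except (IOError, OSError):
        exists = os.path.exists(path)
        size = os.path.getsize(path) if exists else -1
        # Write to stderr so the message shows up next to the traceback.
        sys.stderr.write(
            "failed to load %s (exists=%s, size=%d bytes)\n" % (path, exists, size)
        )
        raise
```

For example, the call in lib/mask/mask_transform.py could become gt_masks = load_logged(gt_mask_file, hkl.load) while debugging, so the failing cache file is named in the log.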

liyi14, Sep 15 '17 17:09

A solution is given in #11, but it did not work for me: all my images and hkl files (stored as cache) did have size > 0. My solution was to delete the cached hkl files and launch the training again, so that they are recreated, hopefully without errors.
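
A rough sketch of a narrower variant of this, under some assumptions: since the error is "File signature not found" and a size check alone did not catch the problem, one could look for cached .hkl files that do not start with the 8-byte HDF5 signature and delete only those. The function names and the cache directory argument are hypothetical, only offset 0 is checked, and a file that passes this check could still be corrupt further in.

```python
import os

# The 8-byte signature that begins an HDF5 file (checked at offset 0 only).
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def has_hdf5_signature(path):
    """Return True if the file starts with the HDF5 signature."""
    with open(path, "rb") as f:
        return f.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE


def find_bad_cache_files(cache_dir, ext=".hkl"):
    """List files under cache_dir with the given extension that lack the signature."""
    bad = []
    for root, _dirs, files in os.walk(cache_dir):
        for name in sorted(files):
            if name.endswith(ext):
                path = os.path.join(root, name)
                if not has_hdf5_signature(path):
                    bad.append(path)
    return bad


def remove_files(paths):
    """Delete the given files so they can be regenerated on the next run."""
    for path in paths:
        os.remove(path)
```

Printing the result of find_bad_cache_files on the cache folder before calling remove_files makes it possible to review what would be deleted; relaunching the training should then recreate the missing cache files.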

mldm4, Oct 26 '17 11:10

"My solution was deleting the cache hkl files and launching the training again so they are created again hopefully without error". What does this mean?

wyx-2018, Nov 01 '18 17:11