face-alignment-training
I ran into a problem when running this code.
Following the README, I ran:
th main.lua -data /media/john/Documents/Dataset/300W_LP/
The dataset directory is structured as follows:
.
├── AFW
├── AFW_Flip
├── Code
│   ├── Mex
│   └── ModelGeneration
├── HELEN
├── HELEN_Flip
├── IBUG
├── IBUG_Flip
├── landmarks
│   ├── AFW
│   ├── HELEN
│   ├── IBUG
│   └── LFPW
├── LFPW
└── LFPW_Flip
But I get this error:
=> Creating model from file: models/fan.lua
=> Model size: 23820176
=> Building dataset...
=> Dataset built. 115120 images were found.
=> DataLoader.create
Using lr_rate: 0.000250
=> Training epoch # 1
/home/john/torch/install/bin/luajit: /home/john/torch/install/share/lua/5.1/threads/threads.lua:183: [thread 2 callback] cannot open </media/john/Documents/Dataset/300W_LP/landmarks/HELEN/HELEN_2642751678_1_11.t7> in mode r at /home/john/torch/pkg/torch/lib/TH/THDiskFile.c:673
stack traceback:
[C]: at 0x7f89f5719430
[C]: in function 'DiskFile'
/home/john/torch/install/share/lua/5.1/torch/File.lua:405: in function 'load'
./dataset-images.lua:16: in function 'generateSampleFace'
./dataset-images.lua:34: in function 'get'
./dataloader.lua:96: in function <./dataloader.lua:90>
[C]: in function 'xpcall'
/home/john/torch/install/share/lua/5.1/threads/threads.lua:234: in function 'callback'
/home/john/torch/install/share/lua/5.1/threads/queue.lua:65: in function </home/john/torch/install/share/lua/5.1/threads/queue.lua:41>
[C]: in function 'pcall'
/home/john/torch/install/share/lua/5.1/threads/queue.lua:40: in function 'dojob'
[string " local Queue = require 'threads.queue'..."]:15: in main chunk
stack traceback:
[C]: in function 'error'
/home/john/torch/install/share/lua/5.1/threads/threads.lua:183: in function 'dojob'
./dataloader.lua:132: in function '(for generator)'
./train.lua:49: in function 'train'
main.lua:45: in main chunk
[C]: in function 'dofile'
...john/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: at 0x00405d50
How can I solve this problem?
Hi, a few minor bugs need to be fixed before the training code will run; otherwise you will hit errors like yours. I ran into the same problem. It is caused by some incorrect string operations. Here are the fixes:
- Line 16 of dataloader.lua: for f in paths.files(base_dir..dirs[i],'.mat') do should be changed to for f in paths.files(base_dir..dirs[i],'.t7') do
- Line 17 of dataset-images.lua: local main_pts = torch.load(self.opt.data..'landmarks/'..self.annot[idx]:split('')[1]..'/'..string.sub(self.annot[idx],1,#self.annot[idx]-4)..'.t7') should be changed to local main_pts = torch.load(self.opt.data..'landmarks/'..self.annot[idx]:split('')[1]..'/'..string.sub(self.annot[idx],1,#self.annot[idx]-4)..'s.t7')
- Finally, if your GPU has limited memory, you may hit this error:
'/share/lua/5.1/cudnn/find.lua:483: cudnnFindConvolutionBackwardDataAlgorithm failed, sizes: convDesc=[mode : CUDNN_CROSS_CORRELATION datatype : CUDNN_DATA_FLOAT] hash=-dimA10,128,64,64 -filtA64,128,3,3 10,64,64,64 -padA1,1 -convStrideA1,1 CUDNN_DATA_FLOAT'
In that case, lower the default batch size on line 23 of opt.lua to a value that fits your GPU:
cmd:option('-batchSize', 10, 'mini-batch size (1 = pure stochastic)')
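Before editing the Lua files, it may help to check which annotation files actually exist under landmarks/, since the first fix above depends on the files being .t7 rather than .mat. Below is a minimal sketch, assuming a POSIX shell with find available. The count_ext helper and the DATA_DIR variable are hypothetical names made up for this example and are not part of the repository. The default path is the one from the original command.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repo): count files with a given
# extension anywhere under a directory.
#   $1 = directory to search, $2 = extension without the leading dot
count_ext() {
    find "$1" -type f -name "*.$2" 2>/dev/null | wc -l | tr -d ' '
}

# Dataset root, i.e. the value passed to -data. Override it by setting
# DATA_DIR in the environment; the default is the path from the question.
DATA_DIR="${DATA_DIR:-/media/john/Documents/Dataset/300W_LP}"

# Report how many annotation files of each kind are present.
for ext in mat t7; do
    echo "$ext: $(count_ext "$DATA_DIR/landmarks" "$ext")"
done
```

If the t7 count is zero, the landmark files have probably not been converted to .t7 yet, and torch.load will still fail with "cannot open" whatever extension the code asks for.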