face-alignment-training

I ran into a problem when running this code.

Open · ghost opened this issue 5 years ago · 1 comment

Following the README, I ran:

th main.lua -data /media/john/Documents/Dataset/300W_LP/

The dataset directory is structured as follows:

.
├── AFW
├── AFW_Flip
├── Code
│   ├── Mex
│   └── ModelGeneration
├── HELEN
├── HELEN_Flip
├── IBUG
├── IBUG_Flip
├── landmarks
│   ├── AFW
│   ├── HELEN
│   ├── IBUG
│   └── LFPW
├── LFPW
└── LFPW_Flip

But I get this error:

=> Creating model from file: models/fan.lua	
=> Model size: 	23820176	
=> Building dataset...	
=> Dataset built. 115120 images were found.	
=> DataLoader.create 	
Using lr_rate: 0.000250	
=> Training epoch # 1	
/home/john/torch/install/bin/luajit: /home/john/torch/install/share/lua/5.1/threads/threads.lua:183: [thread 2 callback] cannot open </media/john/Documents/Dataset/300W_LP/landmarks/HELEN/HELEN_2642751678_1_11.t7> in mode r  at /home/john/torch/pkg/torch/lib/TH/THDiskFile.c:673
stack traceback:
	[C]: at 0x7f89f5719430
	[C]: in function 'DiskFile'
	/home/john/torch/install/share/lua/5.1/torch/File.lua:405: in function 'load'
	./dataset-images.lua:16: in function 'generateSampleFace'
	./dataset-images.lua:34: in function 'get'
	./dataloader.lua:96: in function <./dataloader.lua:90>
	[C]: in function 'xpcall'
	/home/john/torch/install/share/lua/5.1/threads/threads.lua:234: in function 'callback'
	/home/john/torch/install/share/lua/5.1/threads/queue.lua:65: in function </home/john/torch/install/share/lua/5.1/threads/queue.lua:41>
	[C]: in function 'pcall'
	/home/john/torch/install/share/lua/5.1/threads/queue.lua:40: in function 'dojob'
	[string "  local Queue = require 'threads.queue'..."]:15: in main chunk
stack traceback:
	[C]: in function 'error'
	/home/john/torch/install/share/lua/5.1/threads/threads.lua:183: in function 'dojob'
	./dataloader.lua:132: in function '(for generator)'
	./train.lua:49: in function 'train'
	main.lua:45: in main chunk
	[C]: in function 'dofile'
	...john/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
	[C]: at 0x00405d50

How can I solve this problem?

ghost, May 02 '19 08:05

Hi, there are a few minor bugs that must be fixed before the training code will run; otherwise you will hit problems like yours. I ran into the same errors. They are caused by some incorrect string operations. Here are some tips for fixing them:

  1. Line 16 of dataloader.lua:
       for f in paths.files(base_dir..dirs[i],'.mat') do
     should be
       for f in paths.files(base_dir..dirs[i],'.t7') do
  2. Line 17 of dataset-images.lua:
       local main_pts = torch.load(self.opt.data..'landmarks/'..self.annot[idx]:split('_')[1]..'/'..string.sub(self.annot[idx],1,#self.annot[idx]-4)..'.t7')
     should be
       local main_pts = torch.load(self.opt.data..'landmarks/'..self.annot[idx]:split('_')[1]..'/'..string.sub(self.annot[idx],1,#self.annot[idx]-4)..'s.t7')
  3. Finally, if you are using a GPU with little memory, you may get this error:
       '/share/lua/5.1/cudnn/find.lua:483: cudnnFindConvolutionBackwardDataAlgorithm failed, sizes: convDesc=[mode : CUDNN_CROSS_CORRELATION datatype : CUDNN_DATA_FLOAT] hash=-dimA10,128,64,64 -filtA64,128,3,3 10,64,64,64 -padA1,1 -convStrideA1,1 CUDNN_DATA_FLOAT'
     In that case, reduce the batch size on line 23 of opt.lua to whatever fits your GPU:
       cmd:option('-batchSize', 10, 'mini-batch size (1 = pure stochastic)')
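To check which path fix your copy needs before launching a full training run, you can rebuild the landmark path the same way the torch.load line quoted in tip 2 does, and test whether each file exists. The sketch below is plain Lua with no Torch dependency. The helper names (datasetPrefix, landmarkPath, fileExists, findMissing) are hypothetical and not part of the repository. It assumes, as the quoted code implies, that annotation entries are file names whose first underscore-separated field is the subset name (e.g. HELEN) and whose last 4 characters are an extension such as ".mat". Pass in whatever suffix your converted landmark files actually use.

```lua
-- Hypothetical helpers (not part of face-alignment-training): rebuild the
-- landmark path like the torch.load line in dataset-images.lua and report
-- annotations whose landmark file is missing on disk.

-- Part of the file name before the first underscore, e.g. "HELEN".
local function datasetPrefix(name)
  return name:match('^([^_]+)')
end

-- <dataDir>landmarks/<prefix>/<name minus last 4 chars><suffix>
-- The 4-character cut mirrors the quoted code (assumes a ".mat"-style extension).
local function landmarkPath(dataDir, annot, suffix)
  local stem = string.sub(annot, 1, #annot - 4)
  return dataDir .. 'landmarks/' .. datasetPrefix(annot) .. '/' .. stem .. suffix
end

-- True if the file can be opened for reading.
local function fileExists(path)
  local f = io.open(path, 'r')
  if f then
    f:close()
    return true
  end
  return false
end

-- Collect every constructed landmark path that cannot be opened.
local function findMissing(dataDir, annots, suffix)
  local missing = {}
  for _, annot in ipairs(annots) do
    local p = landmarkPath(dataDir, annot, suffix)
    if not fileExists(p) then
      table.insert(missing, p)
    end
  end
  return missing
end

-- Example: reproduce the path from the error message above and check it.
local missing = findMissing('/media/john/Documents/Dataset/300W_LP/',
                            { 'HELEN_2642751678_1_11.mat' }, '.t7')
for _, p in ipairs(missing) do
  print('missing: ' .. p)
end
```

If the printed paths do not match the file names in your landmarks folders, adjust the suffix (and the corresponding line in dataset-images.lua) until nothing is reported as missing.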

WaterCube001, Aug 26 '19 03:08