imagenet-multiGPU.torch
Training stops while running main.lua: "nClasses is reported different in the data loader, and in the commandline options"
Hi,
I have created a dataset/train folder containing one subfolder per fruit, named after the fruit, with the training images inside, and a dataset/val folder laid out the same way with the validation images. After running for a long time, my Jetson TX1 showed the following output:
ubuntu@tegra-ubuntu:~/imagenet-multiGPU.torch$ th main.lua -data /home/ubuntu/imagenet-multiGPU.torch/dataset/
-- ignore option data
-- ignore option optimState
-- ignore option cache
-- ignore option netType
-- ignore option retrain
=> Creating model from file: models/alexnetowtbn.lua
=> Model
nn.Sequential {
[input -> (1) -> (2) -> output]
(1): nn.Sequential {
[input -> (1) -> (2) -> (3) -> (4) -> (5) -> (6) -> (7) -> (8) -> (9) -> (10) -> (11) -> (12) -> (13) -> (14) -> (15) -> (16) -> (17) -> (18) -> output]
(1): cudnn.SpatialConvolution(3 -> 64, 11x11, 4,4, 2,2)
(2): cudnn.SpatialBatchNormalization
(3): cudnn.ReLU
(4): cudnn.SpatialMaxPooling(3x3, 2,2)
(5): cudnn.SpatialConvolution(64 -> 192, 5x5, 1,1, 2,2)
(6): cudnn.SpatialBatchNormalization
(7): cudnn.ReLU
(8): cudnn.SpatialMaxPooling(3x3, 2,2)
(9): cudnn.SpatialConvolution(192 -> 384, 3x3, 1,1, 1,1)
(10): cudnn.SpatialBatchNormalization
(11): cudnn.ReLU
(12): cudnn.SpatialConvolution(384 -> 256, 3x3, 1,1, 1,1)
(13): cudnn.SpatialBatchNormalization
(14): cudnn.ReLU
(15): cudnn.SpatialConvolution(256 -> 256, 3x3, 1,1, 1,1)
(16): cudnn.SpatialBatchNormalization
(17): cudnn.ReLU
(18): cudnn.SpatialMaxPooling(3x3, 2,2)
}
(2): nn.Sequential {
[input -> (1) -> (2) -> (3) -> (4) -> (5) -> (6) -> (7) -> (8) -> (9) -> (10) -> (11) -> output]
(1): nn.View(9216)
(2): nn.Dropout(0.500000)
(3): nn.Linear(9216 -> 4096)
(4): cudnn.BatchNormalization
(5): cudnn.ReLU
(6): nn.Dropout(0.500000)
(7): nn.Linear(4096 -> 4096)
(8): cudnn.BatchNormalization
(9): cudnn.ReLU
(10): nn.Linear(4096 -> 1000)
(11): cudnn.LogSoftMax
}
}
=> Criterion
nn.ClassNLLCriterion
==> Converting model to CUDA
{
LR : 0
nClasses : 1000
batchSize : 128
data : "/home/ubuntu/imagenet-multiGPU.torch/dataset/"
epochSize : 10000
nDonkeys : 2
save : "/home/ubuntu/imagenet-multiGPU.torch/imagenet/checkpoint/alexnetowtbn/MonAug2212:32:032016"
optimState : "none"
cropSize : 224
nGPU : 1
imageCrop : 224
imageSize : 256
epochNumber : 1
momentum : 0.9
cache : "./imagenet/checkpoint/"
backend : "cudnn"
nEpochs : 55
manualSeed : 2
GPU : 1
weightDecay : 0.0005
netType : "alexnetowtbn"
retrain : "none"
}
Saving everything to: /home/ubuntu/imagenet-multiGPU.torch/imagenet/checkpoint/alexnetowtbn/MonAug2212:32:032016
Starting donkey with id: 2 seed: 4
Starting donkey with id: 1 seed: 3
Creating train metadata
table: 0xd562f928
running "find" on each class directory, and concatenate all those filenames into a single file containing all image paths for a given class
Creating train metadata
table: 0xd0caaad0
running "find" on each class directory, and concatenate all those filenames into a single file containing all image paths for a given class
now combine all the files to a single large file
now combine all the files to a single large file
load the large concatenated list of sample paths to self.imagePath
load the large concatenated list of sample paths to self.imagePath
34 samples found.......................... 0/34 ........................................] ETA: 0ms | Step: 0ms
Updating classList and imageClass appropriately
34 samples found.......................... 0/34 ........................................] ETA: 0ms | Step: 0ms
Updating classList and imageClass appropriately
[======================================== 2/2 ========================================>] Tot: 56ms | Step: 28ms
[======================================== 2/2 ========================================>] Tot: 58ms | Step: 29ms
Cleaning up temporary files
Cleaning up temporary files
Creating test metadata
table: 0xd51101a8
running "find" on each class directory, and concatenate all those filenames into a single file containing all image paths for a given class
Creating test metadata
table: 0xd516bd48
running "find" on each class directory, and concatenate all those filenames into a single file containing all image paths for a given class
now combine all the files to a single large file
now combine all the files to a single large file
load the large concatenated list of sample paths to self.imagePath
load the large concatenated list of sample paths to self.imagePath
9 samples found........................... 0/9 .........................................] ETA: 0ms | Step: 0ms
Updating classList and imageClass appropriately
9 samples found........................... 0/9 .........................................] ETA: 0ms | Step: 0ms
Updating classList and imageClass appropriately
[======================================== 2/2 ========================================>] Tot: 53ms | Step: 26ms
[======================================== 2/2 ========================================>] Tot: 55ms | Step: 27ms
Cleaning up temporary files
Cleaning up temporary files
Splitting training and test sets to a ratio of 0/100
Estimating the mean (per-channel, shared for all pixels) over 10000 randomly sampled training images
Splitting training and test sets to a ratio of 0/100
Estimating the mean (per-channel, shared for all pixels) over 10000 randomly sampled training images
Estimating the std (per-channel, shared for all pixels) over 10000 randomly sampled training images
Estimating the std (per-channel, shared for all pixels) over 10000 randomly sampled training images
Time to estimate: 1260.1163020134
Time to estimate: 1266.3318710327
/usr/local/bin/luajit: /home/ubuntu/imagenet-multiGPU.torch/data.lua:47: nClasses is reported different in the data loader, and in the commandline options
stack traceback:
[C]: in function 'assert'
/home/ubuntu/imagenet-multiGPU.torch/data.lua:47: in main chunk
[C]: in function 'dofile'
main.lua:37: in main chunk
[C]: in function 'dofile'
/usr/local/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x0000d055
ubuntu@tegra-ubuntu:~/imagenet-multiGPU.torch$
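The number of classes found in your dataset must match the nClasses value in opts.lua. Your log shows nClasses : 1000 (the default), while the data loader's class progress bar ends at 2/2, so it appears to have found only 2 class folders. Either pass the real class count on the command line (for example -nClasses 2) or change the default in opts.lua.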
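
To avoid hard-coding the class count, you could count the class folders before launching training. This is only a minimal sketch: count_classes is a hypothetical helper name, and it assumes every immediate subdirectory of dataset/train is one class and that main.lua accepts a -nClasses option, as the option table printed in the log suggests.

```shell
#!/bin/sh
# Hypothetical helper: print the number of immediate subdirectories of $1.
# Each subdirectory is assumed to be one class folder.
count_classes() {
    find "$1" -mindepth 1 -maxdepth 1 -type d | wc -l | tr -d ' '
}

# Possible usage, with the dataset path taken from the log above
# (left commented out because it starts a long training run):
#   DATA=/home/ubuntu/imagenet-multiGPU.torch/dataset
#   th main.lua -data "$DATA" -nClasses "$(count_classes "$DATA/train")"
```

If dataset/train and dataset/val contain different sets of class folders, the count from train may still not be what you want, so it is worth comparing both directories first.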