imagenet-multiGPU.torch: Training stops while running main.lua, "nClasses is reported different in the data loader, and in the commandline options"


Open. DKP-90 opened this issue 8 years ago • 1 comment

Hi,

I have created a dataset/train folder containing one subfolder per fruit, each named after the fruit and holding its training images, and a dataset/val folder with the same per-fruit subfolders holding the validation images. My Jetson TX1 showed the following output after running for a long time:

ubuntu@tegra-ubuntu:~/imagenet-multiGPU.torch$ th main.lua -data /home/ubuntu/imagenet-multiGPU.torch/dataset/
-- ignore option data
-- ignore option optimState
-- ignore option cache
-- ignore option netType
-- ignore option retrain
=> Creating model from file: models/alexnetowtbn.lua
=> Model
nn.Sequential {
  [input -> (1) -> (2) -> output]
  (1): nn.Sequential {
    [input -> (1) -> (2) -> (3) -> (4) -> (5) -> (6) -> (7) -> (8) -> (9) -> (10) -> (11) -> (12) -> (13) -> (14) -> (15) -> (16) -> (17) -> (18) -> output]
    (1): cudnn.SpatialConvolution(3 -> 64, 11x11, 4,4, 2,2)
    (2): cudnn.SpatialBatchNormalization
    (3): cudnn.ReLU
    (4): cudnn.SpatialMaxPooling(3x3, 2,2)
    (5): cudnn.SpatialConvolution(64 -> 192, 5x5, 1,1, 2,2)
    (6): cudnn.SpatialBatchNormalization
    (7): cudnn.ReLU
    (8): cudnn.SpatialMaxPooling(3x3, 2,2)
    (9): cudnn.SpatialConvolution(192 -> 384, 3x3, 1,1, 1,1)
    (10): cudnn.SpatialBatchNormalization
    (11): cudnn.ReLU
    (12): cudnn.SpatialConvolution(384 -> 256, 3x3, 1,1, 1,1)
    (13): cudnn.SpatialBatchNormalization
    (14): cudnn.ReLU
    (15): cudnn.SpatialConvolution(256 -> 256, 3x3, 1,1, 1,1)
    (16): cudnn.SpatialBatchNormalization
    (17): cudnn.ReLU
    (18): cudnn.SpatialMaxPooling(3x3, 2,2)
  }
  (2): nn.Sequential {
    [input -> (1) -> (2) -> (3) -> (4) -> (5) -> (6) -> (7) -> (8) -> (9) -> (10) -> (11) -> output]
    (1): nn.View(9216)
    (2): nn.Dropout(0.500000)
    (3): nn.Linear(9216 -> 4096)
    (4): cudnn.BatchNormalization
    (5): cudnn.ReLU
    (6): nn.Dropout(0.500000)
    (7): nn.Linear(4096 -> 4096)
    (8): cudnn.BatchNormalization
    (9): cudnn.ReLU
    (10): nn.Linear(4096 -> 1000)
    (11): cudnn.LogSoftMax
  }
}
=> Criterion
nn.ClassNLLCriterion
==> Converting model to CUDA
{
  LR : 0
  nClasses : 1000
  batchSize : 128
  data : "/home/ubuntu/imagenet-multiGPU.torch/dataset/"
  epochSize : 10000
  nDonkeys : 2
  save : "/home/ubuntu/imagenet-multiGPU.torch/imagenet/checkpoint/alexnetowtbn/MonAug2212:32:032016"
  optimState : "none"
  cropSize : 224
  nGPU : 1
  imageCrop : 224
  imageSize : 256
  epochNumber : 1
  momentum : 0.9
  cache : "./imagenet/checkpoint/"
  backend : "cudnn"
  nEpochs : 55
  manualSeed : 2
  GPU : 1
  weightDecay : 0.0005
  netType : "alexnetowtbn"
  retrain : "none"
}
Saving everything to: /home/ubuntu/imagenet-multiGPU.torch/imagenet/checkpoint/alexnetowtbn/MonAug2212:32:032016
Starting donkey with id: 2 seed: 4
Starting donkey with id: 1 seed: 3
Creating train metadata table: 0xd562f928
running "find" on each class directory, and concatenate all those filenames into a single file containing all image paths for a given class
Creating train metadata table: 0xd0caaad0
running "find" on each class directory, and concatenate all those filenames into a single file containing all image paths for a given class
now combine all the files to a single large file
now combine all the files to a single large file
load the large concatenated list of sample paths to self.imagePath
load the large concatenated list of sample paths to self.imagePath
34 samples found.......................... 0/34 ........................................] ETA: 0ms | Step: 0ms
Updating classList and imageClass appropriately
34 samples found.......................... 0/34 ........................................] ETA: 0ms | Step: 0ms
Updating classList and imageClass appropriately
[======================================== 2/2 ========================================>] Tot: 56ms | Step: 28ms
[======================================== 2/2 ========================================>] Tot: 58ms | Step: 29ms
Cleaning up temporary files
Cleaning up temporary files
Creating test metadata table: 0xd51101a8
running "find" on each class directory, and concatenate all those filenames into a single file containing all image paths for a given class
Creating test metadata table: 0xd516bd48
running "find" on each class directory, and concatenate all those filenames into a single file containing all image paths for a given class
now combine all the files to a single large file
now combine all the files to a single large file
load the large concatenated list of sample paths to self.imagePath
load the large concatenated list of sample paths to self.imagePath
9 samples found........................... 0/9 .........................................] ETA: 0ms | Step: 0ms
Updating classList and imageClass appropriately
9 samples found........................... 0/9 .........................................] ETA: 0ms | Step: 0ms
Updating classList and imageClass appropriately
[======================================== 2/2 ========================================>] Tot: 53ms | Step: 26ms
[======================================== 2/2 ========================================>] Tot: 55ms | Step: 27ms
Cleaning up temporary files
Cleaning up temporary files
Splitting training and test sets to a ratio of 0/100
Estimating the mean (per-channel, shared for all pixels) over 10000 randomly sampled training images
Splitting training and test sets to a ratio of 0/100
Estimating the mean (per-channel, shared for all pixels) over 10000 randomly sampled training images
Estimating the std (per-channel, shared for all pixels) over 10000 randomly sampled training images
Estimating the std (per-channel, shared for all pixels) over 10000 randomly sampled training images
Time to estimate: 1260.1163020134
Time to estimate: 1266.3318710327
/usr/local/bin/luajit: /home/ubuntu/imagenet-multiGPU.torch/data.lua:47: nClasses is reported different in the data loader, and in the commandline options
stack traceback:
[C]: in function 'assert'
/home/ubuntu/imagenet-multiGPU.torch/data.lua:47: in main chunk
[C]: in function 'dofile'
main.lua:37: in main chunk
[C]: in function 'dofile'
/usr/local/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x0000d055
ubuntu@tegra-ubuntu:~/imagenet-multiGPU.torch$

Aug 22 '16 07:08 DKP-90

The number of classes in your dataset must be the same as the nClasses value in opts.lua. The log above shows nClasses : 1000.

Oct 26 '18 08:10 mhmtsarigul
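
A quick way to see the mismatch before starting a long run is to count the class folders yourself. The following is only a rough sketch in Python, not part of the repository: the helper names `list_class_dirs` and `check_n_classes` and the example folder names are made up for illustration, and it assumes each class is one immediate subfolder of `train/`, as described in the question. The `2/2` progress bars in the log suggest this dataset has two class folders, although that is an inference from the output.

```python
import os
import tempfile


def list_class_dirs(split_dir):
    """Return the sorted names of the immediate subdirectories of split_dir."""
    return sorted(
        name for name in os.listdir(split_dir)
        if os.path.isdir(os.path.join(split_dir, name))
    )


def check_n_classes(data_root, n_classes):
    """Compare the number of class folders under train/ with n_classes.

    Returns (matches, class_names). Assumes one subfolder per class.
    """
    classes = list_class_dirs(os.path.join(data_root, "train"))
    return len(classes) == n_classes, classes


if __name__ == "__main__":
    # Throwaway layout with two hypothetical fruit classes, for demonstration.
    with tempfile.TemporaryDirectory() as root:
        for fruit in ("apple", "banana"):
            os.makedirs(os.path.join(root, "train", fruit))

        # 1000 is the nClasses value shown in the log above.
        matches, classes = check_n_classes(root, 1000)
        print("class folders found:", len(classes), classes)
        print("matches nClasses=1000:", matches)
```

If the count differs from 1000, set nClasses to the number of folders found. Since nClasses appears in the printed options table, it is probably also settable as `-nClasses` on the `th main.lua` command line, but check opts.lua to confirm. Separately, the model printed in the log ends in `nn.Linear(4096 -> 1000)`, so the model definition may also need its output size adjusted.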
