JULE.torch

There is a problem when I run my data

Open zuoeye opened this issue 7 years ago • 7 comments

I tried to run this code on my own data, but I get the following error:

    /home/ai/torch/install/bin/luajit: train.lua:382: cuda runtime error (59) : device-side assert triggered at /tmp/luarocks_cutorch-scm-1-3009/cutorch/lib/THC/generic/THCTensorCopy.c:18
    stack traceback:
        [C]: in function 'indexCopy'
        train.lua:382: in function 'organize_samples'
        train.lua:422: in function 'opfunc'
        /home/ai/torch/install/share/lua/5.1/optim/sgd.lua:44: in function 'sgd'
        train.lua:436: in function 'updateCNN'
        train.lua:487: in main chunk
        [C]: in function 'dofile'
        ...e/ai/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
        [C]: at 0x00406670

How can I solve this problem? I look forward to your response. Thanks.

zuoeye avatar Nov 29 '17 08:11 zuoeye

My data is 28x28x3, so I just copied the model_def of FRGC to build my network architecture. Is this the cause of the problem? If so, how do I define my own network architecture to train the model on 28x28 images?

zuoeye avatar Nov 29 '17 09:11 zuoeye

Yes, I think that might be the reason. FRGC is 32x32, so the output feature dimension of the network might be wrong for your 28x28 images. You can try the architecture for MNIST, since it is also 28x28.
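To make the size mismatch concrete, here is a rough sketch of the arithmetic in Python. The layer stack below is hypothetical (it is not copied from the repository's model_def); it only illustrates how the flattened feature count depends on the input side length. Whatever the real layers are, the size given to nn.View and the input size of the first nn.Linear have to equal this number for the images actually being fed in.

```python
def out_size(size, kernel, stride=1, pad=0):
    """Spatial side length after a convolution or pooling layer (floor mode)."""
    return (size + 2 * pad - kernel) // stride + 1


# Hypothetical stack for illustration only, NOT the repository's model_def:
# (kind, output planes, kernel size, stride)
LAYERS = [
    ("conv", 50, 5, 1),
    ("pool", None, 2, 2),
    ("conv", 50, 5, 1),
    ("pool", None, 2, 2),
]


def flattened_features(side, layers=LAYERS):
    """Number of values per sample after the last layer, once flattened."""
    planes = None
    for kind, out_planes, kernel, stride in layers:
        side = out_size(side, kernel, stride)
        if kind == "conv":
            planes = out_planes  # pooling keeps the plane count unchanged
    return planes * side * side


print(flattened_features(32))  # 1250 (50 planes of 5x5)
print(flattened_features(28))  # 800  (50 planes of 4x4)
```

With this made-up stack, a head sized for 32x32 inputs would expect 1250 features but receive 800 from 28x28 images, which is the kind of mismatch described above.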

jwyang avatar Nov 29 '17 16:11 jwyang

Thanks for your answer, but there is still a problem when I try the architecture for MNIST:

    online epoch # 0 [batchSize = 100] [learningRate = 0.01]
    /home/ai/torch/install/bin/luajit: /home/ai/torch/install/share/lua/5.1/nn/Container.lua:67:
    In 2 module of nn.Sequential:
    In 1 module of nn.Sequential:
    In 1 module of nn.Sequential:
    /home/ai/torch/install/share/lua/5.1/nn/THNN.lua:110: Need input of dimension 4 and input.size[1] == 1 but got input to be of shape: [100 x 3 x 28 x 28] at /tmp/luarocks_cunn-scm-1-3260/cunn/lib/THCUNN/generic/SpatialConvolutionMM.cu:49
    stack traceback:
        [C]: in function 'v'
        /home/ai/torch/install/share/lua/5.1/nn/THNN.lua:110: in function 'SpatialConvolutionMM_updateOutput'
        ...ai/torch/install/share/lua/5.1/nn/SpatialConvolution.lua:79: in function <...ai/torch/install/share/lua/5.1/nn/SpatialConvolution.lua:76>
        [C]: in function 'xpcall'
        /home/ai/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
        /home/ai/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function </home/ai/torch/install/share/lua/5.1/nn/Sequential.lua:41>
        [C]: in function 'xpcall'
        /home/ai/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
        /home/ai/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function </home/ai/torch/install/share/lua/5.1/nn/Sequential.lua:41>
        [C]: in function 'xpcall'
        /home/ai/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
        /home/ai/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
        train.lua:421: in function 'opfunc'
        /home/ai/torch/install/share/lua/5.1/optim/sgd.lua:44: in function 'sgd'
        train.lua:436: in function 'updateCNN'
        train.lua:487: in main chunk
        [C]: in function 'dofile'
        ...e/ai/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
        [C]: at 0x00406670

    WARNING: If you see a stack trace below, it doesn't point to the place where this error occurred. Please use only the one above.
    stack traceback:
        [C]: in function 'error'
        /home/ai/torch/install/share/lua/5.1/nn/Container.lua:67: in function 'rethrowErrors'
        /home/ai/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
        train.lua:421: in function 'opfunc'
        /home/ai/torch/install/share/lua/5.1/optim/sgd.lua:44: in function 'sgd'
        train.lua:436: in function 'updateCNN'
        train.lua:487: in main chunk
        [C]: in function 'dofile'
        ...e/ai/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
        [C]: at 0x00406670

I think the cause may be that my data is three-channel RGB, while the MNIST architecture expects a single channel. FRGC is also three-channel RGB, so I resized my data to 3x32x32. But there is still an error when I try the architecture for FRGC:

    ==> online epoch # 0 [batchSize = 100] [learningRate = 0.01]
    loss: 0.037374177345863
    /home/ai/torch/install/bin/luajit: bad argument #2 to '?' (out of range at /home/ai/torch/pkg/torch/generic/Tensor.c:913)
    stack traceback:
        [C]: at 0x7f453c6d9b30
        [C]: in function '__index'
        train.lua:366: in function 'organize_samples'
        train.lua:422: in function 'opfunc'
        /home/ai/torch/install/share/lua/5.1/optim/sgd.lua:44: in function 'sgd'
        train.lua:436: in function 'updateCNN'
        train.lua:487: in main chunk
        [C]: in function 'dofile'
        ...e/ai/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
        [C]: at 0x00406670

Do you know how to fix this? Or could you explain how to build my own network architecture? In particular, how should I choose the parameters in model_def, such as nInputPlanes, nOutputPlanes, nn.View, nn.Linear, and nn.Normalize?

zuoeye avatar Nov 30 '17 11:11 zuoeye

Hello, I'm waiting for your answer. Kindly favour me with an early reply. Thank you.

zuoeye avatar Dec 04 '17 01:12 zuoeye

Have you solved the problem? I think you need to convert your data to 3 channels, since the architecture for FRGC only takes 3-channel input.

Also, please remember to provide the ground-truth labels. If you do not have any, randomly initialize the labels in advance.

jwyang avatar Dec 25 '17 20:12 jwyang

Hi, I have the same issue. I ran it on FRGC with 3 channels, but got:

    /home/lifelogging/torch/install/bin/luajit: bad argument #2 to '?' (out of range at /home/lifelogging/torch/pkg/torch/generic/Tensor.c:913)
    stack traceback:
        [C]: at 0x7f1af1a2bb60
        [C]: in function '__index'
        train.lua:368: in function 'organize_samples'
        train.lua:424: in function 'opfunc'
        /home/lifelogging/torch/install/share/lua/5.1/optim/sgd.lua:44: in function 'sgd'
        train.lua:438: in function 'updateCNN'
        train.lua:489: in main chunk
        [C]: in function 'dofile'
        ...ging/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
        [C]: at 0x00405d50

Were you able to solve this?

Thank you

dcharua avatar Mar 17 '18 22:03 dcharua

The problem is with the data: it needs to be stored in the correct format, as 32-bit floats, so the header of the .h5 file should look like this:

    HDF5 "data4torch.h5" {
    GROUP "/" {
       DATASET "data" {
          DATATYPE  H5T_IEEE_F32LE
          DATASPACE  SIMPLE { ( 35898, 3, 32, 32 ) / ( 35898, 3, 32, 32 ) }
       }
       DATASET "labels" {
          DATATYPE  H5T_IEEE_F32LE
          DATASPACE  SIMPLE { ( 35898 ) / ( 35898 ) }
       }
    }
    }
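For anyone preparing their own file, here is a minimal Python sketch of one way to get arrays into that layout. It assumes the images start as a NumPy array in N x H x W (grayscale) or N x H x W x C (channels-last) order, which may not match your loader. The helper names `to_torch_layout` and `write_h5` are made up for this example. Only the dataset names `data` and `labels` and the float32 type come from the header above, and the sketch does not rescale pixel values or remap labels.

```python
import numpy as np


def to_torch_layout(images, labels):
    """Return (data, labels) as float32, with data shaped N x 3 x H x W.

    Assumption: `images` is N x H x W or N x H x W x C (channels last).
    """
    images = np.asarray(images)
    if images.ndim == 3:                      # grayscale without a channel axis
        images = images[..., np.newaxis]
    if images.ndim != 4:
        raise ValueError("expected N x H x W or N x H x W x C input")
    if images.shape[-1] == 1:                 # replicate gray into 3 channels
        images = np.repeat(images, 3, axis=-1)
    if images.shape[-1] != 3:
        raise ValueError("expected 1 or 3 channels in the last axis")
    # Move channels in front of height and width, then cast to float32.
    data = np.ascontiguousarray(images.transpose(0, 3, 1, 2), dtype=np.float32)
    labels = np.asarray(labels, dtype=np.float32).reshape(-1)
    if labels.shape[0] != data.shape[0]:
        raise ValueError("need exactly one label per image")
    return data, labels


def write_h5(path, data, labels):
    """Write both arrays to an HDF5 file (requires the h5py package)."""
    import h5py  # imported here so the conversion above works without it
    with h5py.File(path, "w") as f:
        f.create_dataset("data", data=data)
        f.create_dataset("labels", data=labels)
```

Running `h5dump -H` on the resulting file should then show H5T_IEEE_F32LE for both datasets, with one label per image.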

dcharua avatar Mar 17 '18 23:03 dcharua