
Can run interactive demo but can't train: Segmentation fault (core dumped)

Open jbmaxwell opened this issue 4 years ago • 6 comments

I've been trying to train on MNIST (I have custom data that's MNIST-like) but keep hitting Segmentation fault (core dumped).

tensorflow-gpu = 1.15, pytorch = 1.4.0

I have dareblopy installed.

Out of curiosity I ran the interactive demo, which works fine.

jbmaxwell avatar Oct 28 '20 15:10 jbmaxwell

I'm also having trouble training on MNIST. If you manage to get it working, please tell me how you overcame the issues.

hri98mahesh avatar Nov 05 '20 02:11 hri98mahesh

Unfortunately I gave up (at least for now).

jbmaxwell avatar Nov 05 '20 03:11 jbmaxwell

@jbmaxwell, can you please describe the steps you took? Did you generate tfrecords for your dataset? Did you adjust the yaml config accordingly? It could simply be that the paths are wrong. Although I tried to make dareblopy verbose, it may still crash (segmentation fault) if something is wrong. You can replace the Dataset implementation in dataloader.py with your own, without dareblopy. Dareblopy was used mainly for performance reasons.

podgorskiy avatar Nov 27 '20 13:11 podgorskiy
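
One way to rule out wrong paths or damaged files before training is to walk the tfrecords with plain Python, without dareblopy. The sketch below relies only on the standard TFRecord framing (an 8-byte little-endian length, a 4-byte CRC of the length, the payload, and a 4-byte CRC of the payload). It skips CRC verification, so it only catches missing or truncated files. The function names and the glob pattern in the usage note are hypothetical and are not part of ALAE or dareblopy.

```python
import glob
import os
import struct


def count_tfrecords(path):
    """Count records in one TFRecord file; raise ValueError if it is truncated.

    CRC fields are skipped, not verified (assumption: framing check is enough).
    """
    size = os.path.getsize(path)
    count = 0
    with open(path, "rb") as f:
        while True:
            header = f.read(12)  # 8-byte length + 4-byte CRC of the length
            if not header:
                return count  # clean end of file
            if len(header) < 12:
                raise ValueError(f"{path}: truncated header after {count} records")
            (length,) = struct.unpack("<Q", header[:8])
            end = f.tell() + length + 4  # payload + 4-byte CRC of the payload
            if end > size:
                raise ValueError(f"{path}: record {count} runs past end of file")
            f.seek(end)
            count += 1


def check_dataset(pattern):
    """Return {path: record_count} for every file matching a glob pattern."""
    paths = sorted(glob.glob(pattern))
    if not paths:
        raise FileNotFoundError(f"no files match {pattern!r}; check the path in the yaml config")
    return {p: count_tfrecords(p) for p in paths}
```

For example, `check_dataset("/path/to/tfrecords/*")` (a placeholder path) should list every file the config is expected to point at, each with a non-zero record count. An empty match or a ValueError suggests the data preparation step needs to be repeated.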

I think it's caused by dareblopy, and unfortunately you can't debug it. As the author said above, it's usually a data preparation problem. Any time you adjust the config file, whether or not the change relates to the dataset (e.g. changing the batch size), try generating the tfrecords file again. Hope it helps :)

uhiu avatar Dec 05 '20 09:12 uhiu

@uhiu,

Any time you adjust the config file, no matter related to the dataset or not, e.g. changing the batch size, try generating the tfrecords file again. hope it helps:)

Well, it should not depend on the batch size, that's for sure.

I improved dareblopy considerably in v0.0.5. I tried many incorrect-usage scenarios, and all of them now result in a Python exception with a detailed description of the problem, not a segfault like before.

Even in case of a crash, it should print a minimal crash log with a call stack to help investigate the problem.

@jbmaxwell, could you please run pip install dareblopy --upgrade and try again? Please make sure that you have v0.0.5. What platform are you using?

podgorskiy avatar Dec 07 '20 08:12 podgorskiy
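
A quick way to confirm which dareblopy version is installed, and to get a Python-level traceback if a segfault still occurs, is sketched below. It assumes Python 3.8 or newer (for `importlib.metadata`) and a purely numeric version string. The helper names are hypothetical. `faulthandler` is part of the Python standard library and is independent of dareblopy's own crash log.

```python
import faulthandler
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def at_least(version, minimum):
    """Compare dotted numeric versions; assumes no suffixes such as 'rc1'."""
    def parse(v):
        return tuple(int(part) for part in v.split("."))
    return parse(version) >= parse(minimum)


if __name__ == "__main__":
    # Dump the Python traceback to stderr if the process receives SIGSEGV.
    faulthandler.enable()

    found = installed_version("dareblopy")
    if found is None:
        print("dareblopy is not installed")
    elif not at_least(found, "0.0.5"):
        print(f"dareblopy {found} is older than 0.0.5; run: pip install dareblopy --upgrade")
    else:
        print(f"dareblopy {found} is recent enough")
```

Calling `faulthandler.enable()` at the top of the training script (or running it with `python -X faulthandler`) shows which Python line was executing when the crash happened, which helps narrow down whether the data loader is involved.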

Exactly the same issue here. [image]

ennauata avatar Jul 07 '22 18:07 ennauata