Training: Killed

Open • schnellerblitz99 opened this issue 3 years ago • 1 comment

Hey,

python contact_graspnet/train.py --ckpt_dir checkpoints/custom1 --data_path acronym/

I am trying to get the training pipeline running with the command above, but right after printing "**** EPOCH 000 ****" the training process is Killed. I traced the failure to the line sess.run(ops['iterator'].initializer) in train.py (line 109). Do you have an idea what might be wrong? I am using a conda env. When I run sh compile_pointnet_tfops.sh, I get:

[ RUN ] GroupPointTest.test
[ OK ] GroupPointTest.test
[ RUN ] GroupPointTest.test_grad
1.6927719116210938e-05
[ OK ] GroupPointTest.test_grad
[ RUN ] GroupPointTest.test_session
[ SKIPPED ] GroupPointTest.test_session

3.540515899658203e-05
[ OK ] GroupPointTest.test_grad
[ RUN ] GroupPointTest.test_session
[ SKIPPED ] GroupPointTest.test_session

Could this be the reason it is not working, or is this output fine?

My training data folder looks like this (no meshes folder):

acronym
--grasps
--scene_contacts
--splits

Thank you very much!

schnellerblitz99, Oct 27 '21 13:10

First, you definitely need the meshes for training. Please follow the instructions in the README.
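A quick way to catch a missing folder before starting a run is a small pre-flight check. The sketch below is only an illustration: the list of expected subfolders is an assumption pieced together from this thread (the three folders listed in the question plus meshes), and the helper name missing_subdirs is hypothetical. The README remains the authoritative description of the data layout.

```python
from pathlib import Path

# Assumed layout: the three folders listed in the question plus "meshes".
# The README is the authoritative source for the real layout.
EXPECTED_SUBDIRS = ("grasps", "meshes", "scene_contacts", "splits")


def missing_subdirs(data_path, expected=EXPECTED_SUBDIRS):
    """Return the expected subfolder names that do not exist under data_path."""
    root = Path(data_path)
    return [name for name in expected if not (root / name).is_dir()]


if __name__ == "__main__":
    # "acronym" mirrors the --data_path value used in the question.
    missing = missing_subdirs("acronym")
    if missing:
        print("Missing folders:", ", ".join(missing))
    else:
        print("All expected folders are present.")
```

With the folder structure shown in the question, this would report meshes as missing.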

Concerning your error: how much RAM does your machine have? Could you monitor it while running train.py? All grasp annotations are loaded into RAM at the start of training for efficiency, so if you have less than 100 GB the process might be killed.
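One way to watch memory while train.py runs is to poll the kernel's own counters from a second terminal. The following is a minimal sketch, assuming a Linux machine where /proc/meminfo exposes a MemAvailable field; the function names (parse_meminfo, available_gib, watch) are hypothetical and not part of the repository.

```python
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB for sized fields)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def available_gib(path="/proc/meminfo"):
    """Return the memory the kernel reports as available, in GiB (Linux only)."""
    with open(path) as handle:
        info = parse_meminfo(handle.read())
    return info["MemAvailable"] / (1024 ** 2)


def watch(interval_s=2.0, samples=30):
    """Print the available memory every interval_s seconds, samples times."""
    for _ in range(samples):
        print(f"available RAM: {available_gib():.1f} GiB")
        time.sleep(interval_s)
```

Calling watch() in a separate Python session while training starts shows whether the available memory drops toward zero just before the process is killed, which would point to the machine running out of RAM.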

MartinSmeyer, Oct 28 '21 09:10