torch-rgcn

zsh: segmentation fault when running the classification task

Open: traopia opened this issue 2 years ago (3 comments)

I get this error when running the node classification task:

zsh: segmentation fault python experiments/classify_nodes.py with configs/rgcn/nc-AIFB.yaml

I tried all the datasets and models, and with some debugging I found that the crash happens right before training starts. I am wondering whether it's a problem with my machine and, if so, how to fix it. Thank you for your help!

traopia, Feb 12 '23 22:02

Hey @traopia, thanks for reporting this issue.

A segmentation fault is a system-level error: the operating system terminates the program when it tries to access memory it is not allowed to access. Can you confirm that you have enough RAM to run the program? The issue may also resolve itself after rebooting your machine.

Is there a traceback from Python? What OS are you using?
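The OS and library details asked for here can be gathered with the standard library. Below is a minimal sketch. The `environment_report` helper is hypothetical, and it assumes PyTorch is the main native dependency worth reporting. It reads package versions from installed metadata with `importlib.metadata.version` rather than importing the package, so collecting the report should not load any native extension code.

```python
import platform
from importlib.metadata import PackageNotFoundError, version


def environment_report(packages=("torch",)) -> dict:
    """Collect OS, CPU architecture, Python and package versions for a bug report."""
    report = {
        "os": platform.platform(),          # OS name and release
        "machine": platform.machine(),      # CPU architecture, e.g. x86_64 or arm64
        "python": platform.python_version(),
    }
    for name in packages:
        try:
            # Read the version from package metadata without importing the
            # package, so no native extension code is loaded here.
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Pasting this output into the issue answers both the OS question and which library versions are installed.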

thiviyanT, Feb 13 '23 13:02

Hey! I tried it on Lisa and it worked well. The good news is that the code is fine; the bad news is that the problem is my machine :/ I rebooted again and installed all the updates, then reran it from the terminal, but I still get a segmentation fault. I installed a tool that shows how much RAM is available, and I should have 10 GB free. I then installed a library (psutil, https://psutil.readthedocs.io/en/latest/) to print how much memory the process uses. Running the code on Lisa, it reports 2434609152 bytes, i.e. about 2.43 GB (2.27 GiB). I don't really understand why the segmentation fault keeps happening. If you are still keen to help me figure it out, I'd really appreciate it! Thank you, Teresa
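psutil's `Process().memory_info().rss` reports the memory in use at the moment it is called. For a crash that happens just before training, the peak usage may be more informative. The sketch below uses only the standard library and assumes a Unix system, since the `resource` module is not available on Windows. The helper names `peak_rss_bytes` and `describe` are hypothetical. The sketch also shows why the same byte count reads differently in decimal gigabytes and binary gibibytes.

```python
import resource
import sys


def peak_rss_bytes() -> int:
    """Peak resident set size of this process so far, in bytes (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is in bytes on macOS but in kilobytes on Linux.
    return peak if sys.platform == "darwin" else peak * 1024


def describe(n_bytes: int) -> str:
    """Format a byte count in both decimal (GB) and binary (GiB) units."""
    return f"{n_bytes / 10**9:.2f} GB ({n_bytes / 2**30:.2f} GiB)"


print(describe(2434609152))        # → 2.43 GB (2.27 GiB), the psutil figure from Lisa
print(describe(peak_rss_bytes()))  # peak usage of the current process
```

On the machine that crashes, call `peak_rss_bytes()` just before the point where training would start. A segfault kills the process immediately, so nothing printed after the crash point will appear.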


traopia, Feb 13 '23 18:02

It is great to hear that it works on a different machine! This means that the code is still working.

Regarding your machine, a segmentation fault is hard to debug because it can arise from many different causes: low memory (we have ruled this out), a bug in the code, faulty RAM, and so on. What makes it trickier is that the error message is so uninformative.

I have found that faulthandler, a module in Python's standard library, can help debug this problem (https://stackoverflow.com/a/58825725). When the process receives a fatal signal such as SIGSEGV, faulthandler dumps the Python traceback of every thread. That shows where the crash happens, which makes it much easier to find a solution.
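To see what faulthandler adds, the sketch below crashes a child interpreter on purpose and captures its output. The crashing one-liner `ctypes.string_at(0)` is the example used in the faulthandler documentation. The helper name `run_with_faulthandler` is hypothetical. The child is started with the real `-X faulthandler` interpreter option.

```python
import signal
import subprocess
import sys

# Reading a C string from address 0 through ctypes triggers a real
# segmentation fault inside the child interpreter.
CRASH = "import ctypes; ctypes.string_at(0)"


def run_with_faulthandler(code: str) -> subprocess.CompletedProcess:
    """Run `code` in a child interpreter with faulthandler enabled via -X."""
    return subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", code],
        capture_output=True,
        text=True,
    )


result = run_with_faulthandler(CRASH)
# A negative return code means the child was killed by that signal number.
print("killed by SIGSEGV:", result.returncode == -signal.SIGSEGV)
# faulthandler writes the Python-level traceback to stderr before the process dies.
print(result.stderr)
```

For the real run, the same option should work unchanged: `python -X faulthandler experiments/classify_nodes.py with configs/rgcn/nc-AIFB.yaml`. Setting the environment variable `PYTHONFAULTHANDLER=1` has the same effect. The innermost frames of the dumped traceback show which Python line was calling into native code when the crash occurred.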

thiviyanT, Feb 14 '23 09:02