ACE
Segmentation Fault (core dumped)
Hi Team,
Following your suggestion here to evaluate the model on the CoNLL2003 dataset, I ran the following command to check that the code works:
CUDA_VISIBLE_DEVICES=0 python train.py --config config/conll_03_english.yaml --test
However, it fails with the error below:
I tried debugging it but could not find a workaround.
My system configuration:
Ubuntu: 20.04
RAM: 32GB
GPU: NVIDIA GeForce RTX 3080 Ti
I have not encountered this kind of problem before. It seems to occur while the embeddings are being loaded. Perhaps there is not enough CPU memory (RAM).
After digging into this further, I can see that the cause is that PyTorch is not able to access CUDA.
Also, the recommended PyTorch version (1.3.1) is not listed among the official releases on the PyTorch website, although it is available on PyPI.
This is the snippet where torch fails to put a variable on CUDA:
Moreover, the transformers "from_pretrained" call is not able to load the pre-trained models and fails with "Segmentation fault".
Apart from this, the flair code fails with the same error in "embeddings.py", in the constructor of TransformerWordEmbeddings, when it calls the parent class constructor. Attached is a screenshot of the place in the code where the issue happens.
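A segmentation fault normally kills the interpreter without a Python traceback, which makes it hard to say which call crashed. As a minimal sketch, the standard-library faulthandler module can dump the Python stack when the fault occurs. The log file name "segfault_trace.log" is a hypothetical choice, and placing these lines at the top of train.py is an assumption about where they would go, not something the repository prescribes.

```python
import faulthandler

# Hypothetical log file name; the file object must stay open for as long as
# faulthandler is enabled, so keep a module-level reference to it.
trace_log = open("segfault_trace.log", "w")

# On SIGSEGV (and other fatal signals), write the Python stack of every
# thread to the log file before the process dies.
faulthandler.enable(file=trace_log, all_threads=True)

# ... the rest of the script (model loading, evaluation) would follow here.
```

The same effect is available without editing any file by running the interpreter with the -X faulthandler option, which writes the stack dump to stderr.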
Could you tell me which CUDA and NVIDIA driver versions you ran it with? I was just trying to set up the repository and run the evaluation to confirm that the setup was successful.
I had the same problem. My CUDA version is 11.3, so I updated torch to 1.11.0, and that solved the problem.
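For anyone comparing setups, a quick way to see whether the installed PyTorch build matches the GPU is to print the PyTorch version, the CUDA version it was built against, and whether CUDA is reachable. This is a minimal sketch; the helper name cuda_report is hypothetical, and it only uses standard torch attributes. An RTX 3080 Ti is an Ampere card and needs a PyTorch build compiled against CUDA 11.x, which would be consistent with the upgrade above fixing the crash.

```python
import importlib.util


def cuda_report():
    """Collect basic facts about the installed PyTorch build (hypothetical helper)."""
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False}

    import torch

    report = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # CUDA version the wheel was compiled against (None for CPU-only builds).
        "built_with_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
    }
    if report["cuda_available"]:
        report["device_name"] = torch.cuda.get_device_name(0)
        # (major, minor) compute capability of the first visible GPU.
        report["compute_capability"] = torch.cuda.get_device_capability(0)
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

If built_with_cuda reports a 10.x or older version on an Ampere GPU, installing a build compiled against CUDA 11.x is the first thing to try.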