cdQA
How to use cdQA for a non-English language?
I tried to use the library with a training set in Russian, and it did not work well.
What I've done so far:
- I prepared a data set of 5 articles in Russian as the source for training.
- I used the pre-trained BERT-Base, Multilingual Cased model and followed this Issue as an example to save it in joblib format.
Here is my code to do this:
import torch
from cdqa.reader import BertQA
from transformers import BertForQuestionAnswering, DistilBertForQuestionAnswering
import joblib
import os

reader = BertQA()
reader.model = BertForQuestionAnswering.from_pretrained("bert-base-multilingual-uncased")
reader.model.to('cpu')
reader.device = torch.device('cpu')
joblib.dump(reader, os.path.join("models", 'ml_qa_bert.joblib'))
After I trained the model and tried to run a query, I got this error: AttributeError: Can only use .str accessor with string values!
What am I doing wrong and what should I do to make cdQA work with Russian?
I don't think this issue is related to the title of your question. That might be why you're not getting help.
I also want to use cdQA for Spanish. However, your error is related to the formatting of the data. Check the demo for more clarity on how the data should be structured ('paragraphs' and 'title'...).
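To make the data-format point concrete, here is a rough sketch of a check you could run on your articles before building the DataFrame. It assumes (based on the demo, not verified against your cdQA version) that each row needs a 'title' string and a 'paragraphs' list of non-empty strings. The function name validate_records and the sample articles are made up for illustration.

```python
def validate_records(records):
    """Return a list of problems; an empty list means the shape looks right.

    Assumption: each record needs a 'title' string and a 'paragraphs'
    list of non-empty strings, as in the cdQA demo data.
    """
    problems = []
    for i, rec in enumerate(records):
        title = rec.get("title")
        paragraphs = rec.get("paragraphs")
        if not isinstance(title, str) or not title.strip():
            problems.append(f"row {i}: 'title' must be a non-empty string")
        if not isinstance(paragraphs, list) or not paragraphs:
            problems.append(f"row {i}: 'paragraphs' must be a non-empty list")
        elif not all(isinstance(p, str) and p.strip() for p in paragraphs):
            problems.append(f"row {i}: every paragraph must be a non-empty string")
    return problems


# Hypothetical sample data: one well-formed article and two malformed ones.
good = [
    {"title": "Москва", "paragraphs": ["Москва является столицей России."]},
]
bad = [
    {"title": "Москва", "paragraphs": None},      # missing paragraphs
    {"title": 42, "paragraphs": ["Some text."]},  # non-string title
]

for problem in validate_records(bad):
    print(problem)
```

If the check comes back empty, the same list of dicts can be passed to pandas.DataFrame to get the two columns. A None, NaN, or numeric value in a text column is one plausible cause of the .str accessor error, though I can't confirm that is what happened in your case.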