cdQA
cdQA copied to clipboard
NotFittedError: BM25Vectorizer - Vocabulary wasn't fitted.
I 've got an issue when predicting adding sample code and error below
cdqa_pipeline = QAPipeline(reader='bert_models/bert_qa.joblib') cdqa_pipeline.fit_reader('bert_models/SQuAD_1.1/train-v1.1.json') cdqa_pipeline = QAPipeline(reader='bert_out.joblib') cdqa_pipeline.predict(query="Who is chaplin?")
NotFittedError Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/sklearn/utils/validation.py in check_is_fitted(estimator, attributes, msg, all_or_any) 912 913 if not all_or_any([hasattr(estimator, attr) for attr in attributes]): --> 914 raise NotFittedError(msg % {'name': type(estimator).name}) 915 916
NotFittedError: BM25Vectorizer - Vocabulary wasn't fitted.
I experience the same problem when I load my new dataset. Is this fix?
You have to fit the pipeline retriever to the dataframe with the documents before calling the predict()
function:
cdqa_pipeline.fit_retriever(df=df)
I had a similar problem because I thought cdqa_pipeline.dump_reader()
would dump the trained retriever too.
File A
from cdqa.pipeline import QAPipeline
cdqa_pipeline = QAPipeline(reader='./models/distilbert_qa.joblib')
cdqa_pipeline.fit_retriever(df=df)
cdqa_pipeline.dump_reader('./models/distilbert_qa_fine_tuned.joblib')
File B
from cdqa.pipeline import QAPipeline
cdqa_pipeline = QAPipeline(reader='./models/distilbert_qa_fine_tuned.joblib')
cdqa_pipeline.fit_retriever(df=df)
# fix:
# cdqa_pipeline = QAPipeline(reader='./models/distilbert_qa.joblib')
# cdqa_pipeline = cdqa_pipeline.fit_retriever(df=df)
while True:
q = input('Question: ')
answer = cdqa_pipeline.predict(query=q)
print(answer)