DeepPavlov
Cannot train a question-answering model on my own dataset
I am trying to train a question-answering system on my own data. I use this notebook as an example: https://colab.research.google.com/github/deepmipt/dp_notebooks/blob/master/DP_ODQA.ipynb#scrollTo=7C1mT_2-Nlnj
However, the ranker either does not write the contents of my training files into the database, or reports that the database file is in use by another process.
The code is presented below.
import os

from deeppavlov import configs, train_model
from deeppavlov.core.common.file import read_json

# Directory with my own training files (.txt); os.path.join avoids the "\R" backslash in a string literal
data_path = os.path.join(os.getcwd(), "Resourses")
print(data_path)

model_config = read_json(configs.doc_retrieval.en_ranker_tfidf_wiki)
model_config["dataset_reader"]["data_path"] = data_path
model_config["dataset_reader"]["dataset_format"] = "txt"
model_config["train"]["batch_size"] = 1000

doc_retrieval = train_model(model_config)
res = doc_retrieval(['cerebellum'])
print(res)
DeepPavlov version (you can look it up by running pip show deeppavlov): 0.10.0
Python version: Python 3.7.4
Operating system (ubuntu linux, windows, ...): Windows 10 x64 (deployment in Docker is planned later)
Issue: How can I solve the problem with the database and train the model?
Content or a name of a configuration file: en_ranker_tfidf_wiki
Command that led to error: doc_retrieval = train_model(model_config)
Error (including full traceback):
2020-06-10 13:05:51.601 INFO in 'deeppavlov.dataset_readers.odqa_reader'['odqa_reader'] at line 57: Reading files...
2020-06-10 13:05:51.602 INFO in 'deeppavlov.dataset_readers.odqa_reader'['odqa_reader'] at line 134: Building the database...
0%| | 0/24 [00:00<?, ?it/s]
0it [00:00, ?it/s]
C:\Users\Ilya\PycharmProjects\DialogSystem\DialogService\Resourses
2020-06-10 13:05:53.22 INFO in 'deeppavlov.dataset_readers.odqa_reader'['odqa_reader'] at line 57: Reading files...
Traceback (most recent call last):
File "
The database path is currently 'C:\Users\Ilya\.deeppavlov\downloads\odqa\enwiki.db'. The error message shows that this file is in use by another process. I suggest changing the file name in the config:
model_config["dataset_reader"]["save_path"] = new_path
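To make the suggestion concrete, here is a minimal sketch of the config change. It is not taken from the DeepPavlov docs. The helper name point_config_at_own_data and the file name my_odqa.db are hypothetical, and a plain dict stands in for the real config so the snippet runs without DeepPavlov installed. Updating dataset_iterator["load_path"] is an assumption about the config layout (the iterator presumably has to read the same file the reader writes), so the helper only touches that key if it is present.

```python
import copy
import os


def point_config_at_own_data(config, data_dir, db_path):
    """Return a copy of `config` that reads .txt files from `data_dir`
    and builds its database at `db_path` instead of the default file."""
    new_config = copy.deepcopy(config)  # leave the caller's config untouched

    reader = new_config["dataset_reader"]
    reader["data_path"] = data_dir
    reader["dataset_format"] = "txt"
    reader["save_path"] = db_path  # the change suggested in the reply

    # Assumption: if the config defines an iterator with a load_path,
    # it should point at the same database file the reader writes.
    iterator = new_config.get("dataset_iterator")
    if isinstance(iterator, dict) and "load_path" in iterator:
        iterator["load_path"] = db_path

    return new_config


# Stand-in for read_json(configs.doc_retrieval.en_ranker_tfidf_wiki);
# the values here are illustrative only.
example_config = {
    "dataset_reader": {"data_path": "wiki_dump", "save_path": "enwiki.db",
                       "dataset_format": "wiki"},
    "dataset_iterator": {"load_path": "enwiki.db"},
    "train": {"batch_size": 10000},
}

data_dir = os.path.join(os.getcwd(), "Resourses")
db_path = os.path.join(os.getcwd(), "my_odqa.db")  # hypothetical new file name
updated = point_config_at_own_data(example_config, data_dir, db_path)
print(updated["dataset_reader"]["save_path"])
```

With the real library, the same helper would be applied to the dict returned by read_json(configs.doc_retrieval.en_ranker_tfidf_wiki) before passing the result to train_model. Using a fresh file name avoids touching the enwiki.db file that the other process is holding open.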