
It is not possible to train a question-answering model on my own dataset

Open ilya68rus01 opened this issue 4 years ago • 1 comments

I am trying to train a question-answering system on my own data, using this notebook as an example: https://colab.research.google.com/github/deepmipt/dp_notebooks/blob/master/DP_ODQA.ipynb#scrollTo=7C1mT_2-Nlnj

However, the ranker either does not write the information from my training files into the database, or reports that the database file is in use by another process.

The code is presented below.

import os

from deeppavlov.core.common.file import read_json
from deeppavlov import configs, train_model

model_config = read_json(configs.doc_retrieval.en_ranker_tfidf_wiki)
print(os.path.abspath(os.getcwd()) + "\Resourses")
model_config["dataset_reader"]["data_path"] = os.path.abspath(os.getcwd()) + "\Resourses"
model_config["dataset_reader"]["dataset_format"] = "txt"
model_config["train"]["batch_size"] = 1000
doc_retrieval = train_model(model_config)
res = doc_retrieval(['cerebellum'])
print(res)

DeepPavlov version (you can look it up by running pip show deeppavlov): DeepPavlov version 0.10.0

Python version: Python 3.7.4

Operating system (ubuntu linux, windows, ...): Windows 10 x64; deployment in Docker is planned for later

Issue: How can I solve the problem with the database and train the model?

Content or a name of a configuration file: en_ranker_tfidf_wiki

Command that led to error: doc_retrieval = train_model(model_config)

Error (including full traceback):
2020-06-10 13:05:51.601 INFO in 'deeppavlov.dataset_readers.odqa_reader'['odqa_reader'] at line 57: Reading files...
2020-06-10 13:05:51.602 INFO in 'deeppavlov.dataset_readers.odqa_reader'['odqa_reader'] at line 134: Building the database...
0%| | 0/24 [00:00<?, ?it/s]
0it [00:00, ?it/s]C:\Users\Ilya\PycharmProjects\DialogSystem\DialogService\Resourses
2020-06-10 13:05:53.22 INFO in 'deeppavlov.dataset_readers.odqa_reader'['odqa_reader'] at line 57: Reading files...
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\Ilya\AppData\Local\Programs\Python\Python37\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "C:\Users\Ilya\AppData\Local\Programs\Python\Python37\lib\multiprocessing\spawn.py", line 114, in _main
    prepare(preparation_data)
  File "C:\Users\Ilya\AppData\Local\Programs\Python\Python37\lib\multiprocessing\spawn.py", line 225, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "C:\Users\Ilya\AppData\Local\Programs\Python\Python37\lib\multiprocessing\spawn.py", line 277, in _fixup_main_from_path
    run_name="__mp_main__")
  File "C:\Users\Ilya\AppData\Local\Programs\Python\Python37\lib\runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "C:\Users\Ilya\AppData\Local\Programs\Python\Python37\lib\runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "C:\Users\Ilya\AppData\Local\Programs\Python\Python37\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\Ilya\PycharmProjects\DialogSystem\DialogService\DeepPavlovWrapper.py", line 11, in <module>
    doc_retrieval = train_model(model_config)
  File "C:\Users\Ilya\PycharmProjects\DialogSystem\venv\lib\site-packages\deeppavlov\__init__.py", line 29, in train_model
    train_evaluate_model_from_config(config, download=download, recursive=recursive)
  File "C:\Users\Ilya\PycharmProjects\DialogSystem\venv\lib\site-packages\deeppavlov\core\commands\train.py", line 92, in train_evaluate_model_from_config
    data = read_data_by_config(config)
  File "C:\Users\Ilya\PycharmProjects\DialogSystem\venv\lib\site-packages\deeppavlov\core\commands\train.py", line 58, in read_data_by_config
    return reader.read(data_path, **reader_config)
  File "C:\Users\Ilya\PycharmProjects\DialogSystem\venv\lib\site-packages\deeppavlov\dataset_readers\odqa_reader.py", line 81, in read
    self._build_db(save_path, dataset_format, expand_path(data_path))
  File "C:\Users\Ilya\PycharmProjects\DialogSystem\venv\lib\site-packages\deeppavlov\dataset_readers\odqa_reader.py", line 130, in _build_db
    Path(save_path).unlink()
  File "C:\Users\Ilya\AppData\Local\Programs\Python\Python37\lib\pathlib.py", line 1294, in unlink
    self._accessor.unlink(self)
PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\Users\Ilya\.deeppavlov\downloads\odqa\enwiki.db'

ilya68rus01, Jun 10 '20 10:06

The database path is currently 'C:\Users\Ilya\.deeppavlov\downloads\odqa\enwiki.db'. The error message shows that this file is in use by another process. I suggest changing the file name in the config: model_config["dataset_reader"]["save_path"] = new_path
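
A minimal sketch of that suggestion, assuming the config is a plain nested dict, as the indexing in the question implies. The helper name with_custom_paths and the file name my_corpus.db are hypothetical; only the keys data_path, dataset_format and save_path under dataset_reader come from this thread. Other sections of the config may also refer to the old database path and might need the same change, which has not been checked here. The paths are built with os.path.join rather than by concatenating a "\Resourses" string.

```python
import copy
import os


def with_custom_paths(config, data_dir, db_path):
    """Return a copy of `config` whose dataset reader points at our own files."""
    new_config = copy.deepcopy(config)  # leave the caller's dict untouched
    reader = new_config["dataset_reader"]
    reader["data_path"] = data_dir      # folder with the .txt documents
    reader["dataset_format"] = "txt"
    reader["save_path"] = db_path       # new database file instead of enwiki.db
    return new_config


# Stand-in for read_json(configs.doc_retrieval.en_ranker_tfidf_wiki);
# the values below are placeholders, not the real config contents.
base = {
    "dataset_reader": {
        "data_path": "placeholder",
        "save_path": "placeholder",
        "dataset_format": "placeholder",
    }
}

data_dir = os.path.join(os.getcwd(), "Resourses")
db_path = os.path.join(os.getcwd(), "my_corpus.db")  # hypothetical file name
custom = with_custom_paths(base, data_dir, db_path)
print(custom["dataset_reader"]["save_path"])
```

With the real library, the dict returned by the helper would be passed to train_model in place of the unmodified config.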

dmitrijeuseew, Jul 03 '20 13:07
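
One further observation, offered as a hypothesis rather than a confirmed diagnosis: the traceback reaches DeepPavlovWrapper.py through multiprocessing\spawn.py, which suggests that a worker process re-imported the script and executed its top-level train_model call a second time while the first process may still have had enwiki.db open. Python's multiprocessing documentation recommends protecting a script's entry point with an if __name__ == "__main__": guard on Windows for this reason. The sketch below shows only the structure; train is a hypothetical stand-in for the train_model(model_config) call and does no real work.

```python
def train():
    """Hypothetical stand-in for train_model(model_config)."""
    return "trained"


def main():
    # Everything with side effects (building the database, training)
    # lives here rather than at module level.
    model = train()
    print(model)


if __name__ == "__main__":
    # True only when the file is run as a script. A spawned worker imports
    # the file under a different module name, so it skips this block and
    # does not start a second training run.
    main()
```

Whether this alone resolves the PermissionError on DeepPavlov 0.10.0 has not been verified; it can be combined with a separate save_path.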