Problem with get_wiki (I think because of possible changes to wiki_extractor)
I am trying to rerun the Vietnamese notebook (https://github.com/fastai/course-nlp/blob/master/nn-vietnamese.ipynb) and am getting a file-not-found error at
get_wiki(path,lang)
This seems to happen with any language. A manual check showed that the text directory does not contain AA\wiki_00.
I don't know what the problem here is.
FileNotFoundError: [Errno 2] No such file or directory: 'C:\Users\
@muralits98 hey, I'm encountering the same error. Did you manage to solve it?
Okay, I got it to work. I installed the package via pip and commented out the line
if not (path/'wikiextractor').exists(): os.system('git clone https://github.com/attardi/wikiextractor.git')
in nlputils.py. I then changed the line os.system("python wikiextractor/WikiExtractor.py... to os.system("python -m wikiextractor.WikiExtractor .. and voilà! Commenting here in case anyone encounters the same problem.
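One reason this is hard to debug: os.system does not raise when the command fails, so a broken WikiExtractor call only shows up later as the FileNotFoundError. Below is a minimal sketch of a wrapper that surfaces the failure immediately. The name run_python is hypothetical and is not part of nlputils.py.

```python
import subprocess
import sys


def run_python(args):
    """Run the current Python interpreter with `args` and raise if it fails.

    Hypothetical helper, not part of nlputils.py. Unlike os.system, a
    non-zero exit code becomes an exception that carries the stderr output.
    """
    cmd = [sys.executable, *args]
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(
            f"command failed with exit code {result.returncode}: "
            f"{result.stderr.strip()}"
        )
    return result.stdout
```

Passing "-m", the WikiExtractor module name, and its options as the list would then report an unknown flag or a missing module right at the call site, instead of at the later file move.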
@prats0599 I followed your suggestions, but I still get the same error: No such file or directory: '/root/.fastai/data/frwiki/text/AA/wiki_00'. Any other hints?
I also hit the same problem with Russian (ru): No such file or directory: '/root/.fastai/data/ruwiki/text/AA/wiki_00' -> '/root/.fastai/data/ruwiki/ruwiki'. Can anyone help?
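For anyone hitting this, it may help to check what the extractor actually wrote before the move step runs. This is a rough sketch only: find_extracted_file is a hypothetical helper, and the AA/wiki_00 layout is taken from the error messages above.

```python
from pathlib import Path


def find_extracted_file(text_dir):
    """Return text_dir/AA/wiki_00 if it exists, else raise with a listing.

    Hypothetical diagnostic helper. The AA/wiki_00 layout is assumed from
    the error messages in this thread.
    """
    text_dir = Path(text_dir)
    expected = text_dir / "AA" / "wiki_00"
    if expected.exists():
        return expected
    # List whatever is present so an empty or missing output dir is obvious.
    found = []
    if text_dir.exists():
        found = sorted(p.relative_to(text_dir).as_posix() for p in text_dir.rglob("*"))
    raise FileNotFoundError(
        f"{expected} is missing; {text_dir} contains: {found or 'nothing'}"
    )
```

If the listing is empty, the extractor most likely exited early (for example on an unrecognized option) rather than writing its output somewhere else.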
Hi all, this should work. We need to update the options passed to WikiExtractor in the get_wiki(path,lang) function in nlputils.py:
From:
os.system("python wikiextractor/WikiExtractor.py --processes 4 --no_templates " + f"--min_text_length 1800 --filter_disambig_pages --log_file log -b 100G -q {xml_fn}")
To:
os.system("python -m wikiextractor.wikiextractor.WikiExtractor --no-templates -b 100G -q " + f"{xml_fn}")
This is due to the argument changes in https://github.com/attardi/wikiextractor/blob/master/wikiextractor/WikiExtractor.py. For example, --no_templates changed to --no-templates. In addition, other options (such as --min_text_length, --filter_disambig_pages, and --log_file) no longer exist.
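The updated call can also be written as an argument list rather than a concatenated string, which makes the flags easier to compare against the extractor's current options. A small sketch, assuming the same options as the "To" line above; build_extract_command is a hypothetical name, and the default module path is the one from that line (the pip-installed variant mentioned earlier in the thread uses wikiextractor.WikiExtractor instead).

```python
import sys


def build_extract_command(xml_fn, module="wikiextractor.wikiextractor.WikiExtractor"):
    """Build the WikiExtractor command as a list of arguments.

    Hypothetical helper. Only options quoted in this thread are used:
    --no-templates (formerly --no_templates), -b 100G and -q.
    """
    return [
        sys.executable, "-m", module,
        "--no-templates",   # renamed from --no_templates
        "-b", "100G",       # same value as the original call
        "-q",               # same flag as the original call
        str(xml_fn),
    ]
```

The resulting list can be passed to subprocess.run(..., check=True), which raises if the extractor rejects an option.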
I have opened a PR at https://github.com/fastai/course-nlp/pull/55