multifit
File exists but it isn't found!
When I execute a Python script via a Jupyter notebook, I receive the following error:
~/miniconda3/lib/python3.7/site-packages/fastai/text/data.py in train_sentencepiece(texts, path, pre_rules, post_rules, vocab_sz, max_vocab_sz, model_type, max_sentence_len, lang, char_coverage, tmp_dir, enc)
434 f"--unk_id={len(defaults.text_spec_tok)} --pad_id=-1 --bos_id=-1 --eos_id=-1",
435 f"--user_defined_symbols={','.join(spec_tokens)}",
--> 436 f"--model_prefix={quotemark}{cache_dir/'spm'}{quotemark} --vocab_size={vocab_sz} --model_type={model_type}"]))
437 raw_text_path.unlink()
438 return cache_dir
OSError: Not found: ""/home/pouramini/mf1/data/wiki/fa-2/models/fsp15k/all_text.out"": No such file or directory Error #2
However, the file exists! I wonder why it shows the path in two pairs of double quotes?!
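(A guess on my part, not something the traceback proves: one pair of quotes comes from sentencepiece's own error message, which wraps the offending filename in quotes, and the inner pair are the literal quotemark characters that the installed fastai puts around the path; line 436 of the traceback shows this {quotemark} wrapping for --model_prefix, and presumably the installed version applies it to --input as well. sentencepiece's flag parser is not a shell, so it does not strip the quotes, and they become part of the filename it tries to open; the doubled quotes in the error message itself suggest exactly that. A minimal sketch of that failure mode, with placeholder paths rather than my real ones:

from pathlib import Path
import sentencepiece as spm

# My guess at a minimal repro (placeholder path, not the one from my run):
Path('/tmp/all_text.out').write_text('hello world\n' * 10)  # the file exists...
try:
    # ...but training with the path wrapped in quotes still fails, because
    # the quote characters are treated as part of the filename
    spm.SentencePieceTrainer.Train(
        '--input="/tmp/all_text.out" --model_prefix=/tmp/spm --vocab_size=100')
except OSError as e:
    print(e)  # e.g. Not found: ""/tmp/all_text.out"": No such file or directory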
This is the code where the error is raised; it looks for raw_text_path:
raw_text_path = cache_dir/'all_text.out'
with open(raw_text_path, 'w', encoding=enc) as f: f.write("\n".join(texts))
spec_tokens = ['\u2581'+s for s in defaults.text_spec_tok]
SentencePieceTrainer.Train(" ".join([
f'--input={raw_text_path} --max_sentence_length={max_sentence_len}',
f"--character_coverage={ifnone(char_coverage, 0.99999 if lang in full_char_coverage_langs else 0.9998)}",
f"--unk_id={len(defaults.text_spec_tok)} --pad_id=-1 --bos_id=-1 --eos_id=-1",
f"--user_defined_symbols={','.join(spec_tokens)}",
f"--model_prefix={cache_dir/'spm'} --vocab_size={vocab_sz} --model_type={model_type}"]))
raw_text_path.unlink()
The problem is related to the fastai or sentencepiece versions... What happens instead is that a 'tmp' folder is created inside my current directory, along with files named "cache_dir".vocab and "cache_dir".model.
For a solution, you can refer to:
https://stackoverflow.com/questions/59788395/fastai-failed-initiation-of-language-model-in-sentence-piece-processor-cache?noredirect=1#comment110726963_59788395
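In case the link goes stale: as I understand that answer (treat the specifics as my assumption and see the link for the exact version numbers), it comes down to installing fastai and sentencepiece versions that agree with each other. Independently, recent sentencepiece releases also accept keyword arguments instead of a single flag string, which keeps quote characters out of the flag parsing altogether. Here is a sketch of that, with placeholder paths, not fastai's actual code:

import sentencepiece as spm

# Hedged workaround sketch: pass options as keyword arguments so no quote
# characters ever reach sentencepiece's flag parsing. All paths and sizes
# are placeholders; /tmp/all_text.out must already exist.
spm.SentencePieceTrainer.Train(
    input='/tmp/all_text.out',   # plain path, no surrounding quotes
    model_prefix='/tmp/spm',     # writes /tmp/spm.model and /tmp/spm.vocab
    vocab_size=100,
    model_type='unigram',
)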