DNABERT
Model name 'dna6' was not found in tokenizers model name list
Hi there,
I am running DNABERT's run_finetune.py as instructed in the README. It works well on my workstation, but when I run the same code on the server, it reports the following error:
OSError: Model name 'dna6' was not found in tokenizers model name list (dna3, dna4, dna5, dna6). We assumed 'dna6' was a path, a model identifier, or url to a directory containing vocabulary files named ['vocab.txt'] but couldn't find such vocabulary files at this path or url.
I wonder why "Model name 'dna6' was not found in tokenizers model name list (dna3, dna4, dna5, dna6)"? It seems so strange, because dna6 is definitely in the list.
Thanks for the answer!
I have exactly the same problem: it runs fine on my workstation but not on the server, and gives the same error!
I think I've figured out the issue! For me, the error message was generated here and came from loading the vocab files. By default, vocab_files are in fact links to the files, and the server would not allow me to download files from running code. If you download the vocab files separately and then provide the path to the file instead of dna6, it seems to work!
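To make the explanation above concrete, here is a minimal, simplified Python model of how a shortcut name and a local path might be resolved differently. It is not the DNABERT or transformers code: `SHORTCUT_VOCAB_URLS`, its placeholder URLs, `resolve_vocab` and the `download` callback are all hypothetical. It only illustrates why a name that is clearly in the list can still trigger the "not found" message when the download fails on an offline machine.

```python
import os
from typing import Callable, Dict, Optional

# Hypothetical shortcut table. The real tokenizer maps names such as "dna6"
# to remote vocab.txt URLs; these placeholder URLs are NOT the real ones.
SHORTCUT_VOCAB_URLS: Dict[str, str] = {
    f"dna{k}": f"https://example.invalid/dna{k}/vocab.txt" for k in (3, 4, 5, 6)
}


def resolve_vocab(name: str, download: Callable[[str], Optional[str]]) -> str:
    """Return a local path to vocab.txt for `name` (simplified model)."""
    if name in SHORTCUT_VOCAB_URLS:
        # A shortcut name always means "fetch from the URL", never "look on disk".
        # `download` returns a local path, or None if the fetch failed.
        resolved = download(SHORTCUT_VOCAB_URLS[name])
    elif os.path.isdir(name):
        candidate = os.path.join(name, "vocab.txt")
        resolved = candidate if os.path.isfile(candidate) else None
    elif os.path.isfile(name):
        resolved = name
    else:
        resolved = None

    if resolved is None:
        # One generic message covers both "download failed" and "bad path",
        # which is why "dna6" looks missing from a list that contains it.
        known = ", ".join(SHORTCUT_VOCAB_URLS)
        raise OSError(
            f"Model name '{name}' was not found in tokenizers model name list ({known}). "
            f"We assumed '{name}' was a path, a model identifier, or url to a directory "
            "containing vocabulary files named ['vocab.txt'] but couldn't find such "
            "vocabulary files at this path or url."
        )
    return resolved
```

With a `download` callback that always returns `None` (no internet on the node), `resolve_vocab("dna6", ...)` raises the misleading message, while passing a local directory or the path to a downloaded vocab.txt never touches the network.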
It works! Thank you very much!
Hello @ksenia007 and @BioSenior 👍
Thank you for sharing. I get the same error message when I run 'python run_pretrain.py' as described in the README: OSError: Model name 'dna6' was not found in tokenizers model name list (dna3, dna4, dna5, dna6).
If I search for vocab files in my DNABERT directory, I get the following:
[ ~DNABERT]$ find . -name vocab*
./examples/ft/6/vocab.txt
./examples/ft/6/pre/vocab.txt
./examples/ft/6/pre_2_old/vocab.txt
./examples/ft/6-bk/vocab.txt
./src/transformers/dnabert-config/bert-config-6/vocab.txt
./src/transformers/dnabert-config/bert-config-4/vocab.txt
./src/transformers/dnabert-config/bert-config-5/vocab.txt
./src/transformers/dnabert-config/bert-config-3/vocab.txt
Could you please advise what to change in the commands to get past this error?
@ryao-mdanderson I am not sure if you have the same problem. However, I believe that if you specify just dna6 as the tokenizer name, it tries to load vocab.txt from these links instead of reading the files in the source folder. In my case, I downloaded the vocab.txt file into my data folder using wget, and then passed path/to/directory/vocab.txt as tokenizer_name.
Sorry if that does not help in your case!
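Before submitting a job on a node without internet access, it may help to sanity-check the local file you are about to pass as tokenizer_name. The helper below is a hypothetical sketch, not part of DNABERT: `check_kmer_vocab` assumes the file has one token per line and contains every k-mer over A/C/G/T as its own token (special tokens such as [PAD] or [CLS] may also be present and are simply counted).

```python
import itertools
import os


def check_kmer_vocab(path: str, k: int) -> int:
    """Hypothetical pre-flight check for a locally stored k-mer vocab.txt.

    Accepts either the file itself or a directory containing vocab.txt.
    Raises if the file is missing or if any of the 4**k k-mers is absent.
    Returns the number of non-empty lines (tokens) in the file.
    """
    if os.path.isdir(path):
        path = os.path.join(path, "vocab.txt")
    if not os.path.isfile(path):
        raise FileNotFoundError(f"no vocab file at {path!r}")

    with open(path, encoding="utf-8") as fh:
        tokens = [line.strip() for line in fh if line.strip()]

    # Every possible k-mer over the DNA alphabet should appear as a token.
    expected = {"".join(p) for p in itertools.product("ACGT", repeat=k)}
    missing = expected - set(tokens)
    if missing:
        raise ValueError(
            f"{len(missing)} of {len(expected)} {k}-mers missing from {path!r}"
        )
    return len(tokens)
```

For example, `check_kmer_vocab("path/to/directory/vocab.txt", k=6)` raises if any 6-mer is missing (say, a truncated download or a vocab for a different k); if it passes, that same path is what goes into tokenizer_name.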
@ksenia007 👍
Thank you very much, I got it. Since I am running the code on a compute cluster node that does not have internet access, I followed your suggestion and changed tokenizer_name to path/to/directory/vocab.txt. It works now.
@ksenia007, @ryao-mdanderson, @jerryji1993 I am getting the same error with the run_pretrain.py script. I tried the same solution, but it didn't work at all. The error is given below:
<class 'transformers.tokenization_dna.DNATokenizer'>
Traceback (most recent call last):
File "examples/run_pretrain.py", line 885, in
Can you please help me with this?