ProteinNPT
Error when running `zero_shot_fitness_tranception.py`: No such file or directory
Hi @pascalnotin,

I encountered an error while running `zero_shot_fitness_tranception.py` through `zero_shot_fitness_subs.sh`:
```
Traceback (most recent call last):
  File "home/ProteinNPT-master/scripts/zero_shot_fitness_tranception.py", line 116, in <module>
    main()
  File "home/ProteinNPT-master/scripts/zero_shot_fitness_tranception.py", line 46, in main
    tokenizer = get_tranception_tokenizer()
  File "/root/miniconda3/envs/proteinnpt_env/lib/python3.10/site-packages/proteinnpt/utils/tranception/model_pytorch.py", line 915, in get_tranception_tokenizer
    tokenizer = PreTrainedTokenizerFast(
  File "/root/miniconda3/envs/proteinnpt_env/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py", line 111, in __init__
    fast_tokenizer = TokenizerFast.from_file(fast_tokenizer_file)
Exception: No such file or directory (os error 2)
```
The `get_tranception_tokenizer` function is:
```python
def get_tranception_tokenizer(tokenizer_path=None):
    #Tranception Alphabet: "vocab":{"[UNK]":0,"[CLS]":1,"[SEP]":2,"[PAD]":3,"[MASK]":4,"A":5,"C":6,"D":7,"E":8,"F":9,"G":10,"H":11,"I":12,"K":13,"L":14,"M":15,"N":16,"P":17,"Q":18,"R":19,"S":20,"T":21,"V":22,"W":23,"Y":24}
    if tokenizer_path is None:
        dir_path = os.path.dirname(os.path.abspath(__file__))
        tokenizer_path = os.path.join(dir_path, "utils", "tokenizers", "Basic_tokenizer")
    print(tokenizer_path)
    tokenizer = PreTrainedTokenizerFast(
        tokenizer_file=tokenizer_path,
        unk_token="[UNK]",
        sep_token="[SEP]",
        pad_token="[PAD]",
        cls_token="[CLS]",
        mask_token="[MASK]"
    )
    os.environ["TOKENIZERS_PARALLELISM"] = "false"
    tokenizer.tok_to_idx = tokenizer.vocab
    tokenizer.padding_idx = tokenizer.tok_to_idx["[PAD]"]
    tokenizer.mask_idx = tokenizer.tok_to_idx["[MASK]"]
    tokenizer.cls_idx = tokenizer.tok_to_idx["[CLS]"]
    tokenizer.eos_idx = tokenizer.tok_to_idx["[SEP]"]
    tokenizer.prepend_bos = True
    tokenizer.append_eos = True
    return tokenizer
```
The `print` statement, which I added, outputs `/root/miniconda3/envs/proteinnpt_env/lib/python3.10/site-packages/proteinnpt/utils/tranception/utils/tokenizers/Basic_tokenizer`, but that file does not exist in my environment.
Do you have any suggestions, or are there any settings I need to correct? Thanks!