Running `python simple_ui.py` immediately fails with the following error.
(mindspore) root@autodl-container-bff2469f3e-a4796232:~/autodl-tmp/ChatPDF# python simple_ui.py
Building prefix dict from the default dictionary ...
Loading model from cache /tmp/jieba.cache
Loading model cost 0.559 seconds.
Prefix dict has been built successfully.
2024-08-09 09:59:49.937 | INFO | __main__:<module>:32 - Namespace(gen_model_type='auto', gen_model_name='./.mindnlp/model/01ai/Yi-6B-Chat', lora_model=None, rerank_model_name=None, corpus_files='sample.pdf', int4=False, int8=False, chunk_size=220, chunk_overlap=0, num_expand_context_chunk=1, server_name='0.0.0.0', server_port=8082, share=False)
The following parameters in checkpoint files are not loaded:
['embeddings.position_ids']
Loading checkpoint shards: 0%| | 0/3 [00:00<?, ?it/s]MindSpore do not support bfloat16 dtype, we will automaticlly convert to float16
Loading checkpoint shards: 100%|████████████████████████████████████████████████████| 3/3 [00:18<00:00, 6.14s/it]
2024-08-09 10:00:12.427 | INFO | msimilarities.bert_similarity:add_corpus:105 - Start computing corpus embeddings, new docs: 212
Batches: 100%|██████████████████████████████████████████████████████████████████████| 7/7 [00:16<00:00, 2.39s/it]
2024-08-09 10:00:29.184 | INFO | msimilarities.bert_similarity:add_corpus:117 - Add 212 docs, total: 212, emb len: 212
2024-08-09 10:00:29.185 | INFO | msimilarities.literal_similarity:add_corpus:395 - Add corpus done, new docs: 212, all corpus size: 212
2024-08-09 10:00:29.336 | INFO | msimilarities.literal_similarity:build_bm25:405 - Total corpus: 212
2024-08-09 10:00:29.336 | DEBUG | chatpdf:add_corpus:281 - files: ['sample.pdf'], corpus size: 212, top3: ['Style Transfer from Non-Parallel Text byCross-AlignmentTianxiao Shen1Tao Lei2Regina Barzilay1Tommi Jaakkola11MIT CSAIL2ASAPP Inc.', '1{tianxiao, regina, tommi}@[email protected] paper focuses on style transfer on the basis of non-parallel text.', 'This is aninstance of a broad family of problems including machine translation, decipherment,and sentiment modification. The key challenge is to separate the content fromother aspects such as style.']
Traceback (most recent call last):
File "/root/autodl-tmp/ChatPDF/simple_ui.py", line 34, in <module>
model = ChatPDF(
File "/root/autodl-tmp/ChatPDF/chatpdf.py", line 184, in __init__
self.rerank_tokenizer = AutoTokenizer.from_pretrained(rerank_model_name_or_path, mirror='modelscope')
File "/root/miniconda3/envs/mindspore/lib/python3.9/site-packages/mindnlp/transformers/models/auto/tokenization_auto.py", line 775, in from_pretrained
return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
File "/root/miniconda3/envs/mindspore/lib/python3.9/site-packages/mindnlp/transformers/tokenization_utils_base.py", line 1723, in from_pretrained
return cls._from_pretrained(
File "/root/miniconda3/envs/mindspore/lib/python3.9/site-packages/mindnlp/transformers/tokenization_utils_base.py", line 1942, in _from_pretrained
tokenizer = cls(*init_inputs, **init_kwargs)
File "/root/miniconda3/envs/mindspore/lib/python3.9/site-packages/mindnlp/transformers/models/xlm_roberta/tokenization_xlm_roberta_fast.py", line 154, in __init__
super().__init__(
File "/root/miniconda3/envs/mindspore/lib/python3.9/site-packages/mindnlp/transformers/tokenization_utils_fast.py", line 106, in __init__
fast_tokenizer = convert_slow_tokenizer(slow_tokenizer)
File "/root/miniconda3/envs/mindspore/lib/python3.9/site-packages/mindnlp/transformers/convert_slow_tokenizer.py", line 1388, in convert_slow_tokenizer
return converter_class(transformer_tokenizer).converted()
File "/root/miniconda3/envs/mindspore/lib/python3.9/site-packages/mindnlp/transformers/convert_slow_tokenizer.py", line 533, in converted
pre_tokenizer = self.pre_tokenizer(replacement, add_prefix_space)
File "/root/miniconda3/envs/mindspore/lib/python3.9/site-packages/mindnlp/transformers/convert_slow_tokenizer.py", line 515, in pre_tokenizer
return pre_tokenizers.Metaspace(replacement=replacement, add_prefix_space=add_prefix_space)
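The traceback ends inside `convert_slow_tokenizer.py` at the call `pre_tokenizers.Metaspace(replacement=replacement, add_prefix_space=add_prefix_space)`. One common cause of a failure at exactly this call is a too-new `tokenizers` package: in `tokenizers` >= 0.19 the `add_prefix_space` keyword was removed from `Metaspace` (superseded by `prepend_scheme`), so this call raises a `TypeError`. The probe below is a hypothetical diagnostic sketch (the function name `probe_metaspace_api` is my own, not from the project) to check which `Metaspace` API the installed `tokenizers` exposes:

```python
def probe_metaspace_api():
    """Report whether tokenizers.pre_tokenizers.Metaspace still accepts
    the legacy add_prefix_space keyword used by mindnlp's
    convert_slow_tokenizer.py."""
    try:
        from tokenizers import pre_tokenizers
    except ImportError:
        return "tokenizers not installed"
    try:
        # "\u2581" is the SentencePiece metaspace replacement character.
        pre_tokenizers.Metaspace(replacement="\u2581", add_prefix_space=True)
        return "legacy API: add_prefix_space accepted"
    except TypeError:
        return "new API: add_prefix_space removed (use prepend_scheme)"


if __name__ == "__main__":
    print(probe_metaspace_api())
```

If the probe reports the new API, pinning `tokenizers` to a pre-0.19 release in the environment may be worth trying; I have not verified which version mindnlp expects here.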