
Problem with the ChatPDF feature: running simple_ui.py fails immediately with an error.

Open kevinwei1975 opened this issue 6 months ago • 0 comments

Running `python simple_ui.py` fails immediately with the following error:

```
(mindspore) root@autodl-container-bff2469f3e-a4796232:~/autodl-tmp/ChatPDF# python simple_ui.py
Building prefix dict from the default dictionary ...
Loading model from cache /tmp/jieba.cache
Loading model cost 0.559 seconds.
Prefix dict has been built successfully.
2024-08-09 09:59:49.937 | INFO | __main__:<module>:32 - Namespace(gen_model_type='auto', gen_model_name='./.mindnlp/model/01ai/Yi-6B-Chat', lora_model=None, rerank_model_name=None, corpus_files='sample.pdf', int4=False, int8=False, chunk_size=220, chunk_overlap=0, num_expand_context_chunk=1, server_name='0.0.0.0', server_port=8082, share=False)
The following parameters in checkpoint files are not loaded: ['embeddings.position_ids']
Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]MindSpore do not support bfloat16 dtype, we will automaticlly convert to float16
Loading checkpoint shards: 100%|████████████████████████████████████████████████████| 3/3 [00:18<00:00,  6.14s/it]
2024-08-09 10:00:12.427 | INFO | msimilarities.bert_similarity:add_corpus:105 - Start computing corpus embeddings, new docs: 212
Batches: 100%|██████████████████████████████████████████████████████████████████████| 7/7 [00:16<00:00,  2.39s/it]
2024-08-09 10:00:29.184 | INFO | msimilarities.bert_similarity:add_corpus:117 - Add 212 docs, total: 212, emb len: 212
2024-08-09 10:00:29.185 | INFO | msimilarities.literal_similarity:add_corpus:395 - Add corpus done, new docs: 212, all corpus size: 212
2024-08-09 10:00:29.336 | INFO | msimilarities.literal_similarity:build_bm25:405 - Total corpus: 212
2024-08-09 10:00:29.336 | DEBUG | chatpdf:add_corpus:281 - files: ['sample.pdf'], corpus size: 212, top3: ['Style Transfer from Non-Parallel Text byCross-AlignmentTianxiao Shen1Tao Lei2Regina Barzilay1Tommi Jaakkola11MIT CSAIL2ASAPP Inc.', '1{tianxiao, regina, tommi}@[email protected] paper focuses on style transfer on the basis of non-parallel text.', 'This is aninstance of a broad family of problems including machine translation, decipherment,and sentiment modification. The key challenge is to separate the content fromother aspects such as style.']
Traceback (most recent call last):
  File "/root/autodl-tmp/ChatPDF/simple_ui.py", line 34, in <module>
    model = ChatPDF(
  File "/root/autodl-tmp/ChatPDF/chatpdf.py", line 184, in __init__
    self.rerank_tokenizer = AutoTokenizer.from_pretrained(rerank_model_name_or_path, mirror='modelscope')
  File "/root/miniconda3/envs/mindspore/lib/python3.9/site-packages/mindnlp/transformers/models/auto/tokenization_auto.py", line 775, in from_pretrained
    return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
  File "/root/miniconda3/envs/mindspore/lib/python3.9/site-packages/mindnlp/transformers/tokenization_utils_base.py", line 1723, in from_pretrained
    return cls._from_pretrained(
  File "/root/miniconda3/envs/mindspore/lib/python3.9/site-packages/mindnlp/transformers/tokenization_utils_base.py", line 1942, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/root/miniconda3/envs/mindspore/lib/python3.9/site-packages/mindnlp/transformers/models/xlm_roberta/tokenization_xlm_roberta_fast.py", line 154, in __init__
    super().__init__(
  File "/root/miniconda3/envs/mindspore/lib/python3.9/site-packages/mindnlp/transformers/tokenization_utils_fast.py", line 106, in __init__
    fast_tokenizer = convert_slow_tokenizer(slow_tokenizer)
  File "/root/miniconda3/envs/mindspore/lib/python3.9/site-packages/mindnlp/transformers/convert_slow_tokenizer.py", line 1388, in convert_slow_tokenizer
    return converter_class(transformer_tokenizer).converted()
  File "/root/miniconda3/envs/mindspore/lib/python3.9/site-packages/mindnlp/transformers/convert_slow_tokenizer.py", line 533, in converted
    pre_tokenizer = self.pre_tokenizer(replacement, add_prefix_space)
  File "/root/miniconda3/envs/mindspore/lib/python3.9/site-packages/mindnlp/transformers/convert_slow_tokenizer.py", line 515, in pre_tokenizer
    return pre_tokenizers.Metaspace(replacement=replacement,
                                    add_prefix_space=add_prefix_space)
```
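The paste cuts off before the final exception line, but the last frame is the `pre_tokenizers.Metaspace(...)` call with an `add_prefix_space` keyword. One plausible cause (an assumption, not confirmed by the log) is a version mismatch with the HuggingFace `tokenizers` package: newer releases changed the `Metaspace` signature (dropping `add_prefix_space` in favor of `prepend_scheme`), so mindnlp's converter would raise a `TypeError` at exactly this call. A minimal sketch of that failure mode, using a toy stand-in class rather than the real `tokenizers` API:

```python
# Toy reproduction of the suspected failure mode: a caller passes a keyword
# argument that a newer version of a class no longer accepts.
# NOTE: this Metaspace is a hypothetical stand-in, NOT the real tokenizers API.
class Metaspace:
    """Imitates a release where `add_prefix_space` was removed."""
    def __init__(self, replacement="▁", prepend_scheme="always"):
        self.replacement = replacement
        self.prepend_scheme = prepend_scheme

def build_pre_tokenizer():
    # Mirrors the call shape in mindnlp's convert_slow_tokenizer.py.
    try:
        return Metaspace(replacement="▁", add_prefix_space=True)
    except TypeError as exc:
        # With a mismatched version this raises, which would be the missing
        # final line of the truncated traceback above.
        return str(exc)

print(build_pre_tokenizer())
```

If this is the cause, checking the installed `tokenizers` version against the one mindnlp was developed for would be the first diagnostic step.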

kevinwei1975 · Aug 09 '24 08:08