[BUG] Error when configured with the Milvus vector store; FAISS works fine

Open Sgzmust opened this issue 1 year ago • 4 comments

When initializing the vector store with the following command, I get errors:

python init_database.py --recreate-vs

Output:

2024-04-26 10:25:07,084 - lang.py[line:346] - WARNING: Need to load profiles.
2024-04-26 10:25:07,727 - common.py[line:591] - INFO: HTML element instance has no attribute type
cannot import name 'AutoTokenizer' from 'transformers' (/root/anaconda3/envs/Langchain/lib/python3.11/site-packages/transformers/__init__.py)
Document split example: page_content='大模型技术栈-算法与原理\n\ntokenizer方法\nword-level\nchar-level\nsubword-level\nBPE\nWordPiece\nUniLM\nSentencePiece\nByteBPE\n\nposition encoding\n绝对位置编码\nROPE\nAliBi\n\n\n相对位置编码\nTransformer-XL\nT5/TUPE\nDeBERTa\n\n\n其他位置编码\n\n注意力机制\n稀疏注意力\nflash-attention' metadata={'source': '/home/Python/Langchain-Chatchat/knowledge_base/samples/content/llm/大模型技术栈-算法与原理.md'}
2024-04-26 10:25:37,804 - utils.py[line:295] - INFO: RapidOCRLoader used for /home/Python/Langchain-Chatchat/knowledge_base/samples/content/llm/img/大模型技术栈-算法与原理-幕布图片-19929-302935.jpg
Adding samples/llm/大模型技术栈-算法与原理.md to the vector store (56 documents in total)
cannot import name 'AutoTokenizer' from 'transformers' (/root/anaconda3/envs/Langchain/lib/python3.11/site-packages/transformers/__init__.py)
Document split example: page_content='Multi-head\n\nGrouped-query\n\nMulti-query\n\nValues\n\nKeys\n\n00000000\n\nQueries' metadata={'source': '/home/Python/Langchain-Chatchat/knowledge_base/samples/content/llm/img/大模型推理优化策略-幕布图片-699343-219844.jpg'}
2024-04-26 10:25:37,837 - utils.py[line:295] - INFO: RapidOCRLoader used for /home/Python/Langchain-Chatchat/knowledge_base/samples/content/llm/img/分布式训练技术原理-幕布图片-906937-836104.jpg
cannot import name 'AutoTokenizer' from 'transformers' (/root/anaconda3/envs/Langchain/lib/python3.11/site-packages/transformers/__init__.py)
cannot import name 'AutoTokenizer' from 'transformers' (/root/anaconda3/envs/Langchain/lib/python3.11/site-packages/transformers/__init__.py)
cannot import name 'AutoTokenizer' from 'transformers' (/root/anaconda3/envs/Langchain/lib/python3.11/site-packages/transformers/__init__.py)
cannot import name 'AutoTokenizer' from 'transformers' (/root/anaconda3/envs/Langchain/lib/python3.11/site-packages/transformers/__init__.py)
cannot import name 'AutoTokenizer' from 'transformers' (/root/anaconda3/envs/Langchain/lib/python3.11/site-packages/transformers/__init__.py)
Document split example: page_content='NVIDIA Megatron Trains LLM\n\nPipelineParallelism\n\nDevice 1\n\n101112\n\nLayer 1-4\n\nDevice 2\n\n9101112\n\n10\n\nLayer5-8\n\nDevice 3\n\n9101112\n\n13\n\n10\n\n11\n\nLayer 9-12\n\nDevice 4\n\n10\n\n10\n\n11\n\n11\n\n12\n\nLayer 13-16\n\nTime\n\nAssignmultiple stages\n\ntoeachdevice\n\nDevice 1' metadata={'source': '/home/Python/Langchain-Chatchat/knowledge_base/samples/content/llm/img/分布式训练技术原理-幕布图片-618350-869132.jpg'}

Sgzmust · Apr 26 '24 10:04

The error you're encountering with importing AutoTokenizer from the transformers package in your Python 3.11 environment is likely due to a compatibility issue. To resolve this, you should upgrade the transformers package to a version that supports Python 3.11. This can be done by running pip install transformers --upgrade. Ensure the upgraded version is compatible with your project's dependencies.

Upgrading transformers may affect other dependencies such as sentence_transformers, accelerate, and transformers_stream_generator, so it's important to check their compatibility with the new version of transformers and Python 3.11.
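Before and after changing the transformers version, it can help to record which versions of these packages are actually installed in the active environment. The following is a minimal sketch using only the standard library; the distribution names in `PACKAGES` are assumed PyPI spellings of the packages named above, and `installed_versions` is a hypothetical helper, not part of Langchain-Chatchat.

```python
from importlib import metadata

# Assumed PyPI distribution names for the packages mentioned above.
PACKAGES = [
    "transformers",
    "sentence-transformers",
    "accelerate",
    "transformers-stream-generator",
]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name}: {version or 'not installed'}")
```

Run it with the same interpreter that runs init_database.py, so the report reflects the environment in which the error occurs.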

The AutoTokenizer is crucial for handling different models' tokenization in the Langchain-Chatchat project, which is why this issue directly impacts your ability to add documents to the vector database.
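To see whether the failing import can be reproduced outside of init_database.py, one option is a small standalone check. This is only a sketch: `check_import` is a hypothetical helper, and it catches any exception because lazily loaded packages may raise something other than ImportError when an attribute is first accessed.

```python
import importlib


def check_import(module_name, attr_name):
    """Try the equivalent of `from module_name import attr_name`.

    Returns (True, module file path) on success,
    or (False, "ExceptionType: message") on failure.
    """
    try:
        module = importlib.import_module(module_name)
        getattr(module, attr_name)
    except Exception as exc:  # broad on purpose: report whatever went wrong
        return False, f"{type(exc).__name__}: {exc}"
    # The file path shows which installation (or shadowing file) was imported.
    return True, getattr(module, "__file__", None)


if __name__ == "__main__":
    ok, detail = check_import("transformers", "AutoTokenizer")
    print("OK" if ok else "FAILED", detail)
```

If the check fails here as well, the problem lies in the installed package rather than in the vector store configuration.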

For reproducing and further investigating the error, refer to the test_milvus_db.py file in the tests directory of the Langchain-Chatchat repository. This file outlines steps for initializing a vector database with Milvus, adding documents, performing searches, and deleting documents, which could be helpful for troubleshooting.

If the problem persists after upgrading, please provide more detailed error messages and the context in which they occur to better diagnose and address the issue.

dosubot[bot] · Apr 26 '24 10:04

I ran into the same problem. Has it been resolved?

jiusi9 · Apr 29 '24 04:04

cannot import name 'AutoTokenizer' from 'transformers' (/root/anaconda3/envs/Langchain/lib/python3.11/site-packages/transformers/__init__.py)

I suggest uninstalling transformers and then reinstalling it pinned to a specific version: transformers==?.

hanwsf · Apr 29 '24 19:04

Use Milvus version 2.4.

hanwsf · Apr 29 '24 19:04

Installing transformers 4.37 works without problems.
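To confirm that an environment really has a 4.37 release installed, a check along these lines may be useful. It is a sketch under assumptions: `in_release_series` and `installed_in_series` are hypothetical helper names, and the comparison is a simple component-wise prefix match on the version string rather than a full version parser.

```python
from importlib import metadata


def in_release_series(version, series):
    """True if `version` (e.g. '4.37.2') belongs to `series` (e.g. '4.37')."""
    version_parts = version.split(".")
    series_parts = series.split(".")
    # Compare whole components so that '4.370.0' does not match '4.37'.
    return version_parts[: len(series_parts)] == series_parts


def installed_in_series(dist_name, series):
    """Check the installed distribution; returns None if it is not installed."""
    try:
        return in_release_series(metadata.version(dist_name), series)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("transformers is 4.37.x:", installed_in_series("transformers", "4.37"))
```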

zRzRzRzRzRzRzR · May 05 '24 04:05