Langchain-Chatchat
[BUG] Error when configured with the Milvus vector store; works fine with FAISS
When initializing the vector store with the command below, an error is reported:

python init_database.py --recreate-vs

Output (log lines kept verbatim):

2024-04-26 10:25:07,084 - lang.py[line:346] - WARNING: Need to load profiles.
2024-04-26 10:25:07,727 - common.py[line:591] - INFO: HTML element instance has no attribute type
cannot import name 'AutoTokenizer' from 'transformers' (/root/anaconda3/envs/Langchain/lib/python3.11/site-packages/transformers/__init__.py)
文档切分示例:page_content='大模型技术栈-算法与原理\n\ntokenizer方法\nword-level\nchar-level\nsubword-level\nBPE\nWordPiece\nUniLM\nSentencePiece\nByteBPE\n\nposition encoding\n绝对位置编码\nROPE\nAliBi\n\n\n相对位置编码\nTransformer-XL\nT5/TUPE\nDeBERTa\n\n\n其他位置编码\n\n注意力机制\n稀疏注意力\nflash-attention' metadata={'source': '/home/Python/Langchain-Chatchat/knowledge_base/samples/content/llm/大模型技术栈-算法与原理.md'}
2024-04-26 10:25:37,804 - utils.py[line:295] - INFO: RapidOCRLoader used for /home/Python/Langchain-Chatchat/knowledge_base/samples/content/llm/img/大模型技术栈-算法与原理-幕布图片-19929-302935.jpg
正在将 samples/llm/大模型技术栈-算法与原理.md 添加到向量库,共包含56条文档
cannot import name 'AutoTokenizer' from 'transformers' (/root/anaconda3/envs/Langchain/lib/python3.11/site-packages/transformers/__init__.py)
文档切分示例:page_content='Multi-head\n\nGrouped-query\n\nMulti-query\n\nValues\n\nKeys\n\n00000000\n\nQueries' metadata={'source': '/home/Python/Langchain-Chatchat/knowledge_base/samples/content/llm/img/大模型推理优化策略-幕布图片-699343-219844.jpg'}
2024-04-26 10:25:37,837 - utils.py[line:295] - INFO: RapidOCRLoader used for /home/Python/Langchain-Chatchat/knowledge_base/samples/content/llm/img/分布式训练技术原理-幕布图片-906937-836104.jpg
cannot import name 'AutoTokenizer' from 'transformers' (/root/anaconda3/envs/Langchain/lib/python3.11/site-packages/transformers/__init__.py)
cannot import name 'AutoTokenizer' from 'transformers' (/root/anaconda3/envs/Langchain/lib/python3.11/site-packages/transformers/__init__.py)
cannot import name 'AutoTokenizer' from 'transformers' (/root/anaconda3/envs/Langchain/lib/python3.11/site-packages/transformers/__init__.py)
cannot import name 'AutoTokenizer' from 'transformers' (/root/anaconda3/envs/Langchain/lib/python3.11/site-packages/transformers/__init__.py)
cannot import name 'AutoTokenizer' from 'transformers' (/root/anaconda3/envs/Langchain/lib/python3.11/site-packages/transformers/__init__.py)
文档切分示例:page_content='NVIDIA Megatron Trains LLM\n\nPipelineParallelism\n\nDevice 1\n\n101112\n\nLayer 1-4\n\nDevice 2\n\n9101112\n\n10\n\nLayer5-8\n\nDevice 3\n\n9101112\n\n13\n\n10\n\n11\n\nLayer 9-12\n\nDevice 4\n\n10\n\n10\n\n11\n\n11\n\n12\n\nLayer 13-16\n\nTime\n\nAssignmultiple stages\n\ntoeachdevice\n\nDevice 1' metadata={'source': '/home/Python/Langchain-Chatchat/knowledge_base/samples/content/llm/img/分布式训练技术原理-幕布图片-618350-869132.jpg'}
The error you're encountering with importing AutoTokenizer from the transformers package in your Python 3.11 environment is likely due to a compatibility issue. To resolve this, you should upgrade the transformers package to a version that supports Python 3.11. This can be done by running pip install transformers --upgrade. Ensure the upgraded version is compatible with your project's dependencies.
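Before upgrading anything, it may help to confirm which transformers installation the environment actually picks up and why the import fails. The following is a minimal diagnostic sketch, not part of Langchain-Chatchat; the helper name diagnose_import is hypothetical, and it relies only on the Python standard library.

```python
import importlib
from importlib import metadata


def diagnose_import(package: str, name: str) -> dict:
    """Report whether `name` can be obtained from `package`, and from where.

    Hypothetical helper for troubleshooting; it is not part of any library.
    """
    report = {
        "package": package,
        "name": name,
        "installed_version": None,  # version recorded by the installer, if any
        "location": None,           # file the module was actually loaded from
        "importable": False,
        "error": None,
    }

    # Installed distribution version (None for stdlib modules or if missing).
    try:
        report["installed_version"] = metadata.version(package)
    except metadata.PackageNotFoundError:
        pass

    # Import the top-level package itself.
    try:
        module = importlib.import_module(package)
    except Exception as exc:  # broad on purpose: we only want to report
        report["error"] = f"{type(exc).__name__}: {exc}"
        return report

    report["location"] = getattr(module, "__file__", None)

    # Attribute access is what `from package import name` does under the hood.
    try:
        getattr(module, name)
        report["importable"] = True
    except Exception as exc:
        report["error"] = f"{type(exc).__name__}: {exc}"

    return report


if __name__ == "__main__":
    for key, value in diagnose_import("transformers", "AutoTokenizer").items():
        print(f"{key}: {value}")
```

If location points somewhere other than the expected site-packages directory (for example a local file or folder named transformers), or installed_version is empty, the problem is more likely a broken or shadowed installation than a Python 3.11 incompatibility.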
Upgrading transformers may affect other dependencies such as sentence_transformers, accelerate, and transformers_stream_generator, so it's important to check their compatibility with the new version of transformers and Python 3.11.
The AutoTokenizer is crucial for handling different models' tokenization in the Langchain-Chatchat project, which is why this issue directly impacts your ability to add documents to the vector database.
For reproducing and further investigating the error, refer to the test_milvus_db.py file in the tests directory of the Langchain-Chatchat repository. This file outlines steps for initializing a vector database with Milvus, adding documents, performing searches, and deleting documents, which could be helpful for troubleshooting.
If the problem persists after upgrading, please provide more detailed error messages and the context in which they occur to better diagnose and address the issue.
Sources
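- Milvus vector database reports an error
- [BUG] Error when running python init_database.py --recreate-vs
- [BUG] Error uploading documents with Milvus in knowledge base management
- What causes the error when running python init_database --recreate-vs?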
- requirements.txt
- embeddings/add_embedding_keywords.py
- Dockerfile
- tests/kb_vector_db/test_milvus_db.py
I ran into the same problem. Has it been solved?
cannot import name 'AutoTokenizer' from 'transformers' (/root/anaconda3/envs/Langchain/lib/python3.11/site-packages/transformers/__init__.py) I suggest uninstalling transformers and then reinstalling it with transformers==?.
Use Milvus version 2.4.
Installing transformers 4.37 works fine.
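To check whether an environment already matches the version reported as working above, a small standard-library sketch like the following could be used. The helper names installed_version and in_release_series are hypothetical, and "4.37" is taken from the comment in this thread, which does not give an exact patch release.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def in_release_series(version: str, series: str) -> bool:
    """True if `version` belongs to `series`, e.g. '4.37.2' is in '4.37'.

    Compares dot-separated components, so '4.370.0' is NOT in '4.37'.
    """
    wanted = series.split(".")
    return version.split(".")[: len(wanted)] == wanted


if __name__ == "__main__":
    # "4.37" comes from the thread; the exact patch release is not stated.
    found = installed_version("transformers")
    if found is None:
        print("transformers is not installed in this environment")
    elif in_release_series(found, "4.37"):
        print(f"transformers {found} is in the 4.37 series reported to work")
    else:
        print(f"transformers {found} is installed, not the 4.37 series")
```

Run it with the same interpreter that runs init_database.py, since a different conda environment may report a different version.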