Getting ValueError: not enough values to unpack (expected 2, got 1)
With the function VectorstoreIndexCreator, I got the error at:

--> 115 return {
    116     base64.b64decode(token): int(rank)
    117     for token, rank in (line.split() for line in contents.splitlines() if line)
    118 }

The whole error information was:

ValueError                                Traceback (most recent call last)
Cell In[25], line 2
      1 from langchain.indexes import VectorstoreIndexCreator
----> 2 index = VectorstoreIndexCreator().from_loaders([loader])

File J:\conda202002\envs\chatglm\lib\site-packages\langchain\indexes\vectorstore.py:71, in VectorstoreIndexCreator.from_loaders(self, loaders)
     69     docs.extend(loader.load())
     70 sub_docs = self.text_splitter.split_documents(docs)
---> 71 vectorstore = self.vectorstore_cls.from_documents(
     72     sub_docs, self.embedding, **self.vectorstore_kwargs
     73 )
     74 return VectorStoreIndexWrapper(vectorstore=vectorstore)

File J:\conda202002\envs\chatglm\lib\site-packages\langchain\vectorstores\chroma.py:347, in Chroma.from_documents(cls, documents, embedding, ids, collection_name, persist_directory, client_settings, client, **kwargs)
    345 texts = [doc.page_content for doc in documents]
    346 metadatas = [doc.metadata for doc in documents]
--> 347 return cls.from_texts(
    348     texts=texts,
    349     embedding=embedding,
    350     metadatas=metadatas,
    351     ids=ids,
    352     collection_name=collection_name,
    353     persist_directory=persist_directory,
    354     client_settings=client_settings,
    355     client=client,
    356 )

File J:\conda202002\envs\chatglm\lib\site-packages\langchain\vectorstores\chroma.py:315, in Chroma.from_texts(cls, texts, embedding, metadatas, ids, collection_name, persist_directory, client_settings, client, **kwargs)
    291 """Create a Chroma vectorstore from a raw documents.
    292
    293 If a persist_directory is specified, the collection will be persisted there.
   (...)
    306     Chroma: Chroma vectorstore.
    307 """
    308 chroma_collection = cls(
    309     collection_name=collection_name,
    310     embedding_function=embedding,
   (...)
    313     client=client,
    314 )
--> 315 chroma_collection.add_texts(texts=texts, metadatas=metadatas, ids=ids)
    316 return chroma_collection

File J:\conda202002\envs\chatglm\lib\site-packages\langchain\vectorstores\chroma.py:121, in Chroma.add_texts(self, texts, metadatas, ids, **kwargs)
    119 embeddings = None
    120 if self._embedding_function is not None:
--> 121     embeddings = self._embedding_function.embed_documents(list(texts))
    122 self._collection.add(
    123     metadatas=metadatas, embeddings=embeddings, documents=texts, ids=ids
    124 )
    125 return ids

File J:\conda202002\envs\chatglm\lib\site-packages\langchain\embeddings\openai.py:228, in OpenAIEmbeddings.embed_documents(self, texts, chunk_size)
    226 # handle batches of large input text
    227 if self.embedding_ctx_length > 0:
--> 228     return self._get_len_safe_embeddings(texts, engine=self.deployment)
    229 else:
    230     results = []

File J:\conda202002\envs\chatglm\lib\site-packages\langchain\embeddings\openai.py:159, in OpenAIEmbeddings._get_len_safe_embeddings(self, texts, engine, chunk_size)
    157 tokens = []
    158 indices = []
--> 159 encoding = tiktoken.model.encoding_for_model(self.model)
    160 for i, text in enumerate(texts):
    161     # replace newlines, which can negatively affect performance.
    162     text = text.replace("\n", " ")

File J:\conda202002\envs\chatglm\lib\site-packages\tiktoken\model.py:75, in encoding_for_model(model_name)
     69 if encoding_name is None:
     70     raise KeyError(
     71         f"Could not automatically map {model_name} to a tokeniser. "
     72         "Please use tiktoken.get_encoding to explicitly get the tokeniser you expect."
     73     ) from None
---> 75 return get_encoding(encoding_name)

File J:\conda202002\envs\chatglm\lib\site-packages\tiktoken\registry.py:63, in get_encoding(encoding_name)
     60     raise ValueError(f"Unknown encoding {encoding_name}")
     62 constructor = ENCODING_CONSTRUCTORS[encoding_name]
---> 63 enc = Encoding(**constructor())
     64 ENCODINGS[encoding_name] = enc
     65 return enc

File J:\conda202002\envs\chatglm\lib\site-packages\tiktoken_ext\openai_public.py:64, in cl100k_base()
     63 def cl100k_base():
---> 64     mergeable_ranks = load_tiktoken_bpe(
     65         "https://openaipublic.blob.core.windows.net/encodings/cl100k_base.tiktoken"
     66     )
     67     special_tokens = {
     68         ENDOFTEXT: 100257,
     69         FIM_PREFIX: 100258,
   (...)
     72         ENDOFPROMPT: 100276,
     73     }
     74     return {
     75         "name": "cl100k_base",
     76         "pat_str": r"""(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\r\n\p{L}\p{N}]?\p{L}+|\p{N}{1,3}| ?[^\s\p{L}\p{N}]+[\r\n]|\s[\r\n]+|\s+(?!\S)|\s+""",
     77     "mergeable_ranks": mergeable_ranks,
     78     "special_tokens": special_tokens,
     79 }

File J:\conda202002\envs\chatglm\lib\site-packages\tiktoken\load.py:115, in load_tiktoken_bpe(tiktoken_bpe_file)
    112 def load_tiktoken_bpe(tiktoken_bpe_file: str) -> dict[bytes, int]:
    113     # NB: do not add caching to this function
    114     contents = read_file_cached(tiktoken_bpe_file)
--> 115     return {
    116         base64.b64decode(token): int(rank)
    117         for token, rank in (line.split() for line in contents.splitlines() if line)
    118     }

File J:\conda202002\envs\chatglm\lib\site-packages\tiktoken\load.py:115, in <dictcomp>(.0)

ValueError: not enough values to unpack (expected 2, got 1)
This error occurs when Python tries to unpack a sequence into two variables but the sequence yields only one value. Here, `line.split()` returned a single field instead of the expected token/rank pair, which usually means a line in the downloaded `cl100k_base.tiktoken` BPE file is malformed (for example, a corrupted or incomplete download).
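For context, here is a minimal sketch (not from the thread) of the same unpacking failure, using a hypothetical BPE-style payload in the `<base64-token> <rank>` format that `load_tiktoken_bpe` parses; the `parse_bpe` helper just mirrors the dict comprehension from the traceback:

```python
# Minimal reproduction of the unpacking failure seen in tiktoken's
# load_tiktoken_bpe. The file format is one "<base64-token> <rank>" pair per
# line; a line with only one field makes `for token, rank in ...` raise
# ValueError: not enough values to unpack (expected 2, got 1).
import base64

good = "\n".join([
    base64.b64encode(b"hello").decode() + " 0",
    base64.b64encode(b"world").decode() + " 1",
])
bad = good + "\nQQ=="  # corrupted trailing line: a token but no rank

def parse_bpe(contents: str) -> dict:
    """Same shape as the comprehension in tiktoken/load.py."""
    return {
        base64.b64decode(token): int(rank)
        for token, rank in (line.split() for line in contents.splitlines() if line)
    }

print(parse_bpe(good))  # parses fine: {b'hello': 0, b'world': 1}
try:
    parse_bpe(bad)
except ValueError as e:
    print(e)  # not enough values to unpack (expected 2, got 1)
```

In other words, the comprehension itself is fine; the input file it was fed was not.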
Thanks a lot, but this occurred in the example code; I didn't change the code at all.
The example codes are: https://python.langchain.com/en/latest/modules/indexes/getting_started.html?highlight=encoding#getting-started
Next, in the generic setup, let's specify the document loader we want to use. You can download the state_of_the_union.txt file [here](https://github.com/hwchase17/langchain/blob/master/docs/modules/state_of_the_union.txt)
from langchain.document_loaders import TextLoader
loader = TextLoader('../state_of_the_union.txt', encoding='utf8')
I noticed another guy had the same problem as me; he resolved it by reinstalling a library, but he didn't say which one. How can I find out which library needs to be reinstalled?
the links: https://qiita.com/KONTA2019/items/8ec10b8bf4dfca3bfb75
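Since the failing parse happens on a file that tiktoken downloads and caches, a corrupted cached copy is consistent with "reinstalling fixed it". A sketch of locating and clearing that cache, assuming the cache-directory resolution used by recent tiktoken versions (`TIKTOKEN_CACHE_DIR`, then `DATA_GYM_CACHE_DIR`, then a `data-gym-cache` folder under the system temp directory); the helper names here are my own, not part of tiktoken:

```python
# Hypothetical helpers: locate (and optionally clear) tiktoken's download
# cache, so the next encoding_for_model() call re-downloads a clean BPE file.
import os
import shutil
import tempfile

def tiktoken_cache_dir(env=None):
    """Resolve the cache directory the way recent tiktoken versions do."""
    env = os.environ if env is None else env
    return (
        env.get("TIKTOKEN_CACHE_DIR")
        or env.get("DATA_GYM_CACHE_DIR")
        or os.path.join(tempfile.gettempdir(), "data-gym-cache")
    )

def clear_tiktoken_cache():
    """Delete the cache directory; tiktoken re-downloads files on next use."""
    cache = tiktoken_cache_dir()
    if os.path.isdir(cache):
        shutil.rmtree(cache)

if __name__ == "__main__":
    print("tiktoken cache:", tiktoken_cache_dir())
```

Deleting this directory (or simply the files inside it) is a lighter-weight fix than reinstalling packages when the root cause is a bad cached download.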
I solved the problem using:

pip install langchain==0.0.125 openai==0.27.2 chromadb==0.3.14 pypdf==3.7.0 tiktoken==0.3.3 gradio==3.23

This combination worked well in the conda env on my computer.
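To see which of these libraries are installed locally, and at what version, before deciding what to pin or reinstall, a small sketch using the standard library (package names match the pip install line above):

```python
# Report installed versions of the packages pinned in the pip install line,
# so you can compare your environment against the working combination.
from importlib import metadata

def get_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    out = {}
    for name in names:
        try:
            out[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            out[name] = None
    return out

for pkg, ver in get_versions(
    ["langchain", "openai", "chromadb", "pypdf", "tiktoken", "gradio"]
).items():
    print(f"{pkg}: {ver or 'not installed'}")
```

Any package reported at a different version than the working pin set is a candidate for `pip install <name>==<version>`.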
Did you manage to solve it?
use these: tiktoken==0.3.3 chromadb==0.3.14
I have the same problem in this code:
from langchain.embeddings.openai import OpenAIEmbeddings
embeddings = OpenAIEmbeddings(model='text-embedding-ada-002')
embeddings.embed_documents([texts[0]])
Has anyone solved it?
Hi, @judgementc! I'm Dosu, and I'm helping the LangChain team manage their backlog. I wanted to let you know that we are marking this issue as stale.
From what I understand, you encountered a ValueError in the VectorstoreIndexCreator function, specifically at line 115. The error message indicated that there weren't enough values to unpack. You mentioned that you didn't make any changes to the code and provided an example where the error occurred. However, you later resolved the issue by reinstalling the required libraries. It seems that other users have also reported similar issues with different code snippets.
Before we close this issue, we wanted to check if it is still relevant to the latest version of the LangChain repository. If it is, please let us know by commenting on this issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.
Thank you for your understanding and contribution to the LangChain project. If you have any further questions or concerns, please don't hesitate to reach out.