langchain
DOC: Code/twitter-the-algorithm-analysis-deeplake not working as written
Issue with current documentation:
I followed the documentation @ https://python.langchain.com/docs/use_cases/code/twitter-the-algorithm-analysis-deeplake.
I replaced 'twitter-the-algorithm' with another code base I'm analyzing and used my own credentials from OpenAI and Deep Lake.
When I run the code (on VS Code for Mac with M1 chip), I get the following error:
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (1435,) + inhomogeneous part.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/catherineswope/Desktop/LangChain/fromLangChain.py", line 37, in
This is the code snippet from my actual code:
import os
import getpass

from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import DeepLake
from langchain.document_loaders import TextLoader

# get OPENAI API KEY and ACTIVELOOP_TOKEN
os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")
os.environ["ACTIVELOOP_TOKEN"] = getpass.getpass("Activeloop Token:")

embeddings = OpenAIEmbeddings(disallowed_special=())

# clone from chattydocs git hub repo removedcomments branch and copy/paste path
root_dir = "/Users/catherineswope/chattydocs/incubator-baremaps-0.7.1-removedcomments"
docs = []
for dirpath, dirnames, filenames in os.walk(root_dir):
    for file in filenames:
        try:
            loader = TextLoader(os.path.join(dirpath, file), encoding="utf-8")
            docs.extend(loader.load_and_split())
        except Exception as e:
            pass

from langchain.text_splitter import CharacterTextSplitter

text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(docs)

username = "caswvu"  # replace with your username from app.activeloop.ai
db = DeepLake(
    dataset_path=f"hub://caswvu/baremaps",
    embedding_function=embeddings,
)
db.add_documents(texts)

db = DeepLake(
    dataset_path="hub://caswvu/baremaps",
    read_only=True,
    embedding_function=embeddings,
)

retriever = db.as_retriever()
retriever.search_kwargs["distance_metric"] = "cos"
retriever.search_kwargs["fetch_k"] = 100
retriever.search_kwargs["maximal_marginal_relevance"] = True
retriever.search_kwargs["k"] = 10

from langchain.chat_models import ChatOpenAI
from langchain.chains import ConversationalRetrievalChain

model = ChatOpenAI(model_name="gpt-3.5-turbo")  # switch to 'gpt-4'
qa = ConversationalRetrievalChain.from_llm(model, retriever=retriever)

questions = [
    "What does this code do?",
]
chat_history = []

for question in questions:
    result = qa({"question": question, "chat_history": chat_history})
    chat_history.append((question, result["answer"]))
    print(f"-> Question: {question} \n")
    print(f"Answer: {result['answer']} \n")
Idea or request for content:
Can you please help me understand how to fix the code to address the error message? Also, if applicable, please address this in the documentation so that others can avoid it as well. Thank you!
@casWVU What version of LangChain are you on?
Hi. I'm on
Name: langchain
Version: 0.0.223
Thanks!
Hey @casWVU! What DeepLake version are you using? This problem is related to the documents stored in the folder. Could you please filter out the files that you don't use? When a file of an unsupported format is passed to the OpenAI embedding, it sends back an empty list. Appending this empty list is what causes the issue. In newer versions of deeplake, the exception should give you more details, but overall that's the issue.
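To illustrate the failure mode described above, here is a minimal sketch of what happens when one embedding in a batch comes back as an empty list and the batch is then converted to a NumPy array. This is not DeepLake's actual code: the `stack_embeddings` helper and the tiny 3-dimensional vectors are made up for illustration.

```python
import numpy as np


def stack_embeddings(embeddings):
    """Convert a list of embedding vectors into a 2-D float array.

    Hypothetical helper for illustration only; it is not part of
    LangChain or DeepLake.
    """
    return np.array(embeddings, dtype=np.float32)


# Two well-formed 3-dimensional vectors stack into a (2, 3) array.
good = stack_embeddings([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]])
print(good.shape)

# One empty embedding makes the rows ragged, so NumPy raises ValueError.
try:
    stack_embeddings([[0.1, 0.2, 0.3], []])
except ValueError as exc:
    print(f"ValueError: {exc}")
```

The exact wording of the message depends on the NumPy version, but a single empty row among otherwise equal-length rows is enough to trigger it.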
Hi! I'm on the latest version:
Name: deeplake
Version: 3.6.8
Thanks so much for the insight. Do you know which file types aren't supported by OpenAI embeddings? I'm reading OpenAI documentation and searching the web but not finding anything.
I'm not sure which file types are unsupported, but I have faced similar issues before. I would suggest excluding files that don't give the model any useful context, such as .lock or .DS_Store files.
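A minimal sketch of that kind of filtering, assuming the files are collected with `os.walk` as in the original snippet. The `iter_source_files` helper and the two skip lists are hypothetical; only `.lock` and `.DS_Store` come from this thread, and other suffixes (for example binary formats) would need to be added for a given code base.

```python
import os

# Assumed skip lists; extend them for the code base being indexed.
SKIP_SUFFIXES = (".lock",)
SKIP_NAMES = {".DS_Store"}


def iter_source_files(root_dir):
    """Yield file paths under root_dir, skipping names on the skip lists."""
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in filenames:
            if name in SKIP_NAMES or name.lower().endswith(SKIP_SUFFIXES):
                continue  # unlikely to add useful context for the model
            yield os.path.join(dirpath, name)
```

Each yielded path could then be passed to `TextLoader(path, encoding="utf-8")` in place of the unfiltered loop.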
By the way, I saw that LangChain has updated its OpenAI-related code, so it should now raise this kind of exception:
raise openai.error.APIError("OpenAI API returned an empty embedding")
openai.error.APIError: OpenAI API returned an empty embedding
Just update langchain in your environment to the latest version.
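If the clearer exception still does not say which document triggered it, one option is to embed the chunks one at a time and record the failures. The sketch below is only an assumption about how that could be done: `find_failing_texts` and `fake_embed` are hypothetical names, and `embed_one` stands for any callable that maps a string to a list of floats (with LangChain, `embeddings.embed_query` would be a candidate).

```python
def find_failing_texts(texts, embed_one):
    """Return (index, reason) pairs for texts that fail to embed.

    embed_one is any callable mapping a string to a list of floats.
    """
    failures = []
    for i, text in enumerate(texts):
        try:
            vector = embed_one(text)
        except Exception as exc:  # record the error instead of hiding it
            failures.append((i, f"error: {exc}"))
            continue
        if not vector:
            failures.append((i, "empty embedding"))
    return failures


def fake_embed(text):
    """Stand-in embedder for the demo.

    Raises for the text "boom", returns [] for blank input, and
    otherwise returns a one-element vector.
    """
    if text == "boom":
        raise RuntimeError("simulated API error")
    return [float(len(text))] if text.strip() else []


print(find_failing_texts(["hello", "   ", "boom"], fake_embed))
```

Embedding one chunk per call is slow and uses more API requests, so this is better suited to a one-off diagnosis than to the normal indexing path.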
Thanks, I'll check it out.
Hi, @casWVU! I'm Dosu, and I'm helping the LangChain team manage their backlog. I wanted to let you know that we are marking this issue as stale.
Based on my understanding, you were experiencing an error when running the code provided in the documentation. It seems that the error message indicated an issue with the shape of the embeddings returned by the embedding function. You received assistance from "devstein" and "adolkhan" who asked for the versions of "LangChain" and "DeepLake" being used. "adolkhan" suggested filtering out unsupported file types and updating "LangChain" to the latest version, which now raises a more informative exception.
Before we close this issue, we wanted to check if it is still relevant to the latest version of the LangChain repository. If it is, please let us know by commenting on this issue. Otherwise, feel free to close the issue yourself or it will be automatically closed in 7 days.
Thank you for your understanding and contribution to the LangChain project!