
cannot pickle 'builtins.CoreBPE' object

Open vadimber opened this issue 1 year ago • 3 comments

The example Chatbot_SEC.ipynb fails with the following error (I just replaced gpt_index. with llama_index. for the imports):


ValidationError                           Traceback (most recent call last)
/tmp/ipykernel_3445500/2033794993.py in <cell line: 22>()
     20 )
     21
---> 22 toolkit = LlamaToolkit(
     23     index_configs=index_configs,
     24     graph_configs=[graph_config]

/opt/conda/lib/python3.9/site-packages/pydantic/main.cpython-39-x86_64-linux-gnu.so in pydantic.main.BaseModel.__init__()

ValidationError: 5 validation errors for LlamaToolkit
index_configs -> 0
  cannot pickle 'builtins.CoreBPE' object (type=type_error)
index_configs -> 1
  cannot pickle 'builtins.CoreBPE' object (type=type_error)
index_configs -> 2
  cannot pickle 'builtins.CoreBPE' object (type=type_error)
index_configs -> 3
  cannot pickle 'builtins.CoreBPE' object (type=type_error)
graph_configs -> 0
  cannot pickle 'builtins.CoreBPE' object (type=type_error)

I tried it on two different environments, and the result was the same.

ChatGPT says that this is some package-level variable from HuggingFace...

Opening the pickle:

{'fail': IndexToolConfig(
     index=<llama_index.indices.vector_store.vector_indices.GPTSimpleVectorIndex object at 0x000001EAD60244F0>,
     name='Vector Index for Micha Josef Berdyczewski',
     description='useful for when you want to answer queries about the Micha Josef Berdyczewski',
     index_query_kwargs={'similarity_top_k': 3},
     tool_kwargs={'return_direct': True}),
 'err': TypeError("cannot pickle 'builtins.CoreBPE' object"),
 'depth': 10,
 'failing_children': [
   {'fail': <llama_index.indices.vector_store.vector_indices.GPTSimpleVectorIndex at 0x1ead60244f0>,
    'err': TypeError("cannot pickle 'builtins.CoreBPE' object"),
    'depth': 9,
    'failing_children': [
      {'fail': <llama_index.embeddings.openai.OpenAIEmbedding at 0x1ead9935070>,
       'err': TypeError("cannot pickle 'builtins.CoreBPE' object"),
       'depth': 8,
       'failing_children': [
         {'fail': <bound method Encoding.encode of <Encoding 'gpt2'>>,
          'err': TypeError("cannot pickle 'builtins.CoreBPE' object"),
          'depth': 7,
          'failing_children': []}]},
      {'fail': <llama_index.indices.prompt_helper.PromptHelper at 0x1ead930a0a0>,
       'err': TypeError("cannot pickle 'builtins.CoreBPE' object"),
       'depth': 8,
       'failing_children': [
         {'fail': <bound method Encoding.encode of <Encoding 'gpt2'>>,
          'err': TypeError("cannot pickle 'builtins.CoreBPE' object"),
          'depth': 7,
          'failing_children': []}]},
      {'fail': <llama_index.langchain_helpers.text_splitter.TokenTextSplitter at 0x1ead9935220>,
       'err': TypeError("cannot pickle 'builtins.CoreBPE' object"),
       'depth': 8,
       'failing_children': [
         {'fail': <bound method Encoding.encode of <Encoding 'gpt2'>>,
          'err': TypeError("cannot pickle 'builtins.CoreBPE' object"),
          'depth': 7,
          'failing_children': []}]}]}]}
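
Every failing branch above bottoms out at the bound method Encoding.encode of tiktoken's 'gpt2' Encoding: the builtins.CoreBPE it wraps is a Rust extension object with no pickle support, so anything that transitively holds the tokenizer (OpenAIEmbedding, PromptHelper, TokenTextSplitter) fails as soon as something tries to pickle or deep-copy it. A minimal sketch of just that root cause, assuming a tiktoken version from around the time of this issue (no pickle support on Encoding):

import pickle
import tiktoken

# The gpt2 Encoding wraps a Rust-backed CoreBPE object
enc = tiktoken.get_encoding("gpt2")

try:
    # Pickling the bound method also has to pickle the Encoding (and its CoreBPE)
    pickle.dumps(enc.encode)
except TypeError as err:
    print(err)  # cannot pickle 'builtins.CoreBPE' object

pydantic v1 deep-copies field values during validation, and deepcopy falls back to the pickle machinery for extension types, which appears to be why LlamaToolkit surfaces this as a ValidationError.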

vadimber avatar Mar 25 '23 03:03 vadimber

@vadimber, does the example work using the gpt-index imports?

jerryjliu avatar Mar 25 '23 06:03 jerryjliu

As I noted at the start of the issue, I replaced gpt_index. with llama_index. for the imports.

vadimber avatar Mar 25 '23 13:03 vadimber

I just tried installing gpt-index 0.4.38 and replacing all llama_index imports with gpt_index; the result is precisely the same.

vadimber avatar Mar 25 '23 13:03 vadimber

This should be working with the latest version of llama-index (0.6.20). Going to close for now, feel free to re-open if needed

logan-markewich avatar Jun 06 '23 03:06 logan-markewich

@logan-markewich I've hit the same issue here. I tried to load some indexes from documents in parallel using multiprocessing.Pool, and I can't load them because of this TypeError.

I don't know whether this is relevant enough to reopen the issue, since it only happens when I try to use multiprocessing.

[screenshot attached]

I'm running on version 0.6.30
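
Since the screenshot is not reproduced here, the following is a hedged sketch of the pattern being described, with hypothetical paths and helper names, assuming the 0.6.x-era top-level imports. The key point is that multiprocessing.Pool has to pickle each worker's return value, and a vector store index holds a tiktoken tokenizer that cannot be pickled:

from multiprocessing import Pool
from llama_index import SimpleDirectoryReader, VectorStoreIndex

def build_index(doc_dir):  # hypothetical helper
    docs = SimpleDirectoryReader(doc_dir).load_data()
    # Returning the index object forces the Pool to pickle it on the way back,
    # which fails on the embedded tokenizer (CoreBPE)
    return VectorStoreIndex.from_documents(docs)

if __name__ == "__main__":
    doc_dirs = ["./data/a", "./data/b"]  # hypothetical paths
    with Pool(2) as pool:
        indexes = pool.map(build_index, doc_dirs)  # fails: index cannot be pickled back to the parent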

LucasMallmann avatar Jun 23 '23 14:06 LucasMallmann

@LucasMallmann yea multiprocessing needs to pickle the outputs of the function, but it looks like you can't pickle the vector store index object.

I don't think this will be easily fixable lol but I'm also not too familiar with the error happening here
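
One possible workaround, sketched here under the assumption that the 0.6.x storage APIs (StorageContext, load_index_from_storage) are available: keep the unpicklable index out of the inter-process channel entirely by persisting it to disk inside the worker, returning only the persist directory (a plain string), and reloading the index in the parent. Directory names and helper names are illustrative only.

from multiprocessing import Pool
from llama_index import (
    SimpleDirectoryReader,
    StorageContext,
    VectorStoreIndex,
    load_index_from_storage,
)

def build_and_persist(args):  # hypothetical helper
    doc_dir, persist_dir = args
    docs = SimpleDirectoryReader(doc_dir).load_data()
    index = VectorStoreIndex.from_documents(docs)
    index.storage_context.persist(persist_dir=persist_dir)
    return persist_dir  # a plain string pickles fine

if __name__ == "__main__":
    jobs = [("./data/a", "./storage/a"), ("./data/b", "./storage/b")]  # hypothetical paths
    with Pool(2) as pool:
        persist_dirs = pool.map(build_and_persist, jobs)
    # Reload the indexes in the parent process
    indexes = [
        load_index_from_storage(StorageContext.from_defaults(persist_dir=d))
        for d in persist_dirs
    ]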

logan-markewich avatar Jun 23 '23 19:06 logan-markewich

I am getting this issue while trying to pickle the data agent. I am not sure if there is a different way to preserve an agent between API calls aside from pickling and restoring it. If so, I'd love to know about it.

from llama_index.agent import OpenAIAgent  # import path for 0.7/0.8-era releases

# medical_spec (a tool spec), conversations_10k_tool, and llm are defined earlier
agent = OpenAIAgent.from_tools(
    [
        *medical_spec.to_tool_list(),
        conversations_10k_tool,
    ],
    llm=llm,
    verbose=True,
)
# do some chatting with the agent
...
# then try to pickle the agent

Then I get TypeError: cannot pickle 'builtins.CoreBPE' object
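
One alternative to pickling the whole agent, sketched here as an assumption rather than something confirmed in this thread: persist only the conversation state and rebuild the agent from its tools on every API call. This assumes your llama_index version exposes agent.chat_history as a list of ChatMessage objects and accepts a chat_history argument in OpenAIAgent.from_tools; the file path and helper names are hypothetical.

import json
from llama_index.agent import OpenAIAgent
from llama_index.llms import ChatMessage

HISTORY_PATH = "chat_history.json"  # hypothetical location

def save_history(agent):
    # ChatMessage is a pydantic model, so .dict() gives JSON-friendly data
    with open(HISTORY_PATH, "w") as f:
        json.dump([m.dict() for m in agent.chat_history], f)

def load_agent(tools, llm):
    try:
        with open(HISTORY_PATH) as f:
            history = [ChatMessage(**m) for m in json.load(f)]
    except FileNotFoundError:
        history = []
    # Rebuild the agent fresh each request; only the lightweight history persists
    return OpenAIAgent.from_tools(tools, llm=llm, chat_history=history, verbose=True)

After handling each request you would call save_history(agent) before returning the response, so the next call can restore the conversation without ever pickling the agent (and its tokenizer).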

buckmaxwell avatar Aug 08 '23 17:08 buckmaxwell

Related: https://github.com/jerryjliu/llama_index/issues/7169

SlapDrone avatar Aug 09 '23 10:08 SlapDrone