crewAI
ollama not in embedder provider list
I am trying to write a simple PDF agent that answers questions based on the contents of a PDF.
app.py
llm = Ollama(base_url = url,model=model,num_gpu=2)
rag_tool = PDFSearchTool(
    pdf = r'pdf_path',
    config=dict(
        llm=dict(
            provider="ollama", # or google, openai, anthropic, llama2, ...
            config=dict(
                model="gemma",
                # temperature=0.5,
                # top_p=1,
                # stream=true,
            ),
        ),
        embedder=dict(
            provider="ollama",
            config=dict(
                model="nomic-embed-text",
                task_type="retrieval_document",
                # title="Embeddings",
            ),
        ),
    )
)
auditor_agent = Agent(
    role = 'Data Analyst',
    goal = 'You perfectly know how to analyze any data using provided txt file and searching info via RAG tool',
    Background = 'You are data expert',
    verbose = True,
    allow_delegation = False,
    tools = [rag_tool]
)
task = Task(
    description = "what is the latest status of ₹2000 bank notes",
    tools = [rag_tool],
    agent = auditor_agent,
    expected_output = '''The output should be in following format :
Format:
Word Limit : 25
Writing style : simple and logical
'''
)
task1 = task.execute()
print(task1.output())
error:
schema.SchemaError: Key 'embedder' error:
Key 'provider' error:
Or('openai', 'gpt4all', 'huggingface', 'vertexai', 'azure_openai', 'google', 'mistralai', 'nvidia') did not validate 'ollama'
'openai' does not match 'ollama'
'gpt4all' does not match 'ollama'
'huggingface' does not match 'ollama'
'vertexai' does not match 'ollama'
'azure_openai' does not match 'ollama'
'google' does not match 'ollama'
'mistralai' does not match 'ollama'
'nvidia' does not match 'ollama'
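The error above is a schema validation failure: the embedder provider is checked against a fixed list, and 'ollama' is not in that list for this embedchain version. A minimal sketch of that kind of check in plain Python, assuming the allowed list is exactly the one printed in the error message. The function name check_embedder_provider is hypothetical and is not part of embedchain or crewAI:

```python
# Providers copied from the SchemaError message above; other embedchain
# releases may accept a different set.
SUPPORTED_EMBEDDER_PROVIDERS = (
    "openai", "gpt4all", "huggingface", "vertexai",
    "azure_openai", "google", "mistralai", "nvidia",
)


def check_embedder_provider(config):
    """Return the embedder provider if it is in the list, else raise ValueError.

    Hypothetical helper for illustration only.
    """
    provider = config.get("embedder", {}).get("provider")
    if provider not in SUPPORTED_EMBEDDER_PROVIDERS:
        raise ValueError(
            f"embedder provider {provider!r} is not supported; "
            f"choose one of: {', '.join(SUPPORTED_EMBEDDER_PROVIDERS)}"
        )
    return provider


if __name__ == "__main__":
    config = {
        "embedder": {
            "provider": "ollama",
            "config": {"model": "nomic-embed-text"},
        }
    }
    try:
        check_embedder_provider(config)
    except ValueError as err:
        print(err)
```

Running a check like this before constructing the tool would surface an unsupported provider early, without waiting for the tool to be built.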
While it's not Ollama, you can run a local embedder by using the HuggingFace provider. Here is an example.
test_crew = Crew(
    agents=[reader, writer],
    tasks=[read_book, write_report],
    process=Process.sequential,
    cache=True,
    verbose=2,
    memory=True,
    embedder={
        "provider": "huggingface",
        "config": {
            "model": "mixedbread-ai/mxbai-embed-large-v1", # https://huggingface.co/mixedbread-ai/mxbai-embed-large-v1
        }
    }
)
@fubz Good idea. Would this route still be private, meaning no one else could access the data? Would I need to clone my own instance under my own HugFace account?
Ollama can be used for PdfSearchTool with
# Use a separate variable name so the PDFSearchTool class is not overwritten.
pdf_search_tool = PDFSearchTool(
    pdf=pdf_file_path,
    config=dict(
        llm=dict(
            provider="ollama", # or google, openai, anthropic, llama2, ...
            config=dict(
                model="llama3:8b-instruct-q6_K",
                base_url="http://ollama_server_ip:11434",
            ),
        ),
        embedder=dict(
            provider="ollama",
            config=dict(
                model="mxbai-embed-large:latest",
                base_url="http://192.168.42.173:11434",
            ),
        ),
    )
)
and
pip install -U embedchain==0.1.103
But when you install this new version of embedchain, an error occurs:
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
crewai-tools 0.2.3 requires chromadb<0.5.0,>=0.4.22, but you have chromadb 0.5.0 which is incompatible.
@joaomdmoura maybe a small update to crewai-tools to allow chromadb 0.5.0? :-)
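To see why pip complains, it can help to evaluate the requirement by hand. The following is a simplified sketch, not how pip itself resolves dependencies; it only handles plain numeric versions with the same number of components, and the helper names parse_version and satisfies are made up for illustration:

```python
def parse_version(text):
    """Turn '0.4.23' into (0, 4, 23). Plain numeric versions only."""
    return tuple(int(part) for part in text.strip().split("."))


def satisfies(version, spec):
    """Check a version against a comma-separated spec like '<0.5.0,>=0.4.22'."""
    # Two-character operators are listed first so '<=' is not read as '<'.
    operators = {
        ">=": lambda a, b: a >= b,
        "<=": lambda a, b: a <= b,
        "==": lambda a, b: a == b,
        ">": lambda a, b: a > b,
        "<": lambda a, b: a < b,
    }
    installed = parse_version(version)
    for clause in spec.split(","):
        clause = clause.strip()
        for symbol, compare in operators.items():
            if clause.startswith(symbol):
                if not compare(installed, parse_version(clause[len(symbol):])):
                    return False
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


if __name__ == "__main__":
    # Requirement string taken from the pip error message above.
    chromadb_spec = "<0.5.0,>=0.4.22"
    print(satisfies("0.4.23", chromadb_spec))  # True
    print(satisfies("0.5.0", chromadb_spec))   # False
```

chromadb 0.4.23 falls inside the range that crewai-tools 0.2.3 declares, while 0.5.0 (pulled in by embedchain 0.1.103) falls outside it, which is the conflict pip reports.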
Hello, I'm curious if anyone has experience using the PdfSearchTool alongside Groq as both the provider and embedder. I'm exploring this combination and would appreciate any insights or tips anyone might have.
@punitchauhan771, Langchain currently does not support ollama as an embedding provider. The reason, probably, is that ollama currently does not have an OpenAI-compatible (/v1) embedding endpoint.
@Timilla, Langchain currently does not support groq as an embedding provider. The reason, probably, is that groq does not host embedding models.
Hi, I think langchain supports ollama as an embeddings provider
https://api.python.langchain.com/en/latest/embeddings/langchain_community.embeddings.ollama.OllamaEmbeddings.html
Also, I think this issue is likely due to the embedchain module that crewAI uses for embeddings; previous versions didn't have ollama as a provider.
As I wrote earlier, the latest version of embedchain (0.1.103) is compatible with ollama but needs chromadb upgraded to 0.5.0. However, crewai-tools requires chromadb < 0.5.0, so we must wait for @joaomdmoura to upgrade the requirements of crewai-tools :-)
Already updated on the new RC 0.30.0rc5
will probably push it live over the weekend / monday
I must have missed something :-) @joaomdmoura
pip install -U crewai[tools]==0.30.0rc5
.....
Requirement already satisfied: pycparser in ./.local/lib/python3.10/site-packages (from cffi>=1.4.1->pynacl>=1.4.0->PyGithub<2.0.0,>=1.59.1->embedchain<0.2.0,>=0.1.98->crewai[tools]==0.30.0rc5) (2.21)
Installing collected packages: crewai, crewai-tools
Attempting uninstall: crewai
Found existing installation: crewai 0.28.8
Uninstalling crewai-0.28.8:
Successfully uninstalled crewai-0.28.8
Attempting uninstall: crewai-tools
Found existing installation: crewai-tools 0.1.7
Uninstalling crewai-tools-0.1.7:
Successfully uninstalled crewai-tools-0.1.7
Successfully installed **crewai-0.30.0rc5 crewai-tools-0.2.3**
After => pip install -U embedchain==0.1.103 (to have ollama in embeddings)
....
Installing collected packages: pypdf, chromadb, embedchain
Attempting uninstall: pypdf
Found existing installation: pypdf 3.17.4
Uninstalling pypdf-3.17.4:
Successfully uninstalled pypdf-3.17.4
Attempting uninstall: chromadb
Found existing installation: chromadb 0.4.23
Uninstalling chromadb-0.4.23:
Successfully uninstalled chromadb-0.4.23
Attempting uninstall: embedchain
Found existing installation: embedchain 0.1.102
Uninstalling embedchain-0.1.102:
Successfully uninstalled embedchain-0.1.102
**ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
crewai-tools 0.2.3 requires chromadb<0.5.0,>=0.4.22, but you have chromadb 0.5.0 which is incompatible.**
Successfully installed chromadb-0.5.0 embedchain-0.1.103 pypdf-4.2.0
Same here: ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts. crewai-tools 0.2.3 requires chromadb<0.5.0,>=0.4.22, but you have chromadb 0.5.0 which is incompatible.
oh ! sorry, looking into that
If we are able to get Ollama supported in the embeddings provider, it might also help solve some general failures with local LLM tool usage and memory.
There might be a need for clearer exception messages when embedding requests fail while using tools. I noticed this when using the WebsiteSearchTool via openhermes; it requests embeddings from the ollama server using the unsupported OpenAI endpoint at /v1/embeddings/. The ollama server returns a 404 and I guess an error is shown, but I thought it was related to the website, not the request for embeddings :)
The action then reliably results in a loop of errors:
I encountered an error while trying to use the tool. This was the error: 404 page not found.
Tool Search in a specific website accepts these inputs: Search in a specific website(search_query: 'string', website: 'string')
Switching to "gpt4all" as the embedder provider stops the requests to ollama and fixes tool usage locally.
Could it be that the embedding mismatches/failures might also explain some broader problems with tool usage?
Anyway, I just thought I'd try to connect some of the dots possibly related to this. Thanks for all the amazing effort on this project @joaomdmoura !
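As an illustration of the clearer exception messages suggested in this comment, here is a minimal sketch of a check that names the embeddings request as the failing step. EmbeddingRequestError and check_embedding_response are hypothetical names; nothing like this is claimed to exist in crewAI or embedchain:

```python
class EmbeddingRequestError(RuntimeError):
    """Hypothetical error type for failed embeddings requests."""


def check_embedding_response(status_code, url):
    """Raise a descriptive error if an embeddings request failed.

    Returns True when the status code indicates success.
    """
    if status_code == 404:
        raise EmbeddingRequestError(
            f"Embeddings request to {url} returned 404. The server may not "
            "expose this endpoint; check the embedder provider and base_url."
        )
    if status_code >= 400:
        raise EmbeddingRequestError(
            f"Embeddings request to {url} failed with HTTP {status_code}."
        )
    return True


if __name__ == "__main__":
    try:
        # URL is an example only; the path matches the one mentioned above.
        check_embedding_response(404, "http://localhost:11434/v1/embeddings/")
    except EmbeddingRequestError as err:
        print(err)
```

With a message like this, a 404 from the embeddings endpoint would be harder to mistake for a problem with the website being searched.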
Just boosting signal that ollama support would be great!
Since groq does not host embedding models, what can be used as an embedder alongside groq? Any ideas?
It's a stopgap, but I've naively updated chromadb and embedchain, and Memory seems to work with the ollama provider now. I'm currently looking at making the MDXSearchTool work without an OPENAI_API_KEY.
(just bear in mind the base_url for embeddings lacks the /v1 that the other endpoints have.)
My pyproject.toml looks like this
[tool.poetry.dependencies]
python = "^3.12.1,<=3.13"
crewai-tools = { git = "https://github.com/jcoombes/crewai-tools.git", rev = "63d3ae1" }
crewai = { version = "^0.30.11" }
...etc
PR Here. https://github.com/joaomdmoura/crewAI-tools/pull/36
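The note above about the embeddings base_url lacking /v1 can be sketched as a small helper. This assumes the LLM is reached through a base URL ending in /v1 and that the embedder should point at the server root instead; embedder_base_url is a made-up name and localhost:11434 is only an example address:

```python
def embedder_base_url(llm_base_url):
    """Strip a trailing '/v1' so the URL points at the server root."""
    trimmed = llm_base_url.rstrip("/")
    if trimmed.endswith("/v1"):
        trimmed = trimmed[: -len("/v1")]
    return trimmed


if __name__ == "__main__":
    llm_url = "http://localhost:11434/v1"  # example address, an assumption
    # Same dict shape as the PDFSearchTool embedder config shown earlier.
    embedder = dict(
        provider="ollama",
        config=dict(
            model="mxbai-embed-large:latest",
            base_url=embedder_base_url(llm_url),
        ),
    )
    print(embedder["config"]["base_url"])  # http://localhost:11434
```

Deriving one URL from the other keeps the two settings from drifting apart when the server address changes.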
Thanks for the HuggingFace embedder example above! This helps!
File ~\AppData\Roaming\Python\Python311\site-packages\sentence_transformers\SentenceTransformer.py:1296, in SentenceTransformer._load_sbert_model(self, model_name_or_path, token, cache_folder, revision, trust_remote_code)
1294 else:
...
241 Dict[str, int]: The added tokens.
242 """
--> 243 return self._tokenizer.get_added_tokens_decoder()
AttributeError: 'tokenizers.Tokenizer' object has no attribute 'get_added_tokens_decoder'
I am getting this error when I run a local embedder (mixedbread-ai/mxbai-embed-large-v1) using the HuggingFace provider. Could someone please help me?
I still get this error : ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts. crewai-tools 0.2.3 requires chromadb<0.5.0,>=0.4.22, but you have chromadb 0.5.0 which is incompatible. Has it been resolved ?
Hi, I am the co-founder and CTO of Embedchain. We have fixed the issue on our side and the ollama embedder should work now. Please use embedchain>=0.1.107 and it should fix the issue.
Here is a test script that worked for me:
from crewai_tools import PDFSearchTool
import embedchain
print("embedchain version:", embedchain.__version__)
tool = PDFSearchTool(
    config=dict(
        llm=dict(
            provider="ollama",
            config=dict(
                model="gemma",
            ),
        ),
        embedder=dict(
            provider="ollama",
            config=dict(
                model="nomic-embed-text",
            ),
        ),
    )
)
print("tool config:", tool.config)
@joaomdmoura please feel free to test and close the issue accordingly.
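Before running a script like the one above, it may be worth confirming that the installed embedchain meets the stated minimum. A minimal sketch using only the standard library, assuming plain numeric version strings; installed_version and at_least are hypothetical helper names:

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string, or None if not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def at_least(version, minimum):
    """Compare plain numeric versions such as '0.1.107' component-wise."""
    def to_tuple(text):
        return tuple(int(part) for part in text.split("."))
    return to_tuple(version) >= to_tuple(minimum)


if __name__ == "__main__":
    found = installed_version("embedchain")
    if found is None:
        print("embedchain is not installed")
    else:
        try:
            ok = at_least(found, "0.1.107")
        except ValueError:
            # Pre-release or otherwise non-numeric version strings.
            print("could not parse version:", found)
        else:
            print("embedchain", found, "meets minimum:", ok)
```

Versions with suffixes such as release candidates are not handled by this comparison and are only reported as unparseable.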
With embedder={ "provider": "huggingface", "config": { "model": "mixedbread-ai/mxbai-embed-large-v1" } } I got this error: TypeError: Pooling.__init__() got an unexpected keyword argument 'include_prompt'. Does anyone know what is causing this?
Traceback (most recent call last):
File "/home/bil/ollacrew/insta.py", line 145, in
Crew can g
I got the same "404 page not found" tool error described above. Did you fix it?
On the privacy question about the HuggingFace route: this will download the model from HuggingFace, so HuggingFace will have the metadata that you downloaded the model; however, the embedding of your data will occur locally and stay on your machine.