crewAI ollama not in embedder provider list

I am trying to write a simple PDF agent that answers questions based on the contents of a PDF.

app.py

from crewai import Agent, Task
from crewai_tools import PDFSearchTool
from langchain_community.llms import Ollama

url = "http://localhost:11434"  # placeholder: your Ollama server URL
model = "gemma"  # placeholder: the Ollama model to use

llm = Ollama(base_url=url, model=model, num_gpu=2)
rag_tool = PDFSearchTool(
    pdf=r'pdf_path',
    config=dict(
        llm=dict(
            provider="ollama", # or google, openai, anthropic, llama2, ...
            config=dict(
                model="gemma",
                # temperature=0.5,
                # top_p=1,
                # stream=True,
            ),
        ),
        embedder=dict(
            provider="ollama",
            config=dict(
                model="nomic-embed-text",
                task_type="retrieval_document",
                # title="Embeddings",
            ),
        ),
    )
)

auditor_agent = Agent(
    role='Data Analyst',
    goal='You perfectly know how to analyze any data using provided txt file and searching info via RAG tool',
    backstory='You are data expert',  # Agent expects `backstory`, not `Background`
    llm=llm,  # use the Ollama LLM; otherwise the agent falls back to its default LLM
    verbose=True,
    allow_delegation=False,
    tools=[rag_tool]
)


task = Task(
    description="what is the latest status of ₹2000 bank notes",
    tools=[rag_tool],
    agent=auditor_agent,
    expected_output='''The output should be in following format :
    Format:
    Word Limit : 25
    Writing style :  simple and logical
    '''
)
result = task.execute()  # returns the task's output as a string
print(result)

error:

schema.SchemaError: Key 'embedder' error:
Key 'provider' error:
Or('openai', 'gpt4all', 'huggingface', 'vertexai', 'azure_openai', 'google', 'mistralai', 'nvidia') did not validate 'ollama'
'openai' does not match 'ollama'
'gpt4all' does not match 'ollama'
'huggingface' does not match 'ollama'
'vertexai' does not match 'ollama'
'azure_openai' does not match 'ollama'
'google' does not match 'ollama'
'mistralai' does not match 'ollama'
'nvidia' does not match 'ollama'

Apr 07 '24 09:04 punitchauhan771

While it's not Ollama, you can run a local embedder by using the Hugging Face provider. Here is an example:

from crewai import Crew, Process

# reader, writer, read_book and write_report are Agents/Tasks defined elsewhere
test_crew = Crew(
    agents=[reader, writer],
    tasks=[read_book, write_report],
    process=Process.sequential,
    cache=True,
    verbose=2,
    memory=True,
    embedder={
        "provider": "huggingface",
        "config": {
            "model": "mixedbread-ai/mxbai-embed-large-v1", # https://huggingface.co/mixedbread-ai/mxbai-embed-large-v1
        }
    }
)

Apr 10 '24 01:04 fubz

@fubz Good idea. Would this route still be private, meaning no one else could access the data? Would I need to clone my own instance under my own Hugging Face account?

Apr 10 '24 17:04 piovis2023

Ollama can be used with PDFSearchTool like this:

from crewai_tools import PDFSearchTool

# Use a separate variable name so the PDFSearchTool class is not shadowed
pdf_search_tool = PDFSearchTool(
    pdf=pdf_file_path,
    config=dict(
        llm=dict(
            provider="ollama", # or google, openai, anthropic, llama2, ...
            config=dict(
                model="llama3:8b-instruct-q6_K",
                base_url="http://ollama_server_ip:11434",
            ),
        ),
        embedder=dict(
            provider="ollama",
            config=dict(
                model="mxbai-embed-large:latest",
                base_url="http://192.168.42.173:11434",
            ),
        ),
    )
)

Also run pip install -U embedchain==0.1.103. However, installing this version of embedchain produces a dependency conflict:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
crewai-tools 0.2.3 requires chromadb<0.5.0,>=0.4.22, but you have chromadb 0.5.0 which is incompatible.

@joaomdmoura maybe a small update to crewai-tools to allow chromadb 0.5.0? :-)
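In the snippet above, the llm and embedder sections repeat the Ollama settings and point at two different base_url values. As a minimal sketch, a small helper (make_ollama_rag_config is a hypothetical name, not part of crewai-tools) can build the same nested dict from one place. It assumes Ollama's default port 11434 and only produces the dict; passing it to PDFSearchTool is left as a comment because that needs crewai-tools and a compatible embedchain installed.

```python
from typing import Any, Dict, Optional

OLLAMA_DEFAULT_URL = "http://localhost:11434"  # Ollama's default port


def make_ollama_rag_config(
    llm_model: str,
    embed_model: str,
    base_url: str = OLLAMA_DEFAULT_URL,
    embed_base_url: Optional[str] = None,
) -> Dict[str, Any]:
    """Build the nested dict passed as PDFSearchTool(config=...).

    The embedder reuses base_url unless embed_base_url is given, so both
    sections point at the same Ollama server by default.
    """
    return {
        "llm": {
            "provider": "ollama",
            "config": {"model": llm_model, "base_url": base_url},
        },
        "embedder": {
            "provider": "ollama",
            "config": {"model": embed_model, "base_url": embed_base_url or base_url},
        },
    }


if __name__ == "__main__":
    config = make_ollama_rag_config("llama3:8b-instruct-q6_K", "mxbai-embed-large:latest")
    print(config)
    # With crewai-tools installed (and an embedchain version that accepts
    # the "ollama" embedder), this would be used as:
    # PDFSearchTool(pdf=pdf_file_path, config=config)
```

Keeping one default base_url avoids the LLM and the embedder silently talking to different servers; the optional embed_base_url still allows a separate embedding host when that is intended.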

May 03 '24 11:05 Guerdal

Hello, I'm curious if anyone has experience using the PdfSearchTool alongside Groq as both the provider and embedder. I'm exploring this combination and would appreciate any insights or tips anyone might have.

May 03 '24 14:05 Timilla

@punitchauhan771, LangChain currently does not support Ollama as an embedding provider. The reason, probably, is that Ollama currently does not have an OpenAI-compatible (/v1) embedding endpoint.

May 03 '24 15:05 yuriwa

@Timilla, LangChain currently does not support Groq as an embedding provider. The reason, probably, is that Groq does not host embedding models.

May 03 '24 16:05 yuriwa

@yuriwa Hi, I think LangChain does support Ollama as an embeddings provider:

https://api.python.langchain.com/en/latest/embeddings/langchain_community.embeddings.ollama.OllamaEmbeddings.html

Also, I think this issue is likely due to the embedchain module that crewAI uses for embeddings; previous versions of embedchain didn't have Ollama as a provider.

May 03 '24 16:05 punitchauhan771

As I wrote earlier, the latest version of embedchain (0.1.103) is compatible with Ollama but requires upgrading chromadb to 0.5.0. However, crewai-tools requires chromadb < 0.5.0, so we must wait for @joaomdmoura to update the requirements of crewai-tools :-)
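The conflict described above is a plain version-range check: crewai-tools 0.2.3 declares chromadb>=0.4.22,<0.5.0, so chromadb 0.5.0 falls outside it. The sketch below reproduces that check with the standard library only. It assumes simple numeric versions such as 0.4.23 and ignores pre-release tags like rc5, which the packaging library handles properly.

```python
from typing import Tuple


def parse_version(text: str, width: int = 3) -> Tuple[int, ...]:
    """Turn '0.4.22' into (0, 4, 22); shorter versions are zero-padded."""
    parts = [int(piece) for piece in text.split(".")]
    parts += [0] * (width - len(parts))
    return tuple(parts)


def satisfies(version: str, minimum: str, below: str) -> bool:
    """True if minimum <= version < below, i.e. the spec '>=minimum,<below'."""
    return parse_version(minimum) <= parse_version(version) < parse_version(below)


if __name__ == "__main__":
    # crewai-tools 0.2.3 requires chromadb>=0.4.22,<0.5.0
    for chromadb_version in ("0.4.23", "0.5.0"):
        ok = satisfies(chromadb_version, "0.4.22", "0.5.0")
        print(chromadb_version, "ok" if ok else "conflicts")
```

Because the upper bound is exclusive, 0.5.0 itself is rejected, which is why installing embedchain 0.1.103 (pulling in chromadb 0.5.0) trips pip's resolver warning.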

May 03 '24 16:05 Guerdal

Already updated in the new RC, 0.30.0rc5; I will probably push it live over the weekend or on Monday.

May 03 '24 16:05 joaomdmoura

I must have missed something :-) @joaomdmoura

pip install -U crewai[tools]==0.30.0rc5
.....
Requirement already satisfied: pycparser in ./.local/lib/python3.10/site-packages (from cffi>=1.4.1->pynacl>=1.4.0->PyGithub<2.0.0,>=1.59.1->embedchain<0.2.0,>=0.1.98->crewai[tools]==0.30.0rc5) (2.21)
Installing collected packages: crewai, crewai-tools
  Attempting uninstall: crewai
    Found existing installation: crewai 0.28.8
    Uninstalling crewai-0.28.8:
      Successfully uninstalled crewai-0.28.8
  Attempting uninstall: crewai-tools
    Found existing installation: crewai-tools 0.1.7
    Uninstalling crewai-tools-0.1.7:
      Successfully uninstalled crewai-tools-0.1.7
Successfully installed **crewai-0.30.0rc5 crewai-tools-0.2.3**

Then I ran pip install -U embedchain==0.1.103 (to get Ollama support for embeddings):

....
Installing collected packages: pypdf, chromadb, embedchain
  Attempting uninstall: pypdf
    Found existing installation: pypdf 3.17.4
    Uninstalling pypdf-3.17.4:
      Successfully uninstalled pypdf-3.17.4
  Attempting uninstall: chromadb
    Found existing installation: chromadb 0.4.23
    Uninstalling chromadb-0.4.23:
      Successfully uninstalled chromadb-0.4.23
  Attempting uninstall: embedchain
    Found existing installation: embedchain 0.1.102
    Uninstalling embedchain-0.1.102:
      Successfully uninstalled embedchain-0.1.102
**ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
crewai-tools 0.2.3 requires chromadb<0.5.0,>=0.4.22, but you have chromadb 0.5.0 which is incompatible.**
Successfully installed chromadb-0.5.0 embedchain-0.1.103 pypdf-4.2.0

May 03 '24 18:05 Guerdal

Same here: ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts. crewai-tools 0.2.3 requires chromadb<0.5.0,>=0.4.22, but you have chromadb 0.5.0 which is incompatible.

May 03 '24 18:05 yuriwa

Oh! Sorry, looking into that.

May 03 '24 19:05 joaomdmoura

If we are able to get Ollama supported in the embeddings provider, it might also help solve some general failures with local LLM tool usage and memory.

There might be a need for clearer exception messages when embedding requests fail while using tools. I noticed this when using the WebsiteSearchTool via openhermes: it requests embeddings from the Ollama server using the unsupported OpenAI endpoint at /v1/embeddings/. The Ollama server returns a 404 and I guess an error is shown, but I thought it was related to the website, not to the request for embeddings :)

The action then reliably results in a loop of errors:

I encountered an error while trying to use the tool. This was the error: 404 page not found.
 Tool Search in a specific website accepts these inputs: Search in a specific website(search_query: 'string', website: 'string')

Switching to "gpt4all" as the embedder provider stops the requests to ollama and fixes tool usage locally.

Could it be that the embedding mismatches/failures might also explain some broader problems with tool usage?

Anyway, I just thought I'd try to connect some of the dots possibly related to this. Thanks for all the amazing effort on this project, @joaomdmoura!
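One way to get the clearer message suggested above is to translate an HTTP 404 from the embedding request into an error that names the embedding endpoint rather than the website being searched. This is a standalone sketch, not crewAI's or embedchain's actual error handling; EmbeddingEndpointError, explain_http_error and call_embedding_endpoint are hypothetical names.

```python
import urllib.error
import urllib.request


class EmbeddingEndpointError(RuntimeError):
    """The embedding endpoint failed, as opposed to the tool's target site."""


def explain_http_error(url: str, exc: urllib.error.HTTPError) -> Exception:
    """Map a 404 from the embedding URL to a clearer error; pass others through."""
    if exc.code == 404:
        return EmbeddingEndpointError(
            f"Embedding request to {url} returned 404. The server may not "
            "serve this route (for example an OpenAI-style /v1/embeddings path); "
            "check the embedder provider and base_url."
        )
    return exc


def call_embedding_endpoint(url: str, payload: bytes) -> bytes:
    """POST a JSON payload to the embedding URL and return the raw response."""
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    try:
        with urllib.request.urlopen(request) as response:
            return response.read()
    except urllib.error.HTTPError as exc:
        translated = explain_http_error(url, exc)
        if translated is exc:
            raise  # not a 404: keep the original error unchanged
        raise translated from exc
```

Keeping the translation in a pure function makes it easy to test without a server, and chaining with `from exc` preserves the original 404 in the traceback.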

May 03 '24 22:05 puffo

Just boosting signal that ollama support would be great!

May 10 '24 20:05 swayson

@yuriwa In that case, what could be used as an embedder alongside Groq? Any ideas?

May 16 '24 19:05 SumaiyaSultan2002

It's a stopgap, but I've naively updated chromadb and embedchain, and memory seems to work with the Ollama provider now. I'm currently looking at making MDXSearchTool work without an OPENAI_API_KEY.

(Just bear in mind that the base_url for embeddings lacks the /v1 that the other endpoints have.)

My pyproject.toml looks like this

[tool.poetry.dependencies]
python = "^3.12.1,<=3.13"
crewai-tools = { git = "https://github.com/jcoombes/crewai-tools.git", rev = "63d3ae1" }
crewai = { version = "^0.30.11" }
...etc

PR here: https://github.com/joaomdmoura/crewAI-tools/pull/36
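Regarding the base_url note in this comment: Ollama's native embeddings route is not under /v1. A minimal sketch, assuming the POST /api/embeddings endpoint with model and prompt fields that Ollama documented around the time of this thread (check the API docs for your Ollama version), that builds such a request and strips a trailing /v1 from the base URL. ollama_embeddings_request is a hypothetical helper; the request is only built here, not sent.

```python
import json
import urllib.request


def ollama_embeddings_request(base_url: str, model: str, text: str) -> urllib.request.Request:
    """Build a POST request for Ollama's native /api/embeddings route.

    A trailing /v1 (the OpenAI-compatible prefix) is stripped first, since
    the native embeddings route is not served under it.
    """
    root = base_url.rstrip("/")
    if root.endswith("/v1"):
        root = root[: -len("/v1")]
    body = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    return urllib.request.Request(
        f"{root}/api/embeddings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = ollama_embeddings_request("http://localhost:11434/v1", "nomic-embed-text", "hello")
    print(req.full_url)  # http://localhost:11434/api/embeddings
    # Sending it needs a running Ollama server with the model pulled:
    # with urllib.request.urlopen(req) as resp:
    #     vector = json.loads(resp.read())["embedding"]
```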

May 19 '24 18:05 jcoombes

@fubz Thanks! This helps!

May 23 '24 14:05 Orwlit

File ~\AppData\Roaming\Python\Python311\site-packages\sentence_transformers\SentenceTransformer.py:1296, in SentenceTransformer._load_sbert_model(self, model_name_or_path, token, cache_folder, revision, trust_remote_code)
   1294 else:
...
    241     Dict[str, int]: The added tokens.
    242 """
--> 243 return self._tokenizer.get_added_tokens_decoder()

AttributeError: 'tokenizers.Tokenizer' object has no attribute 'get_added_tokens_decoder'

I get this error when I run a local embedder (mixedbread-ai/mxbai-embed-large-v1) using the Hugging Face provider. Could someone please help me?

May 27 '24 08:05 shivpatil1901

I still get this error: ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts. crewai-tools 0.2.3 requires chromadb<0.5.0,>=0.4.22, but you have chromadb 0.5.0 which is incompatible. Has it been resolved?

Jun 05 '24 20:06 cblaison

Hi, I am the co-founder and CTO of Embedchain. We have fixed the issue on our side and the Ollama embedder should work now. Please use embedchain>=0.1.107, which should fix the issue.

Here is a test script that worked for me:

from crewai_tools import PDFSearchTool
import embedchain


print("embedchain version:", embedchain.__version__)

tool = PDFSearchTool(
    config=dict(
        llm=dict(
            provider="ollama",
            config=dict(
                model="gemma",
            ),
        ),
        embedder=dict(
            provider="ollama",
            config=dict(
                model="nomic-embed-text",
            ),
        ),
    )
)

print("tool config:", tool.config)

@joaomdmoura please feel free to test and close the issue accordingly.
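Since Ollama embedder support depends on the installed embedchain version (0.1.107 or later, per the comment above), a script can check the version at startup and fall back to the local Hugging Face embedder suggested earlier in the thread. A sketch under those assumptions: installed_version, release_tuple and pick_embedder are hypothetical helpers, and the returned dict follows the embedder shape used throughout this thread.

```python
from importlib import metadata
from typing import Any, Dict, Optional, Tuple

# Minimum embedchain version reported in this thread to accept "ollama".
OLLAMA_MIN_EMBEDCHAIN = (0, 1, 107)


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string, or None if not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def release_tuple(version: str) -> Tuple[int, ...]:
    """Keep the leading numeric parts: '0.30.0rc5' -> (0, 30, 0)."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def pick_embedder(embedchain_version: Optional[str]) -> Dict[str, Any]:
    """Prefer Ollama when embedchain is new enough, else a local HF model."""
    if embedchain_version and release_tuple(embedchain_version) >= OLLAMA_MIN_EMBEDCHAIN:
        return {"provider": "ollama", "config": {"model": "nomic-embed-text"}}
    return {
        "provider": "huggingface",
        "config": {"model": "mixedbread-ai/mxbai-embed-large-v1"},
    }


if __name__ == "__main__":
    version = installed_version("embedchain")
    print("embedchain:", version)
    print("embedder:", pick_embedder(version))
```

The Hugging Face fallback still requires sentence-transformers to be installed; this only chooses the config, it does not verify that the chosen backend works.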

Jun 08 '24 17:06 deshraj

embedder={
    "provider": "huggingface",
    "config": {
        "model": "mixedbread-ai/mxbai-embed-large-v1", # https://huggingface.co/mixedbread-ai/mxbai-embed-large-v1
    }
}

I got this error: TypeError: Pooling.__init__() got an unexpected keyword argument 'include_prompt'. Does anyone know what is causing this?

Traceback (most recent call last):
  File "/home/bil/ollacrew/insta.py", line 145, in <module>
    crew = Crew(
  File "/home/bil/.local/lib/python3.10/site-packages/pydantic/main.py", line 171, in __init__
    self.__pydantic_validator__.validate_python(data, self_instance=self)
  File "/home/bil/.local/lib/python3.10/site-packages/crewai/crew.py", line 167, in create_crew_memory
    self._short_term_memory = ShortTermMemory(crew=self, embedder_config=self.embedder)
  File "/home/bil/.local/lib/python3.10/site-packages/crewai/memory/short_term/short_term_memory.py", line 16, in __init__
    storage = RAGStorage(type="short_term", embedder_config=embedder_config, crew=crew)
  File "/home/bil/.local/lib/python3.10/site-packages/crewai/memory/storage/rag_storage.py", line 75, in __init__
    self.app = App.from_config(config=config)
  File "/home/bil/.local/lib/python3.10/site-packages/embedchain/app.py", line 388, in from_config
    embedding_model = EmbedderFactory.create(
  File "/home/bil/.local/lib/python3.10/site-packages/embedchain/factory.py", line 79, in create
    return embedder_class(config=embedder_config_class(**config_data))
  File "/home/bil/.local/lib/python3.10/site-packages/embedchain/embedder/huggingface.py", line 14, in __init__
    embeddings = HuggingFaceEmbeddings(model_name=self.config.model)
  File "/home/bil/.local/lib/python3.10/site-packages/langchain_community/embeddings/huggingface.py", line 72, in __init__
    self.client = sentence_transformers.SentenceTransformer(
  File "/home/bil/.local/lib/python3.10/site-packages/sentence_transformers/SentenceTransformer.py", line 194, in __init__
    modules = self._load_sbert_model(
  File "/home/bil/.local/lib/python3.10/site-packages/sentence_transformers/SentenceTransformer.py", line 1073, in _load_sbert_model
    module = module_class.load(module_path)
  File "/home/bil/.local/lib/python3.10/site-packages/sentence_transformers/models/Pooling.py", line 198, in load
    return Pooling(**config)
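Both Hugging Face tracebacks in this thread fail inside sentence-transformers while loading the model, so one way to start narrowing them down is to report the installed versions of the packages involved and include them when asking for help. A stdlib-only sketch; the package list is an assumption (the packages named in the tracebacks plus transformers, which sentence-transformers uses to load tokenizers), and report_versions is a hypothetical helper.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional

# Packages named in the tracebacks, plus transformers, which sentence-transformers
# relies on to load tokenizers.
PACKAGES = (
    "sentence-transformers",
    "transformers",
    "tokenizers",
    "langchain-community",
    "embedchain",
)


def report_versions(packages: Iterable[str] = PACKAGES) -> Dict[str, Optional[str]]:
    """Map each package name to its installed version, or None if missing."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions


if __name__ == "__main__":
    for name, version in report_versions().items():
        print(f"{name:22} {version or 'not installed'}")
```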

Jun 12 '24 16:06 leobilocastro

@puffo I got the same error (the "404 page not found" loop when the tool requests embeddings from the Ollama server). Did you fix it?

Jun 13 '24 15:06 crisschan

@piovis2023 This will download the model from Hugging Face, so Hugging Face will have the metadata showing that you downloaded the model; however, the embedding of your data happens locally and stays on your machine.

Jul 24 '24 17:07 fubz