Inquiry Regarding Integration with Local Models and Process Customisation
Dear CrewAI Maintainers,
I hope this message finds you well. I am reaching out to discuss two aspects of CrewAI where I believe improvements could significantly enhance its utility for developers with specific local model requirements and for those seeking greater flexibility in process customisation.
Firstly, I would like to commend you on the inclusion of local model support through tools such as Ollama. This feature is particularly beneficial for tasks that demand specialised knowledge or heightened data privacy. However, I am curious about the extent of this integration. Could you provide further details on how CrewAI handles local models in terms of performance and scalability? Are there any benchmarks or case studies available that demonstrate the efficacy of CrewAI when operating with local models as opposed to cloud-based alternatives?
Secondly, the current implementation of processes in CrewAI is, as I understand it, limited to a sequential execution model. While this suffices for a range of applications, there are scenarios where a more complex process flow is necessary. For instance, parallel processing or conditional branching based on intermediate results could be invaluable for certain use cases. Is there a roadmap for introducing more sophisticated process structures? If so, could you shed some light on the anticipated timeline for these features?
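For reference, my current understanding of that sequential setup is roughly the following sketch (based on the quickstart documentation; the agent and task details are placeholders I have made up, and the llm could be any LangChain-compatible model such as an Ollama instance):

from crewai import Agent, Task, Crew, Process
from langchain.llms import Ollama

llm = Ollama(model="openchat:latest")  # any locally pulled model

researcher = Agent(
    role="Researcher",
    goal="Gather background material on the topic",
    backstory="An analyst who summarises sources accurately",
    llm=llm,
)
writer = Agent(
    role="Writer",
    goal="Turn the research notes into a short report",
    backstory="A technical writer focused on clarity",
    llm=llm,
)

research = Task(description="Collect key facts about the topic", agent=researcher)
write = Task(description="Write a one-page report from the research notes", agent=writer)

# Tasks run strictly one after another in the order given.
crew = Crew(agents=[researcher, writer], tasks=[research, write], process=Process.sequential)
result = crew.kickoff()
print(result)

Everything here executes as a single ordered chain, which is precisely the limitation I am asking about.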
I believe addressing these points could greatly broaden the appeal of CrewAI, making it a more versatile tool for the developer community. I look forward to your response and am excited about the potential advancements in this area.
Best regards, yihong1120
For local models it's best to set up a Modelfile (see the Ollama docs), as that lets you change parameters such as temperature and context length; a rough example follows the snippet below. Benchmarking on your machine is straightforward: set up a handful of local models and run them one after another to compare how they perform, e.g.:
from langchain.llms import Ollama
llm_general = Ollama(model="nous-hermes2:latest") # best general model
llm_coder = Ollama(model="magicoder:latest")
llm_friend = Ollama(model="samantha-mistral:latest") # Verbose and chatty
llm_brief = Ollama(model="llama2:latest") # OK general model, not so verbose
llm_chat = Ollama(model="openchat:latest") # best chat model
llm_doctor = Ollama(model="meditron:latest") # too slow
llm_uncensored = Ollama(model="llama2-uncensored:70b") # slow
prompt = "What is the capital of France?"
for llm in (llm_coder, llm_chat, llm_friend, llm_general, llm_brief, llm_uncensored):
    print(llm.model, "\n", llm(prompt))
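And here is a rough example of the kind of Modelfile mentioned above (directive names follow the Ollama docs; the base model, parameter values and the name my-hermes are just placeholders):

# Modelfile: build with `ollama create my-hermes -f Modelfile`, then try it with `ollama run my-hermes`
FROM nous-hermes2:latest
# sampling and context-window parameters to tune per machine
PARAMETER temperature 0.2
PARAMETER num_ctx 4096
SYSTEM """You are a concise, factual assistant."""

Once created, the custom model can be used from LangChain like any other local model, e.g. Ollama(model="my-hermes").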
The docs mention plans to introduce parallel/concurrent processing, but there is no published roadmap. Be aware that if you use many local LLMs, Ollama serves requests sequentially in any case and will introduce waits, as the rough timing sketch below illustrates.
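This sketch is not CrewAI-specific; it just sends the same prompt to two local models from worker threads, and with a single Ollama server handling requests one at a time the total wall-clock time ends up close to the sum of the individual times (the model names are only examples, use whatever you have pulled):

import time
from concurrent.futures import ThreadPoolExecutor
from langchain.llms import Ollama

prompt = "What is the capital of France?"
models = ["openchat:latest", "llama2:latest"]  # any two locally pulled models

def run(model_name):
    # Each call blocks until the Ollama server returns a completion.
    llm = Ollama(model=model_name)
    start = time.time()
    llm(prompt)
    return model_name, time.time() - start

start = time.time()
with ThreadPoolExecutor(max_workers=len(models)) as pool:
    for name, elapsed in pool.map(run, models):
        print(f"{name}: {elapsed:.1f}s")
# If the server queues requests, total wall time is roughly the sum of the per-model times.
print(f"total: {time.time() - start:.1f}s")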
Please spend some time with the docs and experiment; there is a lot you can see and learn quickly.