crewAI
Wrong OpenAI tiktoken embeddings
Hi,

While creating an `Agent`, I'm passing a custom-defined LLM in the `llm` argument. It's an object of type `langchain_openai.ChatOpenAI`:
```python
Agent(
    role="...",
    goal="...",
    backstory="...",
    tools=...,
    allow_delegation=False,
    verbose=True,
    llm=ChatOpenAI(model='gpt-4o', temperature=0.7, tiktoken_model_name='gpt-4-1106-preview')
)
```
In `ChatOpenAI` there is an argument `tiktoken_model_name` (GitHub link) which, if not specified, is taken to be the same as `model`, but the two can differ (and I want them to).
Now while running the crew I get a strange warning:
```
WARNING - Error in TokenCalcHandler.on_llm_start callback: KeyError('Could not automatically map gpt-4o to a tokeniser. Please use `tiktoken.get_encoding` to explicitly get the tokeniser you expect.')
```
I tried to debug this and found that in CrewAI's library code (src/crewai/agent.py Line 171) you're passing the fixed attribute `model_name` to the token counter. It should use `tiktoken_model_name` first, and fall back to `model_name` only if that is not given.
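To illustrate the resolution order I'm suggesting, here is a minimal sketch; the class and function names below are hypothetical, not CrewAI's actual API:

```python
class FakeLLM:
    """Hypothetical stand-in for a LangChain chat model carrying both attributes."""
    def __init__(self, model_name, tiktoken_model_name=None):
        self.model_name = model_name
        self.tiktoken_model_name = tiktoken_model_name

def tokenizer_model_for(llm):
    # Prefer tiktoken_model_name when it is set; otherwise fall back to model_name.
    return getattr(llm, "tiktoken_model_name", None) or llm.model_name

# With an explicit tokenizer model, the override wins:
print(tokenizer_model_for(FakeLLM("gpt-4o", "gpt-4-1106-preview")))  # gpt-4-1106-preview
# Without one, we fall back to the chat model's own name:
print(tokenizer_model_for(FakeLLM("gpt-4o")))  # gpt-4o
```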
Although this isn't breaking anything as of now, because you've added a fallback at (src/crewai/utilities/token_counter_callback.py Line 47) that hard-codes `"cl100k_base"` (i.e., my crew uses `text-embedding-ada-002-v2`), I'd suggest making the above-mentioned changes for better integration with LangChain's `tiktoken_model_name` argument.
Please correct me if I'm wrong. Thanks!