
Wrong OpenAI tiktoken embeddings

Open shivam-aggarwal1 opened this issue 2 months ago • 0 comments

Hi,

While creating an Agent, I'm passing a custom-defined LLM in the llm argument. It's an object of type langchain_openai.ChatOpenAI:

Agent(
    role="...",
    goal="...",
    backstory="...",
    tools=...,
    allow_delegation=False,
    verbose=True,
    llm=ChatOpenAI(model='gpt-4o', temperature=0.7, tiktoken_model_name='gpt-4-1106-preview'),
)

In ChatOpenAI there is an argument tiktoken_model_name (GitHub link) which, if not specified, defaults to the same value as model; the two can also be set to different values, which is what I want here.

Now while running the crew I get a strange warning:

WARNING - Error in TokenCalcHandler.on_llm_start callback: KeyError('Could not automatically map gpt-4o to a tokeniser. Please use `tiktoken.get_encoding` to explicitly get the tokeniser you expect.')

I tried to debug this and found that in crewAI's library code (src/crewai/agent.py, line 171) you're passing a fixed model_name argument to the token counter. It should use tiktoken_model_name first, and fall back to model_name only when tiktoken_model_name is not given.
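The precedence I'm suggesting could be sketched like this (pick_tokenizer_model and FakeLLM are hypothetical names for illustration, not the actual agent.py code):

```python
def pick_tokenizer_model(llm):
    """Prefer LangChain's tiktoken_model_name when set,
    otherwise fall back to the LLM's model_name."""
    return getattr(llm, "tiktoken_model_name", None) or getattr(llm, "model_name", None)


class FakeLLM:
    # Stand-in for langchain_openai.ChatOpenAI, just for illustration
    model_name = "gpt-4o"
    tiktoken_model_name = "gpt-4-1106-preview"


print(pick_tokenizer_model(FakeLLM()))  # -> gpt-4-1106-preview
```

With this precedence, the token counter would receive gpt-4-1106-preview (a name tiktoken can map) instead of gpt-4o, and the warning would not appear.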

Nothing is breaking right now, because you've added a fallback at src/crewai/utilities/token_counter_callback.py, line 47, which hard-codes "cl100k_base" (i.e., my crew uses text-embedding-ada-002-v2), but I suggest making the changes above for better integration with LangChain's tiktoken_model_name argument.
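For reference, the fallback behaviour described above can be sketched as follows (MODEL_TO_ENCODING is a simplified stand-in for tiktoken's real model-to-encoding registry, which lives inside the tiktoken package):

```python
# Illustrative stand-in registry: tiktoken only knows models it shipped with,
# so a newly released model name may be missing from an older install.
MODEL_TO_ENCODING = {
    "gpt-4-1106-preview": "cl100k_base",
    "gpt-3.5-turbo": "cl100k_base",
}

def encoding_for_model(model_name, default="cl100k_base"):
    # An unknown model name (e.g. "gpt-4o" under an older tiktoken) falls
    # back to the hard-coded default, mirroring what the token counter
    # callback does after catching the KeyError.
    return MODEL_TO_ENCODING.get(model_name, default)

print(encoding_for_model("gpt-4o"))  # -> cl100k_base (fallback path)
```

This is why the warning is cosmetic: counting still happens against cl100k_base, just not necessarily with the tokenizer the user intended.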

Please correct me if I'm wrong. Thanks!

shivam-aggarwal1 · May 20 '24 13:05