
Issue: Understanding inconsistencies in data recording for LangSmith and LangChain JS/TS

elliotmrodriguez opened this issue 11 months ago • 7 comments

Issue you'd like to raise.

Hello, I am using LangSmith to evaluate platform tools for cost and performance tracking and I am noticing some inconsistencies I do not understand.

For the azure-openai and openai JS/TS packages exported from LangChain, cost tracking is inconsistent. When I use the azure-openai package, no costs are recorded at all: token counts seem consistent, but there is no cost information. I have seen similarly unrecorded costs when making AWS Bedrock calls. When I use the openai package, I do get costs, but I am not confident they are accurate.


No LLM calls record time to first token either, unless it is a local ChatOllama call. For the other runs, LangSmith reports "this run did not stream output", which is interesting, as I am using LangChain's exported Ollama chat model object.

It doesn't matter whether I run this in a RunnableSequence or as a direct invoke call: Azure calls never get cost information.
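For reference, here is roughly how I am calling the model both ways (a minimal sketch; the parser choice and prompt are placeholders, not my real setup):

import { AzureChatOpenAI } from "@langchain/azure-openai";
import { StringOutputParser } from "@langchain/core/output_parsers";

const model = new AzureChatOpenAI({ /* config as in my comment below */ });

// Direct invoke call:
const direct = await model.invoke("Hello!");

// The same model inside a runnable sequence (via .pipe):
const chain = model.pipe(new StringOutputParser());
const chained = await chain.invoke("Hello!");

Both paths produce a trace, but neither attaches cost information for the Azure model.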

When I inspect traces for these Azure runs, I see the following message when trying to interact with the Playground: [screenshot of Playground error message]

And OpenAI runs are correctly derived as OpenAI for the purposes of the Playground.

What am I doing wrong?

elliotmrodriguez avatar Mar 12 '24 18:03 elliotmrodriguez

We need to relax the default model-matching rules we have for cost estimation. (I think Azure OpenAI returns gpt-35-turbo instead of gpt-3.5-turbo, for instance.)

FYI, you can also customize the cost rules yourself, in case your pricing doesn't match the defaults.
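To illustrate the mismatch (these regexes are illustrative only, not the actual LangSmith rules):

// Illustrative only - not the actual default patterns.
// A rule keyed to the OpenAI spelling misses the Azure spelling:
const strictRule = /^gpt-3\.5-turbo/;
strictRule.test("gpt-3.5-turbo"); // true  (OpenAI name)
strictRule.test("gpt-35-turbo");  // false (Azure name)

// A relaxed pattern that accepts both spellings:
const relaxedRule = /^gpt-3\.?5-turbo/;
relaxedRule.test("gpt-35-turbo"); // true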


hinthornw avatar Mar 12 '24 19:03 hinthornw

Hi @hinthornw thank you for your reply!

I discovered the list of models as well and noticed the related regexes. I've been using the model name specified in my deployment (gpt-35-turbo), which should match the second regex on the unpinned model name (without a version; at least, that is the OpenAI behavior):

[screenshot: LangSmith model pricing rules]

And this is what the deployments in Azure show for their model names:

[screenshot: Azure deployment model names]

But it sounds like you are saying that adding/cloning the model with a looser regex should address this, because on closer inspection we are using the legacy 0613 version.
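For example (assumed pattern shapes, just to sanity-check my understanding of the matching behavior):

// Assumed shapes, not the actual rules:
// a rule anchored to the unpinned name misses a pinned deployment.
const unpinned = /^gpt-35-turbo$/;
unpinned.test("gpt-35-turbo-0613"); // false - legacy pinned version

// A cloned rule with an optional version suffix would match both:
const pinnedOrNot = /^gpt-35-turbo(-\d{4})?$/;
pinnedOrNot.test("gpt-35-turbo-0613"); // true
pinnedOrNot.test("gpt-35-turbo");      // true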

Thanks again

elliotmrodriguez avatar Mar 13 '24 11:03 elliotmrodriguez

Hello @hinthornw, I'm reopening this issue because it doesn't seem that adding new regexes has helped, at least when using the exported AzureChatOpenAI object from @langchain/azure-openai.

Even for gpt-4, no cost attribute is collected at all. It isn't that the names fail to match; costs are simply not returned. I've tried values that regex testers indicate should match, like "gpt-4" and "gpt-3.5-turbo", but they still fail to return costs.

const azureChatModel = new AzureChatOpenAI({
  azureOpenAIEndpoint: "my-cool-endpoint",
  azureOpenAIApiKey: "super-secret",
  azureOpenAIApiDeploymentName: "also-super-secret",
  modelName: "gpt-4",
});

For 3.5 turbo, the other params are the same; only the modelName argument changes. But passing just gpt-3.5-turbo doesn't appear to work either, at least not for the AzureChatOpenAI object.

Is there something else I should be inspecting?

elliotmrodriguez avatar Mar 13 '24 19:03 elliotmrodriguez

Thanks for raising - will pass this on.

If you click on the metadata tab for one of those llm runs, what does it show? Any chance you could share a link to a run to help us debug?

hinthornw avatar Apr 09 '24 02:04 hinthornw

@elliotmrodriguez Hey there, have you perhaps already figured out why the stream requests are not tracked properly? I have a related issue: no matter which way I trace my custom LLM using wrapOpenAI / traceable / RunTree from the TypeScript SDK, no time to first token is recorded. Any ideas? :)
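For context, this is roughly the shape of my setup (a minimal sketch; the model name is a placeholder and it assumes the LangSmith API key and tracing env vars are already set):

import { OpenAI } from "openai";
import { wrapOpenAI } from "langsmith/wrappers";

// Wrap the OpenAI client so calls are traced to LangSmith.
const client = wrapOpenAI(new OpenAI());

const stream = await client.chat.completions.create({
  model: "gpt-4", // placeholder model name
  messages: [{ role: "user", content: "Hello" }],
  stream: true,
});

// Chunks are consumed as they arrive, which is exactly where I would
// expect a time-to-first-token measurement to be taken.
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}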

Kniggishood avatar May 29 '24 19:05 Kniggishood

Oh, the TypeScript SDK may not track that event right now - I'll sync with the owner.

hinthornw avatar Jun 01 '24 14:06 hinthornw

I had the same issue. I managed to track cost by configuring a LangSmith model whose rule matches the ls_model_name metadata (which was filled with the deployment name) 😉
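Roughly like this (a sketch using the langsmith SDK; the deployment name is a placeholder, and the matching pricing rule itself is configured in the LangSmith UI, not in code):

import { traceable } from "langsmith/traceable";

const callDeployment = traceable(
  async (prompt: string) => {
    // ... call the Azure deployment here ...
    return "response";
  },
  {
    run_type: "llm",
    // The pricing rule in LangSmith is set to match this value:
    metadata: { ls_model_name: "my-gpt4-deployment" }, // placeholder
  }
);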

MathiasVantieghem avatar Jul 03 '24 21:07 MathiasVantieghem

https://docs.smith.langchain.com/how_to_guides/tracing/log_llm_trace

hinthornw avatar Sep 06 '24 23:09 hinthornw