langsmith-sdk
Issue: Understanding inconsistencies in data recording for LangSmith and LangChain JS/TS
Issue you'd like to raise.
Hello, I am using LangSmith to evaluate platform tooling for cost and performance tracking, and I am noticing some inconsistencies I do not understand.
With the azure-openai and openai JS/TS packages exported from LangChain, I get inconsistent cost tracking. When I use the azure-openai package, no costs are recorded: token counts look consistent, but there is no cost information at all. I have seen the same missing costs on AWS Bedrock calls. When I use the openai package, I do get costs, but I am not confident they are accurate.
No LLM calls record time to first token either, unless it is a local ChatOllama call. For the other runs, LangSmith reports "this run did not stream output", which is interesting, as I am using LangChain's exported chat model objects (including the Ollama one) in the same way.
It doesn't matter whether I try this in a RunnableSequence or a direct invoke call (see the sketch below): Azure calls do not get cost information.
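Roughly what both call paths look like (a minimal sketch; the prompt template is just an example, and the real credentials/deployment are read from environment variables):

```typescript
import { AzureChatOpenAI } from "@langchain/azure-openai";
import { ChatPromptTemplate } from "@langchain/core/prompts";
import { RunnableSequence } from "@langchain/core/runnables";

// Credentials and deployment name omitted here; in the real code they come from env vars.
const model = new AzureChatOpenAI({ modelName: "gpt-4" });

// Direct invoke call
const direct = await model.invoke("Hello!");

// Same model inside a RunnableSequence
const chain = RunnableSequence.from([
  ChatPromptTemplate.fromTemplate("Say hi to {name}"),
  model,
]);
const viaSequence = await chain.invoke({ name: "LangSmith" });
```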
When I inspect traces for these Azure runs, I see the following message when trying to interact with the Playground:
OpenAI runs, on the other hand, are correctly identified as OpenAI for the purposes of the Playground.
What am I doing wrong?
We need to relax the default model-matching rules we have for cost estimation. (I think Azure OpenAI returns gpt-35-turbo instead of gpt-3.5-turbo, for instance.)
FYI, you can also customize the cost rules yourself in case your pricing doesn't match the defaults.
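To illustrate the mismatch (the actual default patterns aren't shown in this thread, so the regex below is only an assumed shape):

```typescript
// Assumed shape of a price rule keyed to the canonical OpenAI name.
const defaultRule = /^gpt-3\.5-turbo/;

defaultRule.test("gpt-3.5-turbo"); // true
defaultRule.test("gpt-35-turbo");  // false -> Azure-style deployment name gets no cost attributed

// A looser pattern covers both spellings:
const looserRule = /^gpt-3\.?5-turbo/;
looserRule.test("gpt-35-turbo");   // true
```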
Hi @hinthornw thank you for your reply!
I discovered the list of models as well and noticed the related regexes, and I've been using the model name specified in my deployment (gpt-35-turbo), which should match the second regex on the unpinned model name (without a version; at least, that is the OpenAI behavior).
And this is what the deployments in Azure show for their model names:
But it sounds like you are saying that adding/cloning the model with a looser regex should address this, because on closer inspection we are using the legacy 0613 version.
Thanks again
Hello @hinthornw, I'm reopening this issue because it doesn't seem as if adding new regexes has helped, at least when using the exported AzureChatOpenAI object from @langchain/azure-openai.
Even for gpt-4, no cost attribute is collected at all. It isn't that the rules don't match; costs simply are not returned at all. I've tried values that regex testers indicate should match, like "gpt-4" and "gpt-3.5-turbo", but they still fail to return costs.
```typescript
const azureChatModel = new AzureChatOpenAI({
  azureOpenAIEndpoint: "my-cool-endpoint",
  azureOpenAIApiKey: "super-secret",
  azureOpenAIApiDeploymentName: "also-super-secret",
  modelName: "gpt-4",
});
```
For 3.5-turbo, the other params are the same, but the modelName argument I am passing (just gpt-3.5-turbo) doesn't appear to work either, at least not for the AzureChatOpenAI object.
Is there something else I should be inspecting?
Thanks for raising - will pass this on.
If you click on the metadata tab for one of those LLM runs, what does it show? Any chance you could share a link to a run to help us debug?
@elliotmrodriguez Hey there, have you perhaps already figured out why the stream requests are not tracked properly? I have a related issue: no matter which way I trace my custom LLM using wrapOpenAI / traceable / RunTree from the TypeScript SDK, no time to first token is recorded. Any ideas? :)
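For context, the setup is roughly this (a sketch assuming an OpenAI-compatible client; the model name and prompt are placeholders):

```typescript
import OpenAI from "openai";
import { wrapOpenAI } from "langsmith/wrappers";

// Wrap the OpenAI client so calls are traced to LangSmith.
const client = wrapOpenAI(new OpenAI());

const stream = await client.chat.completions.create({
  model: "gpt-3.5-turbo", // placeholder model
  messages: [{ role: "user", content: "Hello!" }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}
// Time to first token still shows as missing on the resulting run.
```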
Oh, the TypeScript SDK may not track that event right now. I'll sync with the owner.
I had the same issue. I managed to track cost by configuring a LangSmith model where the rule matches the ls_model_name metadata (which was filled with the deployment name) 😉
https://docs.smith.langchain.com/how_to_guides/tracing/log_llm_trace
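A sketch of that approach with the AzureChatOpenAI object from this thread (the deployment name is a placeholder; ls_model_name is the metadata key the custom pricing rule is configured to match on):

```typescript
import { AzureChatOpenAI } from "@langchain/azure-openai";

const model = new AzureChatOpenAI({
  azureOpenAIApiDeploymentName: "my-gpt-35-deployment", // placeholder deployment name
  modelName: "gpt-35-turbo",
});

// Attach ls_model_name to the run metadata so the custom LangSmith
// model price rule (configured in the UI) can match on it.
const result = await model.invoke("Hello!", {
  metadata: { ls_model_name: "gpt-35-turbo" },
});
```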