LangSmith "Criteria()" and "LabeledCriteria()" do not accept custom criteria in TypeScript, despite examples
Checked other resources
- [X] I added a very descriptive title to this issue.
- [X] I searched the LangChain.js documentation with the integrated search.
- [X] I used the GitHub search to find a similar question and didn't find it.
- [X] I am sure that this is a bug in LangChain.js rather than my code.
- [X] The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).
Example Code
Neither of these works. The first is what is implied to work by the comments in https://github.com/langchain-ai/langchainjs/blob/b9d86b16bd0788144f8f27cf5d18960c43da49af/langchain/src/smith/config.ts#L219, and the second is what I would expect to work based on the equivalent Python code.
const evalResult = await runOnDataset(
  llmFunction,
  datasetName,
  {
    evaluators: [{
      evaluatorType: "criteria",
      criteria: { "isCompliant": "Does the submission comply with the requirements of XYZ" }
    }],
  },
);
const compliantCriteria = Criteria({
  evaluatorType: "criteria",
  criteria: { "isCompliant": "Does the submission comply with the requirements of XYZ" }
});

const evalResult = await runOnDataset(
  llmFunction,
  datasetName,
  {
    evaluators: [compliantCriteria],
  },
);
Error Message and Stack Trace (if applicable)
No response
Description
There seems to be no way to use custom criteria in a criteria evaluator, which differs from the Python implementation.
The first method above does not seem to work -- the object is not compatible with the type expected by the evaluators array. The second method does not work either -- Criteria()'s argument is typed as Criteria instead of CriteriaLike, and if that line is @ts-ignore'd, it fails with a 422 "expected a string" error when sending "feedback" to the LangSmith server.
Putting this issue in here because the code is in this repo and not in LangSmith proper (yet?).
System Info
[email protected] Node v21.5.0
It looks like the issue you're encountering with Criteria() and LabeledCriteria() in TypeScript stems from the way custom criteria are expected to be defined and used within the LangChain JS library. Unlike the Python implementation, where you might directly pass custom criteria as objects, LangChain JS, particularly with LangSmith, requires a more structured approach.
For custom criteria in TypeScript, you should leverage the CriteriaEvalChain class for defining and evaluating custom criteria. Here's a streamlined example of how to define and use a single custom criterion:
import { CriteriaEvalChain } from '@langchain/evaluation/criteria';
import { BaseLanguageModelInterface } from '@langchain/core/language_models/base';
const llm: BaseLanguageModelInterface = /* your language model instance */;
const customCriterion = {
  creativity: "Does the submission demonstrate novelty or unique ideas?",
};
const criteriaEvalChain = await CriteriaEvalChain.fromLLM(llm, customCriterion);
const input = "Tell me a joke";
const prediction = "I ate some square pie but I don't know the square of pi.";
const res = await criteriaEvalChain.evaluateStrings({ input, prediction });
console.log(res);
This example demonstrates defining a creativity criterion and evaluating a prediction against it. For multiple criteria, simply expand the customCriterion object with additional key-value pairs representing each criterion.
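For instance, a multi-criterion object might look like this (criterion names here are purely illustrative):

// Each key is a criterion name; each value is the question the evaluator should answer.
const customCriteria = {
  creativity: "Does the submission demonstrate novelty or unique ideas?",
  conciseness: "Is the submission concise and to the point?",
  isCompliant: "Does the submission comply with the requirements of XYZ?",
};

// Used the same way as the single-criterion example above (sketch):
// const multiCriteriaChain = await CriteriaEvalChain.fromLLM(llm, customCriteria);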
Remember, the CriteriaEvalChain class is key to using custom criteria effectively in TypeScript with LangChain JS. This approach provides a structured and flexible way to evaluate predictions against custom-defined criteria, offering detailed reasoning, values, and scores for each criterion.
If you're still facing issues or have more questions, feel free to ask!
Sources
- langchain/src/smith/config.ts
- examples/src/guides/evaluation/string/custom_criteria.ts
- langchain/src/evaluation/criteria/tests/criteria_eval_chain.int.test.ts
- langchain/src/evaluation/criteria/criteria.ts
- langchain/src/evaluation/types.ts
@dosu This doesn't apply in this scenario -- I don't want to create an eval chain, I need to create an EvalConfig, since that is what runOnDataset() accepts as an argument.
To address the issue with using custom criteria in an EvalConfig for runOnDataset() in TypeScript with LangChain.js, it's important to ensure that your custom criteria are correctly defined and formatted to be compatible with the expectations of the LangSmith server. Here's a concise guide to help you set up your custom criteria correctly:
- Define Custom Criteria Using the Criteria Function: When creating a custom criteria configuration, use the Criteria function. Make sure the criteria passed are stringifiable since the LangSmith server expects string values for evaluation.
- Stringify Values: All values passed to the criteria property must be stringifiable to avoid the "422 expected a string" error. This includes ensuring objects passed as criteria are correctly converted to strings.
- Format Evaluator Inputs Correctly: Use a custom function to format the evaluation data (input, prediction, reference) so that it's correctly stringified and formatted for the LangSmith server. This step is crucial for ensuring compatibility and avoiding errors.
- Example Configuration:
const customCriteriaConfig = Criteria("helpfulness", {
  formatEvaluatorInputs: ({ rawInput, rawPrediction }) => ({
    prediction: getSingleStringifiedValue(rawPrediction),
    input: getSingleStringifiedValue(rawInput),
  }),
  llm: /* your language model instance */,
  feedbackKey: "customFeedbackKey" // Optional: specify if needed
});
In this setup, Criteria is utilized to define a custom criteria evaluator named "helpfulness". The formatEvaluatorInputs function ensures that inputs and predictions are stringified, addressing the server's requirements.
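Note that getSingleStringifiedValue is not defined or imported in the snippet above; a minimal stand-in (a sketch, not the library's own helper) could look like this:

// Sketch of a stand-in helper: collapse a raw run value into a single string,
// since LangSmith feedback expects string fields (hence the 422 error).
const getSingleStringifiedValue = (raw: unknown): string => {
  if (typeof raw === "string") return raw;
  if (raw && typeof raw === "object") {
    const values = Object.values(raw as Record<string, unknown>);
    const first = values[0];
    // A single string field (e.g. { output: "..." }) can be used directly.
    if (values.length === 1 && typeof first === "string") {
      return first;
    }
  }
  // Anything else gets stringified so the server receives a string.
  return JSON.stringify(raw);
};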
- Incorporate Custom Criteria into EvalConfig:
const evalConfig = {
  evaluators: [customCriteriaConfig],
  // Include other necessary configuration options here
};
This configuration can then be used directly with runOnDataset() or any other function in LangChain.js that accepts an EvalConfig.
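Put together, the call would look roughly like this (a sketch reusing llmFunction and datasetName from the original report, not a verified end-to-end example):

import { runOnDataset } from "langchain/smith";

// Assumes llmFunction, datasetName, and customCriteriaConfig are defined as above.
const evalResult = await runOnDataset(llmFunction, datasetName, {
  evaluators: [customCriteriaConfig],
});
console.log(evalResult);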
By following these steps, you should be able to use custom criteria within your EvalConfig successfully, ensuring compatibility with the LangSmith server and avoiding the "422 expected a string" error.
Sources
- langchain/src/smith/config.ts
- langchain/src/evaluation/criteria/criteria.ts
- langchain/src/smith/runner_utils.ts
Sorry for the delay!
@dqbd could you have a look at this one?
Hello @w-biggs!
The issue stems from the invalid type, which will be fixed in #5403. In the meantime, you can either:
- Ignore the error with the @ts-expect-error directive:
{
  evaluators: [
    // @ts-expect-error Invalid type
    Criteria({
      isCompliant: "Does the submission comply with the requirements of XYZ",
    }),
  ],
}
- Pass formatEvaluatorInputs to obtain prediction, input, reference (see the sketch below)
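A rough sketch of that second option, following the option shape from the earlier example configuration (the exact parameter names, in particular rawReferenceOutput and the second argument to Criteria, are assumptions and may differ from what config.ts ultimately accepts):

{
  evaluators: [
    // @ts-expect-error Invalid type (until the fix lands)
    Criteria(
      { isCompliant: "Does the submission comply with the requirements of XYZ" },
      {
        // Map the raw run data to the plain strings the evaluator and LangSmith expect.
        // Field names here (rawInput, rawPrediction, rawReferenceOutput) are assumed.
        formatEvaluatorInputs: ({ rawInput, rawPrediction, rawReferenceOutput }) => ({
          input: typeof rawInput === "string" ? rawInput : JSON.stringify(rawInput),
          prediction: typeof rawPrediction === "string" ? rawPrediction : JSON.stringify(rawPrediction),
          reference: typeof rawReferenceOutput === "string" ? rawReferenceOutput : JSON.stringify(rawReferenceOutput),
        }),
      }
    ),
  ],
}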