LangSmith "Criteria()" and "LabeledCriteria()" do not accept custom criteria in TypeScript, despite examples
Checked other resources
- [X] I added a very descriptive title to this issue.
- [X] I searched the LangChain.js documentation with the integrated search.
- [X] I used the GitHub search to find a similar question and didn't find it.
- [X] I am sure that this is a bug in LangChain.js rather than my code.
- [X] The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).
Example Code
Neither of these works. The first is what is implied to work by the comments in https://github.com/langchain-ai/langchainjs/blob/b9d86b16bd0788144f8f27cf5d18960c43da49af/langchain/src/smith/config.ts#L219, and the second is what I would expect to work based on the equivalent Python code.
const evalResult = await runOnDataset(
  llmFunction,
  datasetName,
  {
    evaluators: [{
      evaluatorType: "criteria",
      criteria: { "isCompliant": "Does the submission comply with the requirements of XYZ" }
    }],
  },
);
const compliantCriteria = Criteria({
  evaluatorType: "criteria",
  criteria: { "isCompliant": "Does the submission comply with the requirements of XYZ" }
});

const evalResult = await runOnDataset(
  llmFunction,
  datasetName,
  {
    evaluators: [compliantCriteria],
  },
);
Error Message and Stack Trace (if applicable)
No response
Description
There seems to be no way to use custom criteria in a criteria evaluator, which differs from the Python implementation.
The first method above does not seem to work -- the object is not compatible with the type expected by the evaluators array. The second method does not work either -- Criteria()'s argument is typed as Criteria instead of CriteriaLike, and if that line is @ts-ignore'd, it fails with a 422 "expected a string" error when sending "feedback" to the LangSmith server.
Putting this issue in here because the code is in this repo and not in LangSmith proper (yet?).
System Info
[email protected] Node v21.5.0
It looks like the issue you're encountering with Criteria() and LabeledCriteria() in TypeScript stems from the way custom criteria are expected to be defined and used within the LangChain JS library. Unlike the Python implementation, where you might directly pass custom criteria as objects, LangChain JS, particularly with LangSmith, requires a more structured approach.
For custom criteria in TypeScript, you should leverage the CriteriaEvalChain class for defining and evaluating custom criteria. Here's a streamlined example of how to define and use a single custom criterion:
import { CriteriaEvalChain } from '@langchain/evaluation/criteria';
import { BaseLanguageModelInterface } from '@langchain/core/language_models/base';
const llm: BaseLanguageModelInterface = /* your language model instance */;
const customCriterion = {
  creativity: "Does the submission demonstrate novelty or unique ideas?",
};
const criteriaEvalChain = await CriteriaEvalChain.fromLLM(llm, customCriterion);
const input = "Tell me a joke";
const prediction = "I ate some square pie but I don't know the square of pi.";
const res = await criteriaEvalChain.evaluateStrings({ input, prediction });
console.log(res);
This example demonstrates defining a creativity criterion and evaluating a prediction against it. For multiple criteria, simply expand the customCriterion object with additional key-value pairs representing each criterion.
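For instance, a multi-criterion object might look like this (criterion names here are purely illustrative):

// Each key is a criterion name; each value is the question the evaluator should answer.
const customCriteria = {
  creativity: "Does the submission demonstrate novelty or unique ideas?",
  conciseness: "Is the submission concise and to the point?",
  isCompliant: "Does the submission comply with the requirements of XYZ?",
};

// Used the same way as the single-criterion example above (sketch):
// const multiCriteriaChain = await CriteriaEvalChain.fromLLM(llm, customCriteria);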
Remember, the CriteriaEvalChain class is key to using custom criteria effectively in TypeScript with LangChain JS. This approach provides a structured and flexible way to evaluate predictions against custom-defined criteria, offering detailed reasoning, values, and scores for each criterion.
If you're still facing issues or have more questions, feel free to ask!
Sources
- langchain/src/smith/config.ts
- examples/src/guides/evaluation/string/custom_criteria.ts
- langchain/src/evaluation/criteria/tests/criteria_eval_chain.int.test.ts
- langchain/src/evaluation/criteria/criteria.ts
- langchain/src/evaluation/types.ts
@dosu This doesn't apply in this scenario -- I don't want to create an eval chain, I need to create an EvalConfig, since that is what runOnDataset() accepts as an argument.
To address the issue with using custom criteria in an EvalConfig for runOnDataset() in TypeScript with LangChain.js, it's important to ensure that your custom criteria are correctly defined and formatted to be compatible with the expectations of the LangSmith server. Here's a concise guide to help you set up your custom criteria correctly:
- Define Custom Criteria Using the Criteria Function: When creating a custom criteria configuration, use the Criteria function. Make sure the criteria passed are stringifiable since the LangSmith server expects string values for evaluation.
- Stringify Values: All values passed to the criteria property must be stringifiable to avoid the "422 expected a string" error. This includes ensuring objects passed as criteria are correctly converted to strings.
- Format Evaluator Inputs Correctly: Use a custom function to format the evaluation data (input, prediction, reference) so that it's correctly stringified and formatted for the LangSmith server. This step is crucial for ensuring compatibility and avoiding errors.
- Example Configuration:
const customCriteriaConfig = Criteria("helpfulness", {
  formatEvaluatorInputs: ({ rawInput, rawPrediction }) => ({
    prediction: getSingleStringifiedValue(rawPrediction),
    input: getSingleStringifiedValue(rawInput),
  }),
  llm: /* your language model instance */,
  feedbackKey: "customFeedbackKey" // Optional: specify if needed
});
In this setup, Criteria is utilized to define a custom criteria evaluator named "helpfulness". The formatEvaluatorInputs function ensures that inputs and predictions are stringified, addressing the server's requirements.
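Note that getSingleStringifiedValue is not defined or imported in the snippet above; a minimal stand-in (a sketch, not the library's own helper) could look like this:

// Sketch of a stand-in helper: collapse a raw run value into a single string,
// since LangSmith feedback expects string fields (hence the 422 error).
const getSingleStringifiedValue = (raw: unknown): string => {
  if (typeof raw === "string") return raw;
  if (raw && typeof raw === "object") {
    const values = Object.values(raw as Record<string, unknown>);
    const first = values[0];
    // A single string field (e.g. { output: "..." }) can be used directly.
    if (values.length === 1 && typeof first === "string") {
      return first;
    }
  }
  // Anything else gets stringified so the server receives a string.
  return JSON.stringify(raw);
};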
- Incorporate Custom Criteria into EvalConfig:
const evalConfig = {
  evaluators: [customCriteriaConfig],
  // Include other necessary configuration options here
};
This configuration can then be used directly with runOnDataset() or any other function in LangChain.js that accepts an EvalConfig.
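Put together, the call would look roughly like this (a sketch reusing llmFunction and datasetName from the original report, not a verified end-to-end example):

import { runOnDataset } from "langchain/smith";

// Assumes llmFunction, datasetName, and customCriteriaConfig are defined as above.
const evalResult = await runOnDataset(llmFunction, datasetName, {
  evaluators: [customCriteriaConfig],
});
console.log(evalResult);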
By following these steps, you should be able to use custom criteria within your EvalConfig successfully, ensuring compatibility with the LangSmith server and avoiding the "422 expected a string" error.
Sources
- langchain/src/smith/config.ts
- langchain/src/evaluation/criteria/criteria.ts
- langchain/src/smith/runner_utils.ts
Sorry for the delay!
@dqbd could you have a look at this one?
Hello @w-biggs!
The issue stems from the invalid type, which will be fixed in #5403. In the meantime, you can either:
- Ignore the error with the @ts-expect-error directive:
{
  evaluators: [
    // @ts-expect-error Invalid type
    Criteria({
      isCompliant: "Does the submission comply with the requirements of XYZ",
    }),
  ],
}
- Pass formatEvaluatorInputs to obtain prediction, input, reference (see the sketch below)
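A rough sketch of that second option, following the option shape from the earlier example configuration (the exact parameter names, in particular rawReferenceOutput and the second argument to Criteria, are assumptions and may differ from what config.ts ultimately accepts):

{
  evaluators: [
    // @ts-expect-error Invalid type (until the fix lands)
    Criteria(
      { isCompliant: "Does the submission comply with the requirements of XYZ" },
      {
        // Map the raw run data to the plain strings the evaluator and LangSmith expect.
        // Field names here (rawInput, rawPrediction, rawReferenceOutput) are assumed.
        formatEvaluatorInputs: ({ rawInput, rawPrediction, rawReferenceOutput }) => ({
          input: typeof rawInput === "string" ? rawInput : JSON.stringify(rawInput),
          prediction: typeof rawPrediction === "string" ? rawPrediction : JSON.stringify(rawPrediction),
          reference: typeof rawReferenceOutput === "string" ? rawReferenceOutput : JSON.stringify(rawReferenceOutput),
        }),
      }
    ),
  ],
}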