
LangFair Integration

Open mohitcek opened this issue 8 months ago • 0 comments

LangFair is a Python toolkit for conducting bias and fairness assessments of large language model (LLM) use cases. It supports a range of metrics for counterfactual and toxicity assessments of LLM responses.

This issue focuses on integrating LangFair with Weave to conduct the above-mentioned assessments.

Approach: Create two scorer classes, one for counterfactual assessment and another for toxicity assessment (hedged sketches of both follow the list below).

  1. Counterfactual Scorer: This class takes LLM prompts as input, identifies protected words (gender- or race-related) in the prompts/questions, creates counterfactual prompts, generates counterfactual responses, and computes the metric values supported by LangFair ('Cosine', 'RougeL', 'Bleu', 'Sentiment Bias').
  2. Toxicity Scorer: This class gives a measure of the toxicity present in an LLM response using a classifier supported by LangFair.
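
Below is a minimal sketch of what the counterfactual scorer could look like as a `weave.Scorer` subclass. The LangFair import paths, the `CounterfactualGenerator.generate_responses` / `CounterfactualMetrics.evaluate` signatures, the `"male_response"` / `"female_response"` result keys, and the choice of `ChatOpenAI` as the generation model are assumptions based on my reading of LangFair's examples and may need adjusting to the installed versions:

```python
import asyncio

import weave
from langchain_openai import ChatOpenAI
from langfair.generator import CounterfactualGenerator
from langfair.metrics.counterfactual import CounterfactualMetrics


class CounterfactualScorer(weave.Scorer):
    """Scores an LLM use case for counterfactual fairness with LangFair."""

    model_name: str = "gpt-4o-mini"  # assumed LangChain-compatible generation model
    count: int = 5  # responses generated per counterfactual prompt variant

    @weave.op
    def score(self, prompt: str, output: str) -> dict:
        llm = ChatOpenAI(model=self.model_name, temperature=1.0)
        generator = CounterfactualGenerator(langchain_llm=llm)

        # Identify gender-related words in the prompt, build counterfactual
        # (male/female-substituted) prompt variants, and generate responses for
        # each variant. generate_responses is async in LangFair, hence asyncio.run
        # (this assumes score() is not itself called from a running event loop).
        generations = asyncio.run(
            generator.generate_responses(
                prompts=[prompt], attribute="gender", count=self.count
            )
        )
        male_responses = generations["data"]["male_response"]      # assumed key
        female_responses = generations["data"]["female_response"]  # assumed key

        # Compute LangFair's counterfactual metrics (Cosine, RougeL, Bleu,
        # Sentiment Bias) over the paired response groups.
        metrics = CounterfactualMetrics()
        return metrics.evaluate(
            texts1=male_responses, texts2=female_responses, attribute="gender"
        )
```

A race-based assessment would work the same way but compare each pair of race-substituted response groups; the scorer could expose `attribute` as a field instead of hard-coding "gender".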
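
And a corresponding sketch for the toxicity scorer, which only needs the prompt/response pair that Weave already passes to `score`. Here too, the `ToxicityMetrics.evaluate(prompts=..., responses=...)` signature and its `"metrics"` result key are assumptions from LangFair's examples:

```python
import weave
from langfair.metrics.toxicity import ToxicityMetrics


class ToxicityScorer(weave.Scorer):
    """Scores a single LLM response for toxicity using LangFair's classifiers."""

    @weave.op
    def score(self, prompt: str, output: str) -> dict:
        # ToxicityMetrics wraps a toxicity classifier (detoxify by default in
        # LangFair) and computes toxicity metrics over the supplied responses.
        tox = ToxicityMetrics()
        result = tox.evaluate(prompts=[prompt], responses=[output])
        return result["metrics"]  # assumed result layout: {"metrics": {...}, ...}
```

Both scorers could then be attached to a `weave.Evaluation` (or applied per call) so the metric values are logged alongside the traced LLM calls.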

mohitcek · Apr 03 '25 15:04