interpret-text
How to go about explaining or interpreting text for chat completion models?
Hi community, any thoughts on how we can get some of the LIME text explainers to work with the OpenAI Chat Completion models? Any advice or help is appreciated. Thanks, Karrtik
@karrtikiyer-tw Interesting topic! Here are some introductory steps that should help you get an idea of how to approach this:
- Install the required libraries (lime, openai, numpy).
- Define a wrapper function: LIME requires a prediction function that returns a probability distribution over classes, but Chat Completion models return text, so you need a function that translates the text output into a format suitable for LIME.
```python
import openai

def openai_completion(prompt, model="gpt-4", max_tokens=50):
    # gpt-4 is a chat model, so it must be called through the Chat
    # Completion endpoint, not the legacy Completion endpoint.
    # (This uses the pre-1.0 openai Python SDK interface.)
    response = openai.ChatCompletion.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=max_tokens,
        n=1,
        temperature=0.7,
    )
    return response.choices[0].message["content"].strip()
```
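One way to turn the model's text output into the probabilities LIME expects is to prompt the model to answer with a class label and then map that label onto a probability vector. This is a rough sketch of the mapping step; `label_to_proba` and its `confidence` parameter are hypothetical, not part of any library:

```python
def label_to_proba(label, classes=("positive", "negative"), confidence=0.9):
    # Hypothetical helper: map a text label returned by the model onto a
    # smoothed probability vector over the known classes. If the label is
    # not recognized, fall back to a uniform distribution.
    label = label.lower().strip()
    if label not in classes:
        return [1.0 / len(classes)] * len(classes)
    rest = (1.0 - confidence) / (len(classes) - 1)
    return [confidence if c == label else rest for c in classes]
```

The smoothing avoids hard 0/1 probabilities, which tend to make LIME's local fits less informative.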
- Create a function for the LIME explainer: for text models, LIME works by perturbing the input text slightly and observing how the resulting predictions change.
```python
from lime.lime_text import LimeTextExplainer
import numpy as np

class OpenAIWrapper:
    def __init__(self, model="gpt-4"):
        self.model = model

    def predict_proba(self, texts):
        # Define the class labels
        classes = ['positive', 'negative']
        # Simulate probabilities (for demonstration only).
        # In practice, replace this with real probability scoring.
        probas = []
        for text in texts:
            completion = openai_completion(text, model=self.model)
            # Dummy scoring: length-based 0/1 values
            # (you'll need a real classifier here).
            probas.append([len(completion) % 2, (len(completion) + 1) % 2])
        return np.array(probas)

# Initialize the LIME explainer and the OpenAI model wrapper
explainer = LimeTextExplainer(class_names=['positive', 'negative'])
model_wrapper = OpenAIWrapper(model="gpt-4")

# Example text input
text = "The weather today is"

# Generate the explanation
exp = explainer.explain_instance(text, model_wrapper.predict_proba, num_features=10)

# Print the explanation
print(exp.as_list())
```
- And finally, interpret the results returned by the explain_instance function: each entry in exp.as_list() is a (token, weight) pair showing how strongly that token pushed the prediction toward one class or the other.
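To make that last step concrete, here is a small sketch of how the (token, weight) pairs could be summarized; the sample weights below are made up for illustration, standing in for a real exp.as_list() result:

```python
def summarize_explanation(pairs):
    # Split LIME's (token, weight) pairs into tokens that push the
    # prediction toward each side, strongest contribution first.
    supporting = sorted((p for p in pairs if p[1] > 0), key=lambda p: -p[1])
    opposing = sorted((p for p in pairs if p[1] < 0), key=lambda p: p[1])
    return supporting, opposing

# Made-up example weights, standing in for exp.as_list()
sample = [("weather", 0.31), ("today", -0.05), ("The", 0.02), ("is", -0.12)]
pos, neg = summarize_explanation(sample)
print(pos)  # tokens supporting the predicted class, strongest first
print(neg)  # tokens opposing it, strongest first
```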
Hope this gives you a brief idea; open to your thoughts on this.
Thanks