ZeroShotClassificationExplainer does not correctly explain ZeroShotClassificationPipeline results (single label)
In the case of a single label, the logic to calculate the classification probability in the ZeroShotClassificationExplainer (see here) differs from the logic in the Huggingface ZeroShotClassificationPipeline (see here):
- The Huggingface ZeroShotClassificationPipeline calculates the softmax over the entailment and contradiction scores and returns the resulting value for entailment, but
- the ZeroShotClassificationExplainer returns just the sigmoid of the entailment score.
At least, if this is intended, it should be documented somewhere. My use case is multi-label classification and I used the single-label approach to simulate it, but it took me some time to figure out that this does not work for explaining ZeroShotClassificationPipeline predictions.
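For illustration, here is a minimal sketch (with hypothetical logits, not code from either library) of how far the two post-processings can drift apart for a single label:

```python
import torch

# hypothetical NLI logits for one (sequence, label) pair,
# ordered [contradiction, neutral, entailment]
logits = torch.tensor([1.2, -0.3, 2.0])

# pipeline-style: softmax over [contradiction, entailment], keep entailment
pipeline_score = torch.softmax(logits[[0, 2]], dim=0)[1]  # ~0.69

# explainer-style: sigmoid of the entailment logit alone
explainer_score = torch.sigmoid(logits[2])  # ~0.88

print(pipeline_score.item(), explainer_score.item())
```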
Hi @ArneBinder,
The line you linked in Transformers Interpret handles an edge case to accommodate models where there is only a single output node. It's a carryover from the sequence classifier and in all likelihood would never be used for zero-shot, due to the reliance on NLI models.
It's also worth pointing out that the zero-shot explainer is a subclass of both the SequenceClassificationExplainer and the QuestionAnsweringExplainer. Rather esoterically, I use the method _get_preds() from the QA explainer and then the zero-shot explainer's own _forward() method, which is a softmax w.r.t. the entailment class only.
The reason I don't include the contradiction score here is that, given the way we use the zero-shot explainer, the contradiction scores are not relevant. What we want to know is which class label made the NLI model fire the most w.r.t. entailment. At least that's how I interpret it; I might be missing something though. How do you think the contradiction scores could be used for a zero-shot pipeline?
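Roughly, the _forward() I mean boils down to something like the following sketch (assuming batched logits of shape (batch, num_nli_classes); this is not the library's literal code):

```python
import torch

def forward_entailment_only(logits: torch.Tensor, entailment_idx: int) -> torch.Tensor:
    # softmax over all NLI classes, but keep only the entailment
    # probability as the score to attribute against
    return torch.softmax(logits, dim=-1)[:, entailment_idx]
```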
Also, if you are trying to explain a multi-label system, I would suggest using our new MultiLabelClassificationExplainer.
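A minimal usage sketch, following the pattern from the project README (the model name and input text are placeholders):

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from transformers_interpret import MultiLabelClassificationExplainer

model_name = "facebook/bart-large-mnli"  # placeholder model
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

explainer = MultiLabelClassificationExplainer(model, tokenizer)
# returns word attributions for every label, not just the top predicted one
word_attributions = explainer("This is an example sentence.")
```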
Thanks for the quick response (and also this very cool project btw)!
> How do you think the contradiction scores could be used for a zero-shot pipeline?
As I said, the contradiction scores are used to normalize the entailment scores in the transformers.ZeroShotClassificationPipeline when classes are independent (multi-label), see this code.
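Concretely, that multi_label branch boils down to something like this sketch of the linked code (with made-up logits):

```python
import numpy as np

# hypothetical logits for 2 sequences x 3 labels, NLI classes
# ordered [contradiction, neutral, entailment]
logits = np.random.randn(2 * 3, 3)
contradiction_id, entailment_id = 0, 2

# multi_label: softmax over [contradiction, entailment] only,
# independently for every (sequence, label) pair
entail_contr = logits[..., [contradiction_id, entailment_id]]
probs = np.exp(entail_contr) / np.exp(entail_contr).sum(-1, keepdims=True)
scores = probs[..., 1]  # normalized entailment probability per pair
```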
What exactly can I do in this case? For now, I have the following code to get my predictions:
```python
import pandas as pd
import transformers
from transformers import ZeroShotClassificationPipeline

sequences = pd.Series(["text1", "text2"])
classes = pd.Series(["label1", "label2", "label3"])
hypothesis_template: str = "This example is {}."
model: ZeroShotClassificationPipeline = transformers.pipeline("zero-shot-classification")
# call the pipeline with multi_label=True
model_output = model(
    sequences=sequences.to_list(),
    candidate_labels=classes.to_list(),
    hypothesis_template=hypothesis_template,
    multi_label=True,
)
# convert to dataframe. note: ld_to_dl converts a list of dicts to a dict of lists
res_df = pd.DataFrame(ld_to_dl(model_output))
# use the "labels" returned by the model to rearrange the result, because the order of scores is not fixed
classes_to_index = pd.Series(data=classes.index, index=classes.values)
predictions = res_df.apply(
    lambda row: pd.Series(row["scores"], index=classes_to_index[row["labels"]], dtype=float),
    axis=1,
)
predictions.index = sequences.index
# result is a dataframe with classes.index as columns, sequences.index as index and scores as entries
```
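For completeness, a minimal ld_to_dl consistent with the comment above (my assumption; the original helper is not shown) could be:

```python
from typing import Any, Dict, List

def ld_to_dl(ld: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    # convert a list of dicts to a dict of lists,
    # assuming all dicts share the same keys
    return {key: [d[key] for d in ld] for key in ld[0]}
```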
Is there an easy way to apply transformers-interpret here?
EDIT: I just re-implemented the Huggingface transformers.ZeroShotClassificationPipeline logic for my purposes using a transformers.TextClassificationPipeline instead (that pipeline just calls the base model and applies a softmax over the classes, see here); maybe this is a good starting point:
```python
import numpy as np
import torch
import transformers
from transformers import TextClassificationPipeline

model_name = "facebook/bart-large-mnli"
pipeline: TextClassificationPipeline = transformers.pipeline(
    "text-classification", model=model_name, tokenizer=model_name, return_all_scores=True
)
input_texts = assemble_nli_input_texts(
    sequences=sequences.to_list(),
    labels=classes.to_list(),
    hypothesis_template=hypothesis_template,
    tokenizer=pipeline.tokenizer,
)
# the code below is equivalent to:
# pipeline_output = pipeline(input_texts, add_special_tokens=False)
# scores_dict = ld_to_dl([{x["label"]: x["score"] for x in seq_res} for seq_res in pipeline_output])
# named_scores = {k: np.array(v) for k, v in scores_dict.items()}
model_inputs = pipeline._parse_and_tokenize(input_texts, add_special_tokens=False)
with torch.no_grad():
    logits = pipeline.model(**model_inputs)[0].cpu().numpy()
# softmax over the NLI classes
model_output_softmax = np.exp(logits) / np.exp(logits).sum(-1, keepdims=True)
named_scores = {
    label: model_output_softmax[:, idx]
    for label, idx in pipeline.model.config.label2id.items()
}
# normalize entailment against contradiction, as the pipeline does for multi_label=True
scores_entailment_normalized = named_scores["entailment"] / (
    named_scores["entailment"] + named_scores["contradiction"]
)
scores_reshaped = scores_entailment_normalized.reshape((len(sequences), len(classes)))
predictions = pd.DataFrame(scores_reshaped, index=sequences.index, columns=classes.index)
```
where assemble_nli_input_texts is defined as:
```python
from typing import List

from transformers import PreTrainedTokenizer
# note: the import location of ZeroShotClassificationArgumentHandler varies across transformers versions
from transformers.pipelines import ZeroShotClassificationArgumentHandler

def assemble_nli_input_texts(
    sequences: List[str], labels: List[str], hypothesis_template: str, tokenizer: PreTrainedTokenizer,
) -> List[str]:
    # build (premise, hypothesis) pairs the same way the zero-shot pipeline does
    args_parser = ZeroShotClassificationArgumentHandler()
    sequence_pairs = args_parser(sequences=sequences, labels=labels, hypothesis_template=hypothesis_template)
    encodings = tokenizer(
        sequence_pairs,
        add_special_tokens=True,
        return_tensors=None,
        padding=False,
        truncation=False,
    )
    # assumption: the original snippet ends here; decode back to plain text so the special
    # tokens are baked in (hence add_special_tokens=False when calling the pipeline above)
    return tokenizer.batch_decode(encodings["input_ids"])
```
How can I make use of the MultiLabelClassificationExplainer to get explanations for the normalized output?