
Raise a warning in examples for classification tasks where the model's label mapping cannot automatically be matched with the dataset labels

Open fxmarty opened this issue 3 years ago • 1 comment

Following https://github.com/huggingface/optimum/pull/197, as per the title. For example:

from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("howey/bert-base-uncased-sst2")
print(cfg)
"""prints
BertConfig {
  "_name_or_path": "howey/bert-base-uncased-sst2",
  "architectures": [
    "BertForSequenceClassification"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "finetuning_task": "sst2",
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "problem_type": "single_label_classification",
  "transformers_version": "4.22.0.dev0",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 30522
}
"""

print(cfg.label2id)
"""prints
{'LABEL_0': 0, 'LABEL_1': 1}
"""

So although label2id does not appear in the config.json, it is always an attribute of PretrainedConfig, populated with the default LABEL_0/LABEL_1 mapping. The previous check (e.g. if optimizer.model.config.label2id) was therefore not enough to keep the example scripts from failing.
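
Below is a minimal sketch of the kind of comparison such a warning can be based on, assuming the GLUE sst2 validation split as the evaluation dataset; the exact check merged in this PR may differ.

from datasets import load_dataset
from transformers import AutoConfig

# Labels coming from the model config (here the default LABEL_0/LABEL_1 mapping)
config = AutoConfig.from_pretrained("howey/bert-base-uncased-sst2")
model_labels = set(config.label2id)

# Labels coming from the dataset's ClassLabel feature ('negative', 'positive')
dataset = load_dataset("glue", "sst2", split="validation")
dataset_labels = set(dataset.features["label"].names)

if model_labels != dataset_labels:
    print(f"Model label mapping: {config.label2id}")
    print(f"Dataset label features: {dataset.features['label']}")
    print(
        "Could not guarantee the model label mapping and the dataset labels match. "
        "Evaluation results may suffer from a wrong matching."
    )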

I tried all the examples this time.

Before submitting

  • [x] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).

fxmarty Jul 28 '22 14:07

The documentation is not available anymore as the PR was closed or merged.

I rebased following the changes to ORTQuantizer.

As an example, here is the warning printed for howey/bert-base-uncased-sst2:

Model label mapping: {'LABEL_0': 0, 'LABEL_1': 1}
Dataset label features: ClassLabel(num_classes=2, names=['negative', 'positive'], id=None)
Could not guarantee the model label mapping and the dataset labels match. Evaluation results may suffer from a wrong matching.
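
For context, a checkpoint whose config carries an explicit label mapping would not trigger this warning. A minimal sketch of how such a config could be created; the override kwargs are standard transformers configuration arguments, not something introduced by this PR:

from transformers import AutoConfig

# Sketch: a config saved with explicit label names instead of the default mapping
cfg = AutoConfig.from_pretrained(
    "bert-base-uncased",
    num_labels=2,
    id2label={0: "negative", 1: "positive"},
    label2id={"negative": 0, "positive": 1},
)
print(cfg.label2id)  # {'negative': 0, 'positive': 1}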

fxmarty Aug 12 '22 12:08