
gpt2 can't be trained for QA?

Open ucas010 opened this issue 2 years ago • 5 comments

System Info

  • transformers version: 4.26.0.dev0
  • Platform: Linux-3.10.0-1160.81.1.el7.x86_64-x86_64-with-glibc2.17
  • Python version: 3.9.15
  • Huggingface_hub version: 0.11.1
  • PyTorch version (GPU?): 1.13.1+cu116 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?:
  • Using distributed or parallel set-up in script?:

Who can help?

@sgugger

Information

  • [X] The official example scripts
  • [ ] My own modified scripts

Tasks

  • [X] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • [ ] My own task or dataset (give details below)

Reproduction

Code: the same as the linked example, but with the model changed to gpt2:

python run_qa.py \
  --model_name_or_path gpt2 \
  --dataset_name squad \
  --do_train \
  --do_eval \
  --per_device_train_batch_size 12 \
  --learning_rate 3e-5 \
  --num_train_epochs 2 \
  --max_seq_length 384 \
  --doc_stride 128 \
  --output_dir /tmp/debug_squad/

or

python run_seq2seq_qa.py \
  --model_name_or_path gpt2 \
  --dataset_name squad_v2 \
  --context_column context \
  --question_column question \
  --answer_column answers \
  --do_train \
  --do_eval \
  --per_device_train_batch_size 12 \
  --learning_rate 3e-5 \
  --num_train_epochs 2 \
  --max_seq_length 384 \
  --doc_stride 128 \
  --output_dir /tmp/debug_seq2seq_squad/

ValueError: Unrecognized configuration class <class 'transformers.models.gpt2.configuration_gpt2.GPT2Config'> for this kind of AutoModel: AutoModelForQuestionAnswering. Model type should be one of AlbertConfig, BartConfig, BertConfig, BigBirdConfig, BigBirdPegasusConfig, BloomConfig, CamembertConfig, CanineConfig, ConvBertConfig, Data2VecTextConfig, DebertaConfig, DebertaV2Config, DistilBertConfig, ElectraConfig, ErnieConfig, FlaubertConfig, FNetConfig, FunnelConfig, GPTJConfig, IBertConfig, LayoutLMv2Config, LayoutLMv3Config, LEDConfig, LiltConfig, LongformerConfig, LukeConfig, LxmertConfig, MarkupLMConfig, MBartConfig, MegatronBertConfig, MobileBertConfig, MPNetConfig, MvpConfig, NezhaConfig, NystromformerConfig, OPTConfig, QDQBertConfig, ReformerConfig, RemBertConfig, RobertaConfig, RobertaPreLayerNormConfig, RoCBertConfig, RoFormerConfig, SplinterConfig, SqueezeBertConfig, XLMConfig, XLMRobertaConfig, XLMRobertaXLConfig, XLNetConfig, YosoConfig.
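At the time of this issue, GPT2Config was simply absent from the config-to-model mapping behind AutoModelForQuestionAnswering, so dispatch fails before any training starts. A minimal pure-Python sketch of that dispatch pattern (the names and mapping here are illustrative, not the actual transformers internals):

```python
# Illustrative sketch of how an Auto class dispatches on the config class
# and raises ValueError for model types that have no QA head registered.
_QA_MAPPING = {
    "BertConfig": "BertForQuestionAnswering",
    "GPTJConfig": "GPTJForQuestionAnswering",
    # GPT2Config is absent, so gpt2 cannot be dispatched to a QA model.
}

def from_config(config_class_name: str) -> str:
    """Return the QA model class name registered for a config class."""
    try:
        return _QA_MAPPING[config_class_name]
    except KeyError:
        raise ValueError(
            f"Unrecognized configuration class {config_class_name} for this "
            f"kind of AutoModel: AutoModelForQuestionAnswering."
        ) from None
```

Under this sketch, `from_config("GPT2Config")` raises the same kind of ValueError shown above, while registered configs resolve normally.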

Expected behavior

looking forward to your kind reply thx

ucas010 avatar Feb 16 '23 09:02 ucas010

As said multiple times in the past, please use the forums for questions like this.

sgugger avatar Feb 16 '23 14:02 sgugger

@sgugger this is a bug, not a question

ucas010 avatar Feb 17 '23 05:02 ucas010

I believe it's because gpt2 doesn't have a QuestionAnswering head (like GPTJForQuestionAnswering). I would be happy to implement that if @sgugger approves the addition.

susnato avatar Feb 17 '23 07:02 susnato

I don't see a bug. GPT-2 is not meant to be used for question-answering. You can find the list of architectures that support this task by reading the error message or having a look at the question-answering task page in the doc (first tip in green).

@susnato Decoder models perform really poorly on this task, so there is no point adding GPT2ForQuestionAnswering IMO.

sgugger avatar Feb 17 '23 15:02 sgugger

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar Mar 18 '23 15:03 github-actions[bot]

> I don't see a bug. GPT-2 is not meant to be used for question-answering. You can find the list of architectures that support this task by reading the error message or having a look at the question-answering task page in the doc (first tip in green).
>
> @susnato Decoder models perform really poorly on this task, so there is no point adding GPT2ForQuestionAnswering IMO.

Is it worth including it in the library for completeness? I'm trying to use the Cerebras-GPT model suite for some question-answering tasks, and those models inherit from the GPT2Model class. Could we still include it?

uwaisiqbal avatar Apr 26 '23 17:04 uwaisiqbal

> I don't see a bug. GPT-2 is not meant to be used for question-answering. You can find the list of architectures that support this task by reading the error message or having a look at the question-answering task page in the doc (first tip in green).
>
> @susnato Decoder models perform really poorly on this task, so there is no point adding GPT2ForQuestionAnswering IMO.

The question-answering task page mentions support for the GPT2Model class; is that a bug?

kumaramit003 avatar May 13 '23 20:05 kumaramit003

@kumaramit003 Support for question-answering for the GPT-2 model was added recently in #23030

amyeroberts avatar May 15 '23 13:05 amyeroberts