gpt2 can't be trained for QA?
System Info
- transformers version: 4.26.0.dev0
- Platform: Linux-3.10.0-1160.81.1.el7.x86_64-x86_64-with-glibc2.17
- Python version: 3.9.15
- Huggingface_hub version: 0.11.1
- PyTorch version (GPU?): 1.13.1+cu116 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?:
- Using distributed or parallel set-up in script?:
Who can help?
@sgugger
Information
- [X] The official example scripts
- [ ] My own modified scripts
Tasks
- [X] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
Reproduction
Code: similar to the linked example, but with the model changed to gpt2:
python run_qa.py \
--model_name_or_path gpt2 \
--dataset_name squad \
--do_train \
--do_eval \
--per_device_train_batch_size 12 \
--learning_rate 3e-5 \
--num_train_epochs 2 \
--max_seq_length 384 \
--doc_stride 128 \
--output_dir /tmp/debug_squad/
or
python run_seq2seq_qa.py \
--model_name_or_path gpt2 \
--dataset_name squad_v2 \
--context_column context \
--question_column question \
--answer_column answers \
--do_train \
--do_eval \
--per_device_train_batch_size 12 \
--learning_rate 3e-5 \
--num_train_epochs 2 \
--max_seq_length 384 \
--doc_stride 128 \
--output_dir /tmp/debug_seq2seq_squad/
ValueError: Unrecognized configuration class <class 'transformers.models.gpt2.configuration_gpt2.GPT2Config'> for this kind of AutoModel: AutoModelForQuestionAnswering. Model type should be one of AlbertConfig, BartConfig, BertConfig, BigBirdConfig, BigBirdPegasusConfig, BloomConfig, CamembertConfig, CanineConfig, ConvBertConfig, Data2VecTextConfig, DebertaConfig, DebertaV2Config, DistilBertConfig, ElectraConfig, ErnieConfig, FlaubertConfig, FNetConfig, FunnelConfig, GPTJConfig, IBertConfig, LayoutLMv2Config, LayoutLMv3Config, LEDConfig, LiltConfig, LongformerConfig, LukeConfig, LxmertConfig, MarkupLMConfig, MBartConfig, MegatronBertConfig, MobileBertConfig, MPNetConfig, MvpConfig, NezhaConfig, NystromformerConfig, OPTConfig, QDQBertConfig, ReformerConfig, RemBertConfig, RobertaConfig, RobertaPreLayerNormConfig, RoCBertConfig, RoFormerConfig, SplinterConfig, SqueezeBertConfig, XLMConfig, XLMRobertaConfig, XLMRobertaXLConfig, XLNetConfig, YosoConfig.
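The failure can also be reproduced outside the training scripts; a minimal sketch, assuming the same transformers 4.26.0.dev0 environment:

from transformers import AutoModelForQuestionAnswering

# Raises the ValueError above: GPT2Config is not registered in the
# AutoModelForQuestionAnswering mapping in this version.
model = AutoModelForQuestionAnswering.from_pretrained("gpt2")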
Expected behavior
Looking forward to your kind reply. Thanks!
As said multiple times in the past, please use the forums for questions like this.
@sgugger this is a bug, not a question.
I believe it's because gpt2 doesn't have a QuestionAnswering head (like GPTJForQuestionAnswering). I would be happy to implement that if @sgugger approves the addition.
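For context, such a head is thin: a rough sketch of what it might look like, following the GPTJForQuestionAnswering pattern (the GPT2ForQuestionAnswering class name is the proposed addition here, not an existing API at this point):

from torch import nn
from transformers import GPT2Model, GPT2PreTrainedModel

class GPT2ForQuestionAnswering(GPT2PreTrainedModel):
    # Span-extraction QA head on top of GPT-2, mirroring GPTJForQuestionAnswering.
    def __init__(self, config):
        super().__init__(config)
        self.transformer = GPT2Model(config)
        # Project each hidden state to a (start_logit, end_logit) pair.
        self.qa_outputs = nn.Linear(config.hidden_size, 2)
        self.post_init()

    def forward(self, input_ids, attention_mask=None, start_positions=None, end_positions=None):
        hidden_states = self.transformer(input_ids, attention_mask=attention_mask)[0]
        logits = self.qa_outputs(hidden_states)
        start_logits, end_logits = logits.split(1, dim=-1)
        start_logits = start_logits.squeeze(-1)
        end_logits = end_logits.squeeze(-1)
        loss = None
        if start_positions is not None and end_positions is not None:
            # Average the cross-entropy over start and end positions,
            # as the other *ForQuestionAnswering heads do.
            loss_fct = nn.CrossEntropyLoss()
            loss = (loss_fct(start_logits, start_positions) + loss_fct(end_logits, end_positions)) / 2
        return loss, start_logits, end_logits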
I don't see a bug. GPT-2 is not meant to be used for question-answering. You can find the list of architectures that support this task by reading the error message or having a look at the question-answering task page in the doc (first tip in green).
@susnato Decoder models perform really poorly on this task, so there is no point adding GPT2ForQuestionAnswering IMO.
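As an aside, the list of supported architectures can also be queried programmatically rather than parsed out of the error message; a sketch against the internal auto-mapping, which may change between versions:

from transformers.models.auto.modeling_auto import MODEL_FOR_QUESTION_ANSWERING_MAPPING_NAMES

# Maps model_type (e.g. "bert") to the QA class name (e.g. "BertForQuestionAnswering").
for model_type, class_name in MODEL_FOR_QUESTION_ANSWERING_MAPPING_NAMES.items():
    print(model_type, "->", class_name)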
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Is it worth including in the library for completeness? I'm trying to use the Cerebras-GPT model suite for some question-answering tasks, and those models inherit from the GPT2Model class. Could we still include it?
The question-answering task page mentions support for the GPT2Model class. Is that a bug?
@kumaramit003 Support for question-answering for the GPT-2 model was added recently in #23030
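With a transformers release that includes #23030, the same AutoModel call should now resolve instead of raising; a quick check, assuming such a version is installed:

from transformers import AutoModelForQuestionAnswering

# Resolves to GPT2ForQuestionAnswering on versions that include #23030.
model = AutoModelForQuestionAnswering.from_pretrained("gpt2")
print(type(model).__name__)  # expected: GPT2ForQuestionAnswering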