
gpt2 can't be trained for QA?

Open ucas010 opened this issue 2 years ago • 5 comments

System Info

  • transformers version: 4.26.0.dev0
  • Platform: Linux-3.10.0-1160.81.1.el7.x86_64-x86_64-with-glibc2.17
  • Python version: 3.9.15
  • Huggingface_hub version: 0.11.1
  • PyTorch version (GPU?): 1.13.1+cu116 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?:
  • Using distributed or parallel set-up in script?:

Who can help?

@sgugger

Information

  • [X] The official example scripts
  • [ ] My own modified scripts

Tasks

  • [X] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • [ ] My own task or dataset (give details below)

Reproduction

Code: the same as the linked example, but with the model changed to gpt2:

python run_qa.py \
  --model_name_or_path gpt2 \
  --dataset_name squad \
  --do_train \
  --do_eval \
  --per_device_train_batch_size 12 \
  --learning_rate 3e-5 \
  --num_train_epochs 2 \
  --max_seq_length 384 \
  --doc_stride 128 \
  --output_dir /tmp/debug_squad/

or

python run_seq2seq_qa.py \
  --model_name_or_path gpt2 \
  --dataset_name squad_v2 \
  --context_column context \
  --question_column question \
  --answer_column answers \
  --do_train \
  --do_eval \
  --per_device_train_batch_size 12 \
  --learning_rate 3e-5 \
  --num_train_epochs 2 \
  --max_seq_length 384 \
  --doc_stride 128 \
  --output_dir /tmp/debug_seq2seq_squad/

ValueError: Unrecognized configuration class <class 'transformers.models.gpt2.configuration_gpt2.GPT2Config'> for this kind of AutoModel: AutoModelForQuestionAnswering. Model type should be one of AlbertConfig, BartConfig, BertConfig, BigBirdConfig, BigBirdPegasusConfig, BloomConfig, CamembertConfig, CanineConfig, ConvBertConfig, Data2VecTextConfig, DebertaConfig, DebertaV2Config, DistilBertConfig, ElectraConfig, ErnieConfig, FlaubertConfig, FNetConfig, FunnelConfig, GPTJConfig, IBertConfig, LayoutLMv2Config, LayoutLMv3Config, LEDConfig, LiltConfig, LongformerConfig, LukeConfig, LxmertConfig, MarkupLMConfig, MBartConfig, MegatronBertConfig, MobileBertConfig, MPNetConfig, MvpConfig, NezhaConfig, NystromformerConfig, OPTConfig, QDQBertConfig, ReformerConfig, RemBertConfig, RobertaConfig, RobertaPreLayerNormConfig, RoCBertConfig, RoFormerConfig, SplinterConfig, SqueezeBertConfig, XLMConfig, XLMRobertaConfig, XLMRobertaXLConfig, XLNetConfig, YosoConfig.
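At the time of this issue, GPT2Config was simply absent from the config-to-model mapping behind AutoModelForQuestionAnswering, so dispatch fails before any training starts. A minimal pure-Python sketch of that dispatch pattern (the names and mapping here are illustrative, not the actual transformers internals):

```python
# Illustrative sketch of how an Auto class dispatches on the config class
# and raises ValueError for model types that have no QA head registered.
_QA_MAPPING = {
    "BertConfig": "BertForQuestionAnswering",
    "GPTJConfig": "GPTJForQuestionAnswering",
    # GPT2Config is absent, so gpt2 cannot be dispatched to a QA model.
}

def from_config(config_class_name: str) -> str:
    """Return the QA model class name registered for a config class."""
    try:
        return _QA_MAPPING[config_class_name]
    except KeyError:
        raise ValueError(
            f"Unrecognized configuration class {config_class_name} for this "
            f"kind of AutoModel: AutoModelForQuestionAnswering."
        ) from None
```

Under this sketch, `from_config("GPT2Config")` raises the same kind of ValueError shown above, while registered configs resolve normally.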

Expected behavior

looking forward to your kind reply thx

ucas010 avatar Feb 16 '23 09:02 ucas010

As said multiple times in the past, please use the forums for questions like this.

sgugger avatar Feb 16 '23 14:02 sgugger

@sgugger this is a bug, not a question

ucas010 avatar Feb 17 '23 05:02 ucas010

I believe it's because gpt2 doesn't have a QuestionAnswering head (like GPTJForQuestionAnswering). I would be happy to implement that if @sgugger approves the addition.

susnato avatar Feb 17 '23 07:02 susnato

I don't see a bug. GPT-2 is not meant to be used for question-answering. You can find the list of architectures that support this task by reading the error message or having a look at the question-answering task page in the doc (first tip in green).

@susnato Decoder models perform really poorly on this task, so there is no point adding GPT2ForQuestionAnswering IMO.

sgugger avatar Feb 17 '23 15:02 sgugger

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar Mar 18 '23 15:03 github-actions[bot]

> I don't see a bug. GPT-2 is not meant to be used for question-answering. You can find the list of architectures that support this task by reading the error message or having a look at the question-answering task page in the doc (first tip in green).
>
> @susnato Decoder models perform really poorly on this task, so there is no point adding GPT2ForQuestionAnswering IMO.

Is it worth including it in the library for completeness? I'm trying to use the Cerebras-GPT model suite for some question-answering tasks, and those models inherit from the GPT2Model class. Could we still include it?

uwaisiqbal avatar Apr 26 '23 17:04 uwaisiqbal

> I don't see a bug. GPT-2 is not meant to be used for question-answering. You can find the list of architectures that support this task by reading the error message or having a look at the question-answering task page in the doc (first tip in green).
>
> @susnato Decoder models perform really poorly on this task, so there is no point adding GPT2ForQuestionAnswering IMO.

The question-answering task page mentions support for the GPT2Model class; is that a bug?

kumaramit003 avatar May 13 '23 20:05 kumaramit003

@kumaramit003 Support for question-answering for the GPT-2 model was added recently in #23030

amyeroberts avatar May 15 '23 13:05 amyeroberts