P-tuning
A novel method for tuning language models. Code and datasets for the paper "GPT Understands, Too".
Hi,
I cloned the project and ran it locally, only swapping the pretrained model for bert-base-cased, and got this error. What is the cause? These two dimensions don't match, so how can the weights be copied? The error is raised exactly at that copy step.
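For context, the shape mismatch in that copy step can be illustrated with a minimal sketch (a hypothetical helper, not the repo's actual code): when the checkpoint's embedding matrix and the model's do not share a shape, only the overlapping slice can be copied directly, and the rest must keep the destination's initialization:

```python
import torch

def copy_embedding(src: torch.Tensor, dst: torch.Tensor) -> torch.Tensor:
    """Copy a pretrained embedding matrix into a destination matrix whose
    shape may differ (illustrative helper, not from the P-tuning repo).
    Only the overlapping rows/columns are copied; the remainder keeps
    whatever initialization dst already has."""
    rows = min(src.shape[0], dst.shape[0])
    cols = min(src.shape[1], dst.shape[1])
    out = dst.clone()
    out[:rows, :cols] = src[:rows, :cols]
    return out

src = torch.randn(30522, 768)   # e.g. bert-base-cased embedding table
dst = torch.randn(21128, 768)   # e.g. a checkpoint with a different vocab size
merged = copy_embedding(src, dst)
print(merged.shape)  # torch.Size([21128, 768])
```

A blind `dst.copy_(src)` raises a size-mismatch error in exactly this situation; slicing to the overlap is one common workaround, at the cost of dropping or leaving uninitialized the non-overlapping vocabulary entries.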
Hello, while reproducing the Few-shot SuperGLUE experiments (i.e. the `FewGLUE_32dev` data), my results on the CB, WSC, and COPA datasets fall short of the paper's (all reproduction runs use the `albert-xxlarge-v2` pretrained model, matching the paper's design, with seed=42 unchanged):

## Differences in experimental setup
### CB
- The original script uses 8 GPUs / pet_per_gpu_train_batch_size=2 / pet_gradient_accumulation_steps=1; my run uses 1 GPU / pet_per_gpu_train_batch_size=8 / pet_gradient_accumulation_steps=2, with all other parameters unchanged;
- Best result: acc 85.71 with a corresponding f1-macro of 78.76; the paper reports 92.9/92.3;
- In the project's issues I found your explanation of CB underperforming the paper: https://github.com/THUDM/P-tuning/issues/12. If the cause is an error in the script's parameters, when will the training script be updated?
### WSC
- Same parameters as the original script (1 GPU / pet_per_gpu_train_batch_size=16 / pet_gradient_accumulation_steps=1);
- Best acc 81.73; the paper reports 84.6;
### COPA
- Same parameters as the original script (1 GPU / pet_per_gpu_train_batch_size=16 / pet_gradient_accumulation_steps=1);
- Best acc 79.00; the paper reports 87.0;
## Python library version differences
In case version differences affect reproduction, here are my Python library versions against requirements.txt (the project's required versions in parentheses):
- numpy...
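The CB setup above preserves the effective batch size: under the usual convention, it is the product of GPU count, per-GPU batch size, and gradient-accumulation steps. A one-line sketch of the arithmetic (hypothetical helper name, not from the repo):

```python
def effective_batch_size(n_gpus: int, per_gpu_batch: int, grad_accum_steps: int) -> int:
    """Effective (global) batch size under data parallelism with
    gradient accumulation: one optimizer step sees this many examples."""
    return n_gpus * per_gpu_batch * grad_accum_steps

print(effective_batch_size(8, 2, 1))  # 16 -- the original script's CB setup
print(effective_batch_size(1, 8, 2))  # 16 -- the single-GPU reproduction
```

Matching this product is necessary but not always sufficient for identical results: batch-norm-style statistics, data ordering, and per-step randomness can still differ between the two configurations.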
Hello, what format do the templates used for each task in FewGLUE take?
Hi P-tuning authors, if I want to evaluate P-tuning on new data not used in your paper, which part of your code...
1. Hi, in P-tuning, how is the position of [MASK] among the [unused] tokens determined? Is it chosen by hand? If not, by what procedure is it decided? 2. The paper says that when data is scarce, anchor words are used; e.g. when predicting "the capital of Britain", adding a [capital] token among the [unused] tokens works better. How is the position of this [capital] token determined?
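To make the question concrete, here is a minimal sketch of such a template (illustrative only; the function name, the pseudo-token count, and the exact ordering are assumptions, not the repo's API). It interleaves trainable [unused] slots with an optional anchor word and a [MASK] at the prediction site; the paper treats the mask/anchor placement as part of the template design rather than something learned:

```python
# Illustrative sketch, assuming a BERT-style tokenizer whose vocabulary
# reserves [unused1..N] slots for trainable prompt embeddings.
def build_template(query_tokens, n_prompt=3, anchor=None):
    """Surround the query with pseudo prompt tokens; [MASK] marks where
    the answer is predicted. Where the anchor and mask sit is a template
    design choice (picked by hand or by validation), per the question above."""
    prompt = [f"[unused{i + 1}]" for i in range(n_prompt)]
    middle = [anchor] if anchor else []
    return (["[CLS]"] + query_tokens + prompt[:2] + middle
            + prompt[2:] + ["[MASK]", "[SEP]"])

print(build_template(["Britain"], anchor="capital"))
```

In this sketch the anchor splits the pseudo-token run, mirroring the paper's "Britain ... capital ... [MASK]" example; moving it (or the mask) produces a different template whose quality is checked empirically.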
I ran `python cli.py --model_name=gpt2`. After several epochs, it printed the following error: ``` P1001 Dev Epoch 41 Loss: 0.007320404052734375 Hit@1: 0.6756756756756757 Traceback (most recent call last): File "cli.py", line...
Hi, I read the paper. Figure 1 says that GPTs can be better than similar-sized BERTs on NLU with P-tuning, but it seems that there is no...
Hi, thanks for your great work! I am looking to train GPT-2 on SuperGLUE, but I could only find implementations based on BERT/RoBERTa/ALBERT. Could you point out the location...
Hi, I have just used the default params to p-tune gpt2-medium on the LAMA task, and the results are as follows: best dev_hit@1: 51.8, best test_hit@1: 44.5. For the results...
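For readers unfamiliar with the metric reported above: Hit@1 is the fraction of queries whose top-ranked candidate matches the gold answer. A minimal sketch (hypothetical function and toy data, not the repo's evaluation code):

```python
def hit_at_1(predictions, gold):
    """predictions: one ranked candidate list per query (best first);
    gold: the correct answer per query.
    Hit@1 = fraction of queries whose top-ranked candidate is correct."""
    hits = sum(1 for ranked, g in zip(predictions, gold) if ranked[0] == g)
    return hits / len(gold)

preds = [["Paris", "Lyon"], ["Berlin", "Bonn"], ["Rome", "Milan"]]
gold = ["Paris", "Munich", "Rome"]
print(hit_at_1(preds, gold))  # 2 of 3 top-1 predictions correct, ~0.667
```

So a dev_hit@1 of 51.8 means the model's single best prediction was correct on 51.8% of the dev queries.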