Shaw

Results: 12 comments of Shaw

Thanks for your great work in reproducing P-tuning for few-shot SuperGLUE. In practice, we find that the reproducibility of few-shot learning depends heavily on the environment setup, hyper-parameters (e.g., batch size, gradient accumulation steps) and number of...
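For context, part of that "environment setup" is simply pinning every random seed before a run; the sketch below is a generic helper, not the repository's actual script, and even with it results can still drift with hardware, library versions, and batch-size or gradient-accumulation choices.

```python
import random

import numpy as np
import torch


def set_seed(seed: int = 42) -> None:
    """Pin the RNGs that commonly affect few-shot runs (illustrative only)."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # no-op when CUDA is unavailable


set_seed(1234)
```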

Hi @SCU-JJkinging , this is a dimension mismatch. The original experiments used albert-xxlarge-v2, whose hidden size should be 1024; bert-base-cased's is 768. Since P-tuning replaces the input embeddings at the prompt positions with external trainable embeddings, the dimensions have to match.
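To illustrate the constraint, here is a minimal hypothetical PyTorch sketch of swapping trainable prompt vectors into the prompt positions of the input embeddings; the class and argument names are assumptions, and this is not the repository's actual prompt encoder.

```python
import torch
import torch.nn as nn


class PromptEmbedding(nn.Module):
    """Trainable vectors swapped into the prompt positions of the input embeddings.

    `hidden_size` must match the backbone: e.g. 1024 for albert-xxlarge-v2
    as used in the original experiments, 768 for bert-base-cased.
    """

    def __init__(self, prompt_length: int, hidden_size: int):
        super().__init__()
        self.prompt = nn.Parameter(torch.randn(prompt_length, hidden_size))

    def forward(self, input_embeds: torch.Tensor, prompt_positions: torch.Tensor) -> torch.Tensor:
        # input_embeds: (batch, seq_len, hidden_size) word embeddings from the backbone
        # prompt_positions: (batch, prompt_length) indices of the prompt tokens
        batch = torch.arange(input_embeds.size(0)).unsqueeze(1)  # (batch, 1)
        out = input_embeds.clone()
        out[batch, prompt_positions] = self.prompt  # broadcasts over the batch
        return out
```

If the backbone's hidden size and the prompt vectors' size differ (1024 vs. 768 here), the assignment above fails with a shape error, which is the mismatch described in this reply.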

@SCU-JJkinging , yes. Our code should use 1024 by default.

@SCU-JJkinging ,
1. This is because albert-xxlarge-v2 uses factorized embedding parameterization, which decomposes a 1024-dim embedding into a 128-dim vector multiplied by a 128 × 1024 matrix, reparameterizing it to reduce the parameter count (a sketch follows below).
2. This is expected. Task performance is mainly determined by the quality of the pre-trained model, and ALBERT is itself an upgraded version of BERT. Besides, different pre-trained models need different hyper-parameters for tuning; directly reusing our best hyper-parameters from albert-xxlarge-v2 will generally not give optimal results on other models.
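For point 1, a rough PyTorch sketch of factorized embedding parameterization, with sizes matching the numbers in this reply (illustrative only, not ALBERT's actual implementation):

```python
import torch
import torch.nn as nn


class FactorizedEmbedding(nn.Module):
    """Illustrative factorized embedding: a small E-dim table projected up to H.

    Parameter cost is V*E + E*H instead of V*H; here E=128 and H=1024 as in
    the reply above, with a hypothetical vocabulary size.
    """

    def __init__(self, vocab_size: int = 30000, embedding_size: int = 128, hidden_size: int = 1024):
        super().__init__()
        self.word_embeddings = nn.Embedding(vocab_size, embedding_size)       # V x 128
        self.projection = nn.Linear(embedding_size, hidden_size, bias=False)  # 128 x 1024

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        return self.projection(self.word_embeddings(input_ids))  # (batch, seq, 1024)
```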

@NThakur20 Hi, I am also looking for this evaluation code. It seems there is some complicated logic here. Could you please provide some hints on when you will release these...

The task-specific linear head is fine-tuned together with the prompt embeddings. A comparison between the LM head and the task-specific linear head is provided in our experiments (Table 5), which shows that in a...
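A minimal sketch of that setup, assuming illustrative sizes and a frozen backbone (an assumption; the actual training configuration may differ): only the prompt embeddings and the linear head are handed to the optimizer.

```python
import torch
import torch.nn as nn

hidden_size, num_labels, prompt_length = 1024, 2, 20  # hypothetical sizes

# The trainable pieces in this sketch: prompt embeddings and a task-specific linear head.
prompt_embeddings = nn.Embedding(prompt_length, hidden_size)
task_head = nn.Linear(hidden_size, num_labels)

# The backbone's parameters are simply not passed to the optimizer.
optimizer = torch.optim.AdamW(
    list(prompt_embeddings.parameters()) + list(task_head.parameters()), lr=1e-4
)
```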

Yes, the LM head cannot be applied to sequence tagging for now. Your observation about PT on SQuAD is quite interesting. Have you frozen the pre-trained model's parameters? If...
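For reference, freezing the backbone is usually just a matter of turning off gradients before training; a hedged sketch with a generic Hugging Face model (the checkpoint name is only an example):

```python
from transformers import AutoModel

# Load a backbone and freeze it so that only the prompt parameters
# (and any task head) receive gradients.
backbone = AutoModel.from_pretrained("bert-base-cased")
for param in backbone.parameters():
    param.requires_grad = False

# Sanity check: no backbone parameter should remain trainable.
assert all(not p.requires_grad for p in backbone.parameters())
```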

@Randl Hi, thanks for the reminder! We have contacted the BIG-bench organizers and are evaluating GLM-130B on BIG-bench (probably on BIG-bench Lite first) under their guidance. The results will be reported...

@Luohuarucanxue Hi, for the NER and SRL tasks we did not use datasets downloaded automatically via Huggingface Datasets. Please follow the instructions in our README to download the files used for training on CoNLL03 and CoNLL04. Judging from the results on [PaperWithCodes](https://paperswithcode.com/sota/named-entity-recognition-ner-on-conll-2003), the best F1 on CoNLL03 currently seems to be only 94.6. I suspect there is a problem with the data or the script provided by the Huggingface dataset.

@portia1026 Hi, sorry for the late reply. We are still working on releasing the code for parameter-efficient tuning and fine-tuning of GLM-130B. However, for now we suggest not considering...