Zhang Peiyuan

Results: 23 issues of Zhang Peiyuan

Hi Kelvin, thanks for your work and open-source effort. May I ask how long it took you to train BasicVSR on the REDS dataset? On my end,...

Hello, could you please migrate FewRel 1.0 and 2.0 to the new CodaLab site https://codalab.lisn.upsaclay.fr/? The old site https://competitions.codalab.org/ breaks down from time to time, and https://competitions.codalab.org/competitions/27981 is currently unreachable again. Alternatively, could you send me a copy of the test data at [email protected]? I am working on an EMNLP rebuttal right now and urgently need the test results 😭. Thank you!

Thanks for your excellent work! May I ask: when you created the dataset, did you examine the impact of the different human detection and object detection models used in the pre-processing...

Dear authors, thanks for your exciting and solid work. May I ask why Multimodal Chain-of-Thought is still significantly better than UnifiedQA when there is no visual input (e.g., the text...

Parameter-efficient transfer learning for NLP: May I ask why you opted not to implement the "adapter" from this paper? Is it due to performance or something else?

wip

Parameter-efficient methods have been studied extensively in (small or large) language models since BERT. Given that abundance of prior work, why is there no controlled experiment comparing those different methods in...

May I ask how BinaryBERT stores the 1-bit weights in GPU memory? Are they stored natively as 1 bit in GPU memory, or as 8-bit integers?
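For context on what the question is asking: GPUs and mainstream frameworks have no native 1-bit tensor type, so binarized weights are typically either kept in an 8-bit (or wider) integer per weight, or bit-packed eight-to-a-byte. Below is a hypothetical numpy sketch of the bit-packing option; the function names are my own and this is not BinaryBERT's actual storage code.

```python
import numpy as np

def pack_binary_weights(w):
    """Pack {-1, +1} float weights into uint8, 8 weights per byte.

    Returns (packed bytes, original element count) so the unpacker
    can strip the padding bits that np.packbits adds.
    """
    bits = (w > 0).astype(np.uint8)   # map +1 -> bit 1, -1 -> bit 0
    return np.packbits(bits), w.size

def unpack_binary_weights(packed, n):
    """Inverse of pack_binary_weights: recover {-1, +1} float weights."""
    bits = np.unpackbits(packed)[:n]  # drop the zero-padding bits
    return bits.astype(np.float32) * 2.0 - 1.0
```

Packed storage uses 1/8 the memory of int8, at the cost of an unpack step before each matrix multiply; whether a given implementation pays that cost is exactly what the question above is probing.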

I attempted to run the code with bf16 enabled but encountered some error messages. I also searched for the keywords "bf16" and "fp16" in this repository but...

The paper mentions multiple times that all tokens from the same scale r are generated in parallel. Did I overlook it, or is there actually little description of...

good first issue

Dear authors, I notice that in HF's Mixtral: https://github.com/huggingface/transformers/blob/b109257f4fb8b1166e7c53cc5418632014ed53a5/src/transformers/models/mixtral/modeling_mixtral.py#L852 softmax is called before top-k, and the probabilities are re-normalized after top-k. Whereas in LitGPT's LlamaMOE: https://github.com/Lightning-AI/litgpt/blob/c81800f455dd997f786cbe2e110eff1f5c0d2d3b/litgpt/model.py#L341 softmax is called after...
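The two router orderings the question contrasts can be sketched in a few lines; this is a simplified single-token numpy illustration with hypothetical function names, not the actual code from either repository.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

def route_softmax_then_topk(logits, k):
    """Mixtral-style: softmax over all experts, keep top-k, re-normalize."""
    probs = softmax(logits)
    idx = np.argsort(probs)[::-1][:k]   # indices of the k largest probs
    w = probs[idx]
    return idx, w / w.sum()             # re-normalize to sum to 1

def route_topk_then_softmax(logits, k):
    """LlamaMOE-style: pick top-k logits first, softmax over just those."""
    idx = np.argsort(logits)[::-1][:k]  # indices of the k largest logits
    return idx, softmax(logits[idx])
```

Note that since softmax is monotone, the top-k experts are the same either way, and re-normalizing a subset of softmax outputs equals taking softmax over that subset's logits, so (up to numerical precision and tie-breaking) the two orderings yield identical routing weights.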

question