sanwei111
Hello, author. I have a few questions and hope you can kindly advise:
1. The pre-train section opens with a pile of hyperparameters (which file are they defined in?). Its last line looks like the training command, followed by a long list of arguments. What exact command should I enter to run it?
2. My server has only one GPU. Do I need to change some configuration to run your code, and if so, which parameters exactly?
3. Which file sets the dataset path? I could not find it.
4. "we use the English Wikipedia corpus and BookCorpus (Zhu et al., 2015) for pre-training. By concatenating these two datasets, we obtain a corpus with roughly 16GB...
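Not the authors' answer, but for question 2 the usual single-GPU workaround is gradient accumulation, which emulates a large multi-GPU effective batch on one device. A minimal PyTorch sketch under that assumption; the model, optimizer, and data below are hypothetical stand-ins for the repo's real ones:

```python
import torch
from torch import nn

# Hypothetical stand-ins for the repo's real model and data pipeline.
model = nn.Linear(16, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()
data = [(torch.randn(4, 16), torch.randint(0, 2, (4,))) for _ in range(32)]

accumulation_steps = 8  # effective batch = 4 * 8 = 32 samples per update

model.train()
optimizer.zero_grad()
for step, (inputs, targets) in enumerate(data):
    loss = criterion(model(inputs), targets)
    (loss / accumulation_steps).backward()  # scale so gradients average over the window
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()       # one update per accumulation window
        optimizer.zero_grad()
```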
As we know, the conventional attention module can capture features like Fig. 3(b) (including the diagonal and other positions). This ability is inherent to it, but I just wonder: when we add a branch...
Hello, in the file transformer-multibranch-v2, the class TransformerEncoderLayer contains the following code:

```python
if args.encoder_branch_type is None:  # default=None????
    self.self_attn = MultiheadAttention(
        self.embed_dim,
        args.encoder_attention_heads,
        dropout=args.attention_dropout,
        self_attention=True,
    )
else:
    layers = []
    embed_dims = ...
```
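For readers puzzled by the same branch: below is a minimal sketch of how the `else` branch of such a layer is commonly structured, with one attention module per branch, each operating on a slice of the embedding dimension. The class name, branch count, and splitting scheme are assumptions for illustration, not the actual code of transformer-multibranch-v2:

```python
import torch
from torch import nn

# Hypothetical sketch of a multi-branch self-attention module: the embedding
# dimension is split evenly across branches and each branch runs its own
# attention. Names and the splitting scheme are assumptions, not repo code.
class MultiBranchSelfAttention(nn.Module):
    def __init__(self, embed_dim, num_heads, num_branches, dropout=0.0):
        super().__init__()
        assert embed_dim % num_branches == 0
        self.branch_dim = embed_dim // num_branches
        self.branches = nn.ModuleList(
            nn.MultiheadAttention(self.branch_dim, num_heads, dropout=dropout)
            for _ in range(num_branches)
        )

    def forward(self, x):
        # x: (seq_len, batch, embed_dim); split features across branches.
        chunks = x.split(self.branch_dim, dim=-1)
        outs = [attn(c, c, c)[0] for attn, c in zip(self.branches, chunks)]
        return torch.cat(outs, dim=-1)  # recombine branch outputs

x = torch.randn(10, 2, 64)  # (seq_len, batch, embed_dim)
mba = MultiBranchSelfAttention(embed_dim=64, num_heads=4, num_branches=2)
print(mba(x).shape)  # torch.Size([10, 2, 64])
```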
Why do you copy the data in minival2014 to train2014, and the data in minival2014 to testdev2017? What is this for? What's the point? Thanks so much.
Hello, could you share the code that computes the FLOPs of the Transformer?
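While waiting for the authors' script, a common back-of-the-envelope estimate is to count the matrix multiplications in one encoder layer and treat each multiply-accumulate as 2 FLOPs. This is a generic sketch, not any paper's official counter; it ignores softmax, layer norm, and bias terms, which are comparatively small:

```python
# Rough FLOPs estimate for one Transformer encoder layer (multiply-accumulate
# counted as 2 FLOPs). A generic sketch under the assumptions stated above.
def transformer_layer_flops(seq_len: int, d_model: int, d_ff: int) -> int:
    qkv_out = 4 * 2 * seq_len * d_model * d_model   # Q, K, V and output projections
    attn = 2 * 2 * seq_len * seq_len * d_model      # QK^T and attention-times-V
    ffn = 2 * 2 * seq_len * d_model * d_ff          # two feed-forward matmuls
    return qkv_out + attn + ffn

# Example: base-sized layer on a 512-token sequence.
print(transformer_layer_flops(seq_len=512, d_model=512, d_ff=2048) / 1e9, "GFLOPs")
```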
I ran video_demo.py. First it extracts features, and after that it prompts me to play the video, but the playback does not work. Could you please tell me what's wrong?
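One way to narrow this down is to test video display on its own, independent of video_demo.py. A minimal OpenCV sketch, assuming cv2 is installed and a display is available (cv2.imshow fails on headless servers); the file name is a placeholder:

```python
import cv2

# Minimal playback check, separate from video_demo.py: if this also fails,
# the problem is likely the environment (e.g. no display for cv2.imshow),
# not the demo script. "demo.mp4" is a placeholder path.
cap = cv2.VideoCapture("demo.mp4")
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break  # end of video or read error
    cv2.imshow("playback test", frame)
    if cv2.waitKey(30) & 0xFF == ord("q"):  # ~30 ms per frame; press q to quit
        break
cap.release()
cv2.destroyAllWindows()
```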
### 🐛 Describe the bug

I have finished the training process. I just want to ask how to run inference. Could you please show the code?

### Environment

_No response_
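Not this repo's actual API, but the generic PyTorch inference pattern is: load the trained checkpoint, switch the model to eval mode, and run a forward pass under torch.no_grad(). In the sketch below, MyModel and the checkpoint path are hypothetical placeholders:

```python
import torch
from torch import nn

class MyModel(nn.Module):  # stand-in for the repo's actual model class
    def __init__(self):
        super().__init__()
        self.net = nn.Linear(16, 2)

    def forward(self, x):
        return self.net(x)

model = MyModel()
# state = torch.load("checkpoint.pt", map_location="cpu")  # placeholder path
# model.load_state_dict(state["model"])                    # key depends on the repo
model.eval()                 # disable dropout, use running batch-norm stats
with torch.no_grad():        # no gradient tracking during inference
    logits = model(torch.randn(1, 16))
    print(logits.argmax(dim=-1))
```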
### Is there an existing issue for this?

- [X] I have searched the existing issues

### Current Behavior

What does `{"labels": "xxxxxxxxxxxxxxxxxxxx", "predict": "xxxxxxxxxxxxxxxxxx"}` mean?

### Expected Behavior

_No response_

###...
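Presumably each line pairs the ground-truth text ("labels") with the model's output ("predict") for one evaluation example; that is an assumption, not confirmed by the maintainers. Under that assumption, a minimal sketch that reads such a JSON-lines file and computes exact-match accuracy (the file name is hypothetical):

```python
import json

# Assumption: one JSON object per line, where "labels" is the ground-truth
# text and "predict" is the model output for that example.
def exact_match(path: str) -> float:
    hits, total = 0, 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            row = json.loads(line)
            hits += row["labels"].strip() == row["predict"].strip()
            total += 1
    return hits / total if total else 0.0

# print(exact_match("generated_predictions.jsonl"))  # hypothetical file name
```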