jinming issues

Results: 10 issues of jinming

Failed to download the saved_embd.pt

Hi, when I download data.zip, it reports an error. I now have the multinli_0.9 and snli datasets, but I don't have saved_embed.pt. So, I want to know the...

Difference between the conditional mechanism and the "Masked Language Modeling with Visual Clues" in VL-BERT

Question about the highway network?

In the framework, before the BiLSTM layers (the context representation and aggregation layers), there is a highway network. What is the reason for adding this highway layer?

About the output dimensions of the full-match layer

I ran into a problem with the output of the full match layer. In the paper: sentence1_match_output shape = batchsize * len(sentence2) * L (num perspective), sentence2_match_output shape = batchsize * len(sentence1) * L (num perspective), which are then used as the LSTM in the aggregation...

Sampled results cannot be passed directly to compute_levenshtein()

Hi, https://github.com/ne7ermore/deeping-flow/blob/master/reinforced-translate/model.py b_words = model.sample(self.prev) s_words, s_props = model.sample(self.prev, False) rewards = self.compute_levenshtein(model.tgt, s_words) baseline = self.compute_levenshtein(model.tgt, b_words) advantage = rewards - baseline Here b_words and s_words are both of type Tensor, ...
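For readers following the question above, here is a minimal sketch of the reward/baseline/advantage arithmetic it describes. It is not the repository's `compute_levenshtein`. It assumes the sampled token ids have already been turned into plain Python lists (how to do that depends on the framework and is not shown), and the helper names `levenshtein`, `strip_after_eos`, `batch_rewards`, the EOS id `2`, and the toy batch are all hypothetical.

```python
from typing import List, Sequence


def levenshtein(a: Sequence[int], b: Sequence[int]) -> int:
    """Edit distance between two token-id sequences (each edit costs 1)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            # deletion, insertion, substitution (or match)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def strip_after_eos(seq: Sequence[int], eos: int) -> List[int]:
    """Keep tokens up to, but not including, the first EOS id."""
    out: List[int] = []
    for tok in seq:
        if tok == eos:
            break
        out.append(tok)
    return out


def batch_rewards(targets, samples, eos: int) -> List[int]:
    """Negative edit distance per sentence pair, so higher is better."""
    return [
        -levenshtein(strip_after_eos(t, eos), strip_after_eos(s, eos))
        for t, s in zip(targets, samples)
    ]


# Hypothetical toy batch of token ids, already converted to Python lists.
EOS = 2  # assumed EOS id, for illustration only
tgt = [[5, 6, 7, 2, 0], [8, 9, 2, 0, 0]]
s_words = [[5, 6, 2, 0, 0], [8, 9, 2, 0, 0]]  # stands in for the sampled output
b_words = [[5, 7, 7, 2, 0], [9, 8, 2, 0, 0]]  # stands in for the baseline output

rewards = batch_rewards(tgt, s_words, EOS)
baseline = batch_rewards(tgt, b_words, EOS)
advantage = [r - b for r, b in zip(rewards, baseline)]
```

The point of the sketch is only that the edit distance is computed on concrete id sequences rather than on symbolic tensors. Whether the reward should be negated or normalized by length is a design choice that the truncated question does not settle.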

A question about the loss computation

Hi, https://github.com/ne7ermore/deeping-flow/blob/master/reinforced-translate/model.py#L175 mask = pad_mask(model.tgt, EOS, [args.batch_size, args.max_len]) Shouldn't the MC-sampled sequence be used as the target here? Should it be s_words? Thanks

Inference on my own data

Hi, Thank you for sharing your code and models! I need to use your code and models on my own video data for other tasks. My own videos are the...

Continuing pretraining from the 10B model fails with an error caused by a world size mismatch

/data/pretrained_model/glm-10b-chinese-mp4/80000/mp_rank_03_model_states.pt. Traceback (most recent call last): File "pretrain_glm.py", line 673, in main() File "pretrain_glm.py", line 584, in main args.iteration = load_checkpoint(model, optimizer, lr_scheduler, args, no_deepspeed=args.no_deepspeed_load) File "/data/users/zhaojinming/source/glm10BCodesSlurmN1/utils.py", line 339, in...

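Center loss training on CIFAR does not converge

Hello, below is the training output from several stages. The center loss drops very quickly while the softmax loss barely moves; as training proceeds, the center loss gradually increases and the softmax loss gradually decreases. Training behaves the same way on other datasets. Is this normal, and could you explain the reason for this behavior? (Normally the L2 loss behaves the same way: the L2 loss usually drops first and the softmax loss drops afterwards, which makes training very slow. Could you help explain this?) Thanks! step: 0, training accuracy: 0.01, training loss: 9.13, center_loss_value:...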

At the test stage, how is the subgraph built?

Hi, Thanks for releasing your data and code. In this paper, you use the input message and scene info to decide the source-entity and use the response to decide the...