wintersurvival issues

Results: 9 issues of wintersurvival

Bad generation result when top_k is set to 1

Thanks for the great work and for sharing it! With top_k = 1 and top_p < 0, the image generated by ruDALL-E Malevich (XL) is bad: text: 'радуга на фоне ночного города' ('a rainbow against the backdrop of a night city')...

DeepSpeed and training code print different throughput

When training with 8 GPUs, the throughput printed by DeepSpeed is much smaller than the throughput calculated by the training code: deepspeed SamplesPerSec=505 sample_per_sec: 50120 It seems that the throughput calculated by...
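Two throughput figures can disagree simply because they count different things, for example samples per process versus samples across all processes. The sketch below only illustrates that distinction. The function and parameter names are hypothetical, it is not how DeepSpeed or the training code in question computes its number, and an 8x per-GPU factor alone would not account for the gap reported above.

```python
def samples_per_sec(micro_batch_per_gpu, num_gpus, grad_accum_steps,
                    step_seconds, per_gpu=False):
    """Throughput for one optimizer step (hypothetical helper).

    per_gpu=False counts samples processed by all GPUs together;
    per_gpu=True counts only the samples seen by a single GPU.
    """
    if step_seconds <= 0:
        raise ValueError("step_seconds must be positive")
    samples = micro_batch_per_gpu * grad_accum_steps
    if not per_gpu:
        samples *= num_gpus
    return samples / step_seconds


# Same step, two definitions: the results differ by a factor of num_gpus.
global_rate = samples_per_sec(32, 8, 1, 0.5)
local_rate = samples_per_sec(32, 8, 1, 0.5, per_gpu=True)
print(global_rate, local_rate)
```

When comparing two logs, it helps to check which batch size (micro, per-GPU, or global) and which time window each figure is based on.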

Using Horovod, GPU 0 uses much more memory than other GPUs

Why are blank pictures generated when top_k in generate.py is set to 1.0?

When top_k in generate.py is set to 1.0, it often generates blank pictures. As I understand it, top_k=1.0 selects the image token with the maximum probability. Why does this happen?
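The reading of top_k in the question can be checked with a minimal, library-free sketch of top-k sampling. It is not the code in generate.py. The helper names and the example logits are made up for illustration, and it assumes top_k is an integer count of tokens to keep.

```python
import math
import random


def top_k_filter(logits, k):
    """Keep the k largest logits and set the rest to -inf.

    Ties at the threshold are all kept, so slightly more than k
    entries can survive when values repeat.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    k = min(k, len(logits))
    threshold = sorted(logits, reverse=True)[k - 1]
    return [x if x >= threshold else -math.inf for x in logits]


def sample_top_k(logits, k, rng=random):
    """Sample one token index from the softmax over the top-k logits."""
    filtered = top_k_filter(logits, k)
    peak = max(filtered)
    # exp(-inf) evaluates to 0.0, so filtered tokens get zero weight.
    weights = [math.exp(x - peak) for x in filtered]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


logits = [0.1, 2.5, -1.0, 2.4]  # illustrative values only
# With k=1 the only surviving token is the argmax, so sampling is deterministic.
picks = {sample_top_k(logits, 1) for _ in range(20)}
print(picks)
```

With k=1 the sampler always returns the highest-scoring token, so generation becomes greedy decoding. Greedy decoding in autoregressive models is known to drift into repetitive output, which is one possible reason for uniform or blank images, though the issue text does not confirm the cause.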

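Scaling linearity is very low with 8 GPUs

When running fashionbert on multiple GPUs, I found that performance with 8 GPUs is about the same as with 4 GPUs. Is this caused by the code that reads data with BundleCSVReader? Has this code been validated in a multi-GPU setup?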

An activation should be applied to the sum of the residual and the shortcut in the resnetv1 example
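The ordering this title asks for can be sketched as follows, assuming the usual ResNet v1 (post-activation) layout in which ReLU follows the addition. The functions below are hypothetical and operate on plain lists. They are not taken from the resnetv1 example itself.

```python
def relu(values):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, v) for v in values]


def residual_block_v1(x, residual_fn, shortcut_fn=None):
    """ResNet v1 style block: activation applied after the addition.

    residual_fn stands in for the conv/batch-norm branch, and
    shortcut_fn for an optional projection on the skip path.
    """
    shortcut = x if shortcut_fn is None else shortcut_fn(x)
    residual = residual_fn(x)
    summed = [r + s for r, s in zip(residual, shortcut)]
    return relu(summed)  # activation on the sum, not on the residual alone


# Toy branch that scales the input by -3, for illustration only.
out = residual_block_v1([1.0, -2.0, 0.5], lambda v: [-3.0 * a for a in v])
print(out)
```

Skipping the final activation, or applying it only to the residual branch, would let negative values from the shortcut pass through unchanged.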

Why is sparse_feature_number set to 1024 in wide_deep/config_gpups.yaml?

Running the following in the wide_deep directory: python -u ../../../tools/trainer.py config_gpups.yaml fails with: ValueError: (InvalidArgument) Variable value (input) of OP(fluid.layers.embedding) expected >= 0 and < 1024, but got 737395. Please check input value. [Hint: Expected ids[i] < row_number,...
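The error states that every embedding id must lie in [0, 1024), the number of rows in the table. The sketch below shows a pre-flight check for that condition, plus one generic workaround that folds ids into range with a modulo. Both helpers are hypothetical. They are not part of the wide_deep example, and whether folding is appropriate (versus raising sparse_feature_number) depends on the dataset.

```python
def out_of_range_ids(ids, num_rows):
    """Return the ids that an embedding table with num_rows rows would reject."""
    return [i for i in ids if not 0 <= i < num_rows]


def fold_ids(ids, num_rows):
    """Map arbitrary non-negative ids into [0, num_rows) by modulo.

    Distinct ids can collide after folding, which trades accuracy
    for a smaller table.
    """
    return [i % num_rows for i in ids]


ids = [3, 737395, 1023]  # 737395 is the id from the error message
print(out_of_range_ids(ids, 1024))
print(fold_ids(ids, 1024))
```

Running such a check over the input data before training shows whether the configured table size matches the id range actually present.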

Cast ops in ONNX file

Sometimes an ONNX file has Cast ops that cast the data type to INT64. This case needs to be handled as well, by just modifying the attribute of the Cast ops to...

Bias in selfAttention

When running the transformer, there is no bias in selfAttention, whereas mesh_tensorflow/bert has a bias in selfAttention. What is the meaning of relative_attention_type in transformer_layer.SelfAttention? How can I get the bias in transformer_layer.SelfAttention?