Yiqun Liu issues

Results: 28 issues



Support getting the pure backward time for benchmark on dynamic

Optimize inference performance of ERNIE on P40 GPU

### Owners @Xreki @zhaoyuchen2018 ### Initial performance - Test date: August 14, 2019 - Tester: @Xreki - GPU platform: Tesla P40 - Software: - Driver Version: 418.39 - CUDA 9.0 - cuDNN 7.5 - Paddle commit: ``` commit 744279fe685dd0b8b426a686d84ad449da02366e...

The design and optimization of API Benchmark

Optimize the performance of Transformer-Big on 1 V100 GPU

#### Owner @wangchaochaohu #### Initial performance - Test date: June 20, 2019 - Paddle commit: - models commit: - Test script: [run.sh](https://github.com/PaddlePaddle/benchmark/blob/master/NeuralMachineTranslation/Transformer/fluid/train/run.sh) ```bash base_batch_size=4096 python -u train.py \ --src_vocab_fpath data/vocab.bpe.32000 \ --trg_vocab_fpath data/vocab.bpe.32000 \ --special_token \ --train_file_pattern...

Optimize the performance of seq2seq model on GPU

#### Initial performance - Test date: August 8, 2019 - Tester: @Xreki

Make elementwise_add_grad independent of the input tensors x and y.

As the title says.
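Because ∂(x + y)/∂x = ∂(x + y)/∂y = 1, the gradients of an element-wise add can in principle be computed from the output gradient `dout` and the *shapes* of x and y alone: each input gradient is `dout` summed over whatever axes were broadcast. The following is a minimal NumPy sketch of that idea, assuming NumPy-style right-aligned broadcasting (Paddle's `elementwise_add` also has an `axis` attribute, which this sketch does not model). `elementwise_add_grad` and `_reduce_to_shape` are hypothetical helpers for illustration, not Paddle's kernel.

```python
import numpy as np


def _reduce_to_shape(grad, shape):
    """Sum `grad` over the axes that were broadcast to produce it, returning an array of `shape`."""
    shape = tuple(shape)
    # Drop leading axes that `shape` does not have (right-aligned broadcasting).
    extra = grad.ndim - len(shape)
    if extra > 0:
        grad = grad.sum(axis=tuple(range(extra)))
    # Sum, keeping dims, over axes where the input had size 1 but the output was larger.
    axes = tuple(i for i, n in enumerate(shape) if n == 1 and grad.shape[i] != 1)
    if axes:
        grad = grad.sum(axis=axes, keepdims=True)
    return grad.reshape(shape)


def elementwise_add_grad(dout, x_shape, y_shape):
    """Hypothetical backward of out = x + y: needs only dout and the input shapes, not x or y."""
    dx = _reduce_to_shape(dout, x_shape)
    dy = _reduce_to_shape(dout, y_shape)
    return dx, dy


if __name__ == "__main__":
    dout = np.arange(12, dtype=np.float32).reshape(4, 3)
    dx, dy = elementwise_add_grad(dout, (4, 3), (3,))
    print(dx.shape, dy)  # dy holds the column sums of dout
```

Since the function takes shapes rather than tensors, nothing about x and y beyond their shapes has to be kept alive for the backward pass in this sketch.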

Give some fusion examples.

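An example constructed to show that the current fused kernel does not support multiple outputs. Two networks are built for comparison. ### 1. The intermediate result is not referenced externally ```python x0 = builder.create_input(Float(32), [32, 32], "x0") x1 = builder.create_input(Float(32), [32, 32], "x1") y0 = builder.elementwise_add(x0, x1, axis=-1) y1 = builder.relu(y0) ``` The generated CUDA kernel is as follows: ```cpp __global__...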
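To make the single-output versus multi-output distinction concrete, the sketch below emulates in plain Python what one fused element-wise kernel for `y0 = x0 + x1; y1 = relu(y0)` does per element. It is an illustration under stated assumptions, not the CUDA code the builder actually generates: each loop iteration stands in for one GPU thread, and `fused_add_relu` and its `store_y0` flag are hypothetical names. When `y0` is not referenced outside the fused group it stays in a local "register" and only `y1` is written; when it is referenced, the same fused loop must also write `y0`, which is the multi-output case.

```python
from typing import List, Optional, Tuple


def fused_add_relu(
    x0: List[float], x1: List[float], store_y0: bool
) -> Tuple[List[float], Optional[List[float]]]:
    """Emulate one fused kernel computing y0 = x0 + x1 and y1 = relu(y0) in a single pass."""
    assert len(x0) == len(x1), "inputs must have the same number of elements"
    y1 = [0.0] * len(x0)
    # The intermediate is only materialized when something outside the fused group needs it.
    y0: Optional[List[float]] = [0.0] * len(x0) if store_y0 else None
    for i in range(len(x0)):  # each iteration plays the role of one CUDA thread
        t = x0[i] + x1[i]  # intermediate value kept in a local "register"
        if y0 is not None:
            y0[i] = t  # second output: written only when y0 is referenced externally
        y1[i] = max(t, 0.0)  # relu
    return y1, y0


if __name__ == "__main__":
    print(fused_add_relu([1.0, -3.0], [0.5, 1.0], store_y0=False))
    print(fused_add_relu([1.0, -3.0], [0.5, 1.0], store_y0=True))
```

Both variants read the inputs once and run one loop; the only difference is the extra store, which is why supporting a second output in a fused kernel is mainly a matter of letting the fusion pass emit more than one write per element.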
