Paddle
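PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (PaddlePaddle 『飞桨』 core framework: high-performance single-machine and distributed training and cross-platform deployment for deep learning & machine learning)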
### Feature Description `tensor.numpy()` is slow when copying a large amount of data from GPU to CPU: calling `tensor.numpy()` on 5M of data took 1.4 s, which is completely unacceptable. What is the cause? ### Alternatives _No response_
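One thing worth ruling out when timing a device-to-host copy is that earlier, still-running GPU work gets billed to the copy. This is an assumption about a possible cause, not something the issue confirms. The sketch below is a generic timing helper; `time_call` and its `sync` parameter are hypothetical names, and the workload is a CPU stand-in so the snippet runs anywhere.

```python
import time


def time_call(fn, sync=None, repeats=5):
    """Return the average wall-clock seconds per call of fn().

    sync is an optional callable that blocks until queued device work
    has finished (hypothetical hook; pass None for pure CPU code).
    """
    if sync is not None:
        sync()  # drain earlier queued work so it is not billed to fn
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
        if sync is not None:
            sync()  # make sure fn's own work has really finished
    return (time.perf_counter() - start) / repeats


# Stand-in workload (assumption: replaces the real tensor copy).
data = list(range(100_000))
elapsed = time_call(lambda: sum(data))
print(f"{elapsed * 1e3:.3f} ms per call")
```

With Paddle, `fn` could be `lambda: tensor.numpy()` and `sync` a device synchronization call such as `paddle.device.cuda.synchronize`, assuming it is available in the installed version. If the measured time drops sharply once the initial sync is in place, the copy itself was not the bottleneck.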
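### PR types Performance optimization ### PR changes OPs ### Describe Optimize topk's performance when k is small and input_width is large. dtype: FP32; each case was run 100,000 times in a loop and the results averaged. After the optimization, speed improves 3-4x when k is small and input_width is large.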
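To illustrate why a small k with a wide input leaves room for optimization, here is a minimal CPU sketch. It is not the PR's GPU kernel; the function names are hypothetical. A full sort costs O(n log n) per row, while a bounded heap of size k costs O(n log k), which is much less work when k is far smaller than the row width.

```python
import heapq
import random


def topk_full_sort(row, k):
    """Baseline: sort the whole row in descending order, keep the first k."""
    return sorted(row, reverse=True)[:k]


def topk_heap(row, k):
    """Selection with a size-k heap; returns the k largest, descending."""
    return heapq.nlargest(k, row)


random.seed(0)
row = [random.random() for _ in range(100_000)]  # large input_width
k = 5                                            # small k

# Both strategies must agree on the result.
assert topk_heap(row, k) == topk_full_sort(row, k)
print(topk_heap(row, k))
```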
### PR types Breaking changes ### PR changes Others ### Describe Fix https://github.com/PaddlePaddle/Paddle/issues/46314
### PR types Others ### PR changes Others ### Describe This is a test PR to validate some ideas for optimizing the precision-test map.
### PR types New features ### PR changes OPs ### Describe Add `bernoulli_p` autograd primitive op and support orig2prim for paddle original `dropout` op.
### PR types New features ### PR changes OPs ### Describe Add float16 data type support to the deformable_conv_v1 operator. On the benchmark test cases, float16 forward speed is ~~close to~~ faster than float32: | Case No. | x_shape|offset_shape|weight_shape|mask_shape | data_type | Paddle Perf(ms) | |---|---|---|---|---|---|---| | 1...
### PR types Performance optimization ### PR changes OPs ### Describe Add float16 data type support for cumsum and logcumsumexp. Test device: RTX 2070s. Current cumsum test results (forward only): | Case No. | input_shape | fp32(ms) | fp16(ms) | max_absolute_diff | max_relative_diff...
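The `max_absolute_diff` and `max_relative_diff` columns compare fp16 results against fp32. A minimal pure-Python sketch of how such metrics could be computed follows. It emulates each precision by rounding through `struct` after every addition; the helper names are hypothetical, and taking the fp32 result as the denominator of the relative difference is an assumption, since the PR's exact definition is not shown.

```python
import struct


def round_to(fmt, x):
    """Round a Python float to the precision of a struct format
    ('<e' = IEEE half precision, '<f' = IEEE single precision)."""
    return struct.unpack(fmt, struct.pack(fmt, x))[0]


def cumsum(values, fmt):
    """Running sum where inputs and every partial sum are rounded to fmt."""
    acc = 0.0
    out = []
    for v in values:
        acc = round_to(fmt, acc + round_to(fmt, v))
        out.append(acc)
    return out


def max_diffs(ref, test):
    """Return (max absolute diff, max relative diff w.r.t. ref)."""
    abs_diffs = [abs(r - t) for r, t in zip(ref, test)]
    max_abs = max(abs_diffs)
    max_rel = max(
        (d / abs(r) for d, r in zip(abs_diffs, ref) if r != 0.0),
        default=0.0,
    )
    return max_abs, max_rel


values = [0.1] * 1000  # illustrative input, not the PR's test case
fp32 = cumsum(values, "<f")
fp16 = cumsum(values, "<e")
max_abs, max_rel = max_diffs(fp32, fp16)
print(f"max_absolute_diff={max_abs:.6g} max_relative_diff={max_rel:.6g}")
```

Because rounding error accumulates along the scan, the fp16 error generally grows with the length of the summed axis, which is why both metrics are worth reporting per input shape.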
### PR types Bug fixes ### PR changes Others ### Describe core_avx.so was previously renamed to libpaddle.so, and this has now caused a problem: the soname is liblibpaddle.so, so the library cannot be found at runtime. Changing the soname to libpaddle.so fixes it.
### PR types Others ### PR changes Others ### Describe `layer->getOutput()->setType()` may fail; use `layer->setOutputType()` instead. In TRT 8.4 setType fails, but it works in TRT 8.2, which is strange, so I use...
### Please ask your question While importing the Paddle library from PaddleOCR, we get an error because we do not have permission to create a path in the home directory on analysis...