Paddle
Paddle copied to clipboard
[mkldnn] Fix elementwise_sub sign reverse for mkldnn
PR types
Bug fixes
PR changes
OPs
Describe
-
Fixed elementwise_sub mkldnn kernel BUG while specifying x.shape as [180, 1] and y.shape as [1, 256] . In this case, the sign of some numbers is wrong.
-
But we found that it raised
dnnl::error, could not create a primitive descriptor for a reorder primitive
error after we added unittesttest_check_grad_ignore_x
andtest_check_grad_normal
in TestMKLDNNElementwiseSubOp40. The log information withGLOG_v=6
as follows:I0914 12:02:47.651978 11870 operator.cc:277] Place(cpu) Op(elementwise_sub_grad), inputs:{Out@GRAD[Out@GRAD:float[180, 256]({})(Place(cpu))], X[X:unknown_dtype[180, 1]({})(Place(cpu))], Y[Y:unknown_dtype[1, 256]({})(Place(cpu))]}, outputs:{X@GRAD[], Y@GRAD[Y@GRAD:float[1, 256]({})(Place(cpu))]}. I0914 12:02:47.652005 11870 operator.cc:210] Place(cpu) Op(fetch), inputs:{X[Y@GRAD:float[1, 256]({})(Place(cpu))]}, outputs:{Out[fetch:[-1]({{}})()]}. I0914 12:02:47.652037 11870 fetch_op.cc:109] Fetch variable Y@GRAD to variable fetch's 0 column. I0914 12:02:47.652055 11870 fetch_op.cc:36] innerTransDataLayoutFromMKLDNN I0914 12:02:47.652062 11870 device_context.cc:80] DeviceContextPool Get: Place(cpu) I0914 12:02:47.652086 11870 allocator_facade.cc:343] GetAllocator Place(cpu) 1024 W0914 12:02:47.652269 11870 operator.cc:284] fetch raises an exception dnnl::error, could not create a primitive descriptor for a reorder primitive I0914 12:02:47.652379 11870 executor.cc:69] destroy ExecutorPrepareContext
-
Moreover, if we reversed the shape of x and y, such as [1, 180] [256, 1] , it still failed and raised error,such as
element_sub_grad reshape error
How to reproduce ?
- step 1: git fetch this PR
- step 2:
cd Paddle & mkdir build & cd build
- step 3: build paddle
#!/bin/bash
cmake .. -DWITH_GPU=ON \
-DPY_VERSION=3.8 \
-DWITH_DISTRIBUTE=OFF \
-DCMAKE_BUILD_TYPE=Release \
-DWITH_MKL=ON \
-DWITH_MKLDNN=ON
#make -j$(nproc)
make -j24
python -m pip uninstall paddlepaddle-gpu -y
python -m pip install python/dist/paddlepaddle_gpu-0.0.0-cp38-cp38-linux_x86_64.whl
- step 4: run this unittest
cd Paddle/python/paddle/fluid/tests/unittests
python test_elementwise_sub_op.py TestMKLDNNElementwiseSubOp40
# or run single test case
GLOG_v=6 python test_elementwise_sub_op.py TestMKLDNNElementwiseSubOp40.test_check_grad_ignore_x > err.log 2>&1
@piotrekobi Please review
@zh794390558 We are looking at this PR and testing it against our workloads. We should have results by the end of this week. We will write here if all is fine or any problems found.
Fix as comment, please check it.
updating on behalf of @piotrekobi - we found big accuracy drop for a model caused by this PR in our test... currently we are re-testing to ensure it is not a false alert. Stay tuned.
updating on behalf of @piotrekobi - we found big accuracy drop for a model caused by this PR in our test... currently we are re-testing to ensure it is not a false alert. Stay tuned.
can i merge this PR? I need this feature with other to do other work, thx.
Yes. The drop turned out to be caused by some different change. I'm sorry for the trouble.