NEMeanStdDevNormalizationLayer returns NaNs for F16 tensors
NEMeanStdDevNormalizationLayer returns NaNs if the src/dst tensors are F16. The issue was reproduced on ACL 23.08.
How ACL was built: scons neon=1 opencl=0 openmp=0 cppthreads=1 arch=armv8.6-a Werror=false validation_tests=1 --jobs=8 os=macos build=native --silent fixed_format_kernels=1 asserts=1 debug=1
How reproducer was built: clang++ -O2 -g -I./ComputeLibrary -I./ComputeLibrary/include mvn_bug.c -o bug -L./ComputeLibrary/build/ -L./ComputeLibrary/build/tests/ -L./ComputeLibrary/build/tests/framework/ -larm_compute -lAssetsLibrary.o -lRawTensor.o -lExceptions.o -std=c++17
The issue was reproduced on an Apple M1.
Reproducer:
#include "arm_compute/core/TensorShape.h"
#include "arm_compute/runtime/Tensor.h"
#include "arm_compute/runtime/NEON/functions/NEMeanStdDevNormalizationLayer.h"
#include "tests/Utils.h"
#include "tests/AssetsLibrary.h"
#include "tests/NEON/Accessor.h"
#include <iostream>
#include <random> // for std::random_device / std::uniform_real_distribution used below
#include <vector>
using namespace arm_compute;
using namespace arm_compute::test;
int main(int argc, char *argv[]) {
    size_t X = 128;
    size_t Y = 64;
    float epsValue_ = 0.00000999999974f;

    TensorInfo srcTensorInfo = TensorInfo(TensorShape(X, Y), 1, DataType::F16, DataLayout::NCHW);
    TensorInfo dstTensorInfo = TensorInfo(TensorShape(X, Y), 1, DataType::F16, DataLayout::NCHW);

    auto status = NEMeanStdDevNormalizationLayer::validate(&srcTensorInfo, &dstTensorInfo, epsValue_);
    if(status.error_code() != ErrorCode::OK) {
        std::cout << "ERROR: " << status.error_description().c_str() << std::endl;
        exit(1);
    }
    std::cout << "PASSED VALIDATION" << std::endl;

    Tensor srcTensor;
    Tensor dstTensor;
    srcTensor.allocator()->init(srcTensorInfo);
    dstTensor.allocator()->init(dstTensorInfo);

    NEMeanStdDevNormalizationLayer mvn;
    mvn.configure(&srcTensor, &dstTensor, epsValue_);
    std::cout << "PASSED CONFIGURATION" << std::endl;

    srcTensor.allocator()->allocate();
    dstTensor.allocator()->allocate();

    AssetsLibrary library(".", std::random_device()());
    std::uniform_real_distribution<> distribution{ -2000.0f, 3000.0f };
    library.fill(Accessor(srcTensor), distribution, 0);

    srcTensor.print(std::cout);
    mvn.run();
    std::cout << "PASSED RUN" << std::endl;
    dstTensor.print(std::cout);

    srcTensor.allocator()->free();
    dstTensor.allocator()->free();
    return 0;
}
Hi @alvoron
I managed to reproduce this; however, the range of input values in your test, [-2000.f, 3000.f], is not supported for float16_t in the NEMeanStdDevNormalizationLayer operator.
We only test values in the range [-1.f, 1.f]; see https://github.com/ARM-software/ComputeLibrary/blob/main/tests/validation/fixtures/MeanStdDevNormalizationLayerFixture.h#L61
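For context, a likely explanation (only a sketch of the failure mode on my side, not the kernel's actual implementation) is that fp16 cannot hold the intermediate statistics: values around 2000 are representable in fp16, but their squares (~4e6) exceed the fp16 maximum of 65504, so a sum of squares kept in float16_t saturates to +inf, and a variance of the form E[x^2] - (E[x])^2 then evaluates to inf - inf = NaN. A minimal standalone illustration, assuming a compiler/target with __fp16 support (e.g. Clang on AArch64, as on the Apple M1 used here):

#include <cstdio>

int main() {
    // Hypothetical fp16-style accumulation for illustration only; this is NOT the ACL kernel code.
    const int n = 128;
    __fp16 sum = 0, sum_sq = 0;
    for (int i = 0; i < n; ++i) {
        __fp16 x = 2000.0f;      // representable: fp16 max is 65504
        sum    = sum + x;        // the running sum passes 65504 -> +inf
        sum_sq = sum_sq + x * x; // x*x = 4e6 already exceeds 65504 -> +inf
    }
    __fp16 mean = sum / (__fp16)n;                  // +inf
    __fp16 var  = sum_sq / (__fp16)n - mean * mean; // inf - inf = NaN
    std::printf("mean=%f var=%f\n", (float)mean, (float)var);
    return 0;
}

For the [-1.f, 1.f] range used in the validation suite all of these intermediate terms stay far below 65504, which is why the existing tests never hit this.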
I've also modified the test to use [-1000.f, 1000.f] and I see no NaNs:
int main(int argc, char *argv[]) {
    size_t X = 128;
    size_t Y = 64;
    float epsValue_ = 0.00000999999974f;

    TensorInfo srcTensorInfo = TensorInfo(TensorShape(X, Y), 1, DataType::F16, DataLayout::NCHW);
    TensorInfo dstTensorInfo = TensorInfo(TensorShape(X, Y), 1, DataType::F16, DataLayout::NCHW);

    auto status = NEMeanStdDevNormalizationLayer::validate(&srcTensorInfo, &dstTensorInfo, epsValue_);
    if(status.error_code() != ErrorCode::OK) {
        std::cout << "ERROR: " << status.error_description().c_str() << std::endl;
        exit(1);
    }

    std::cout << "PASSED VALIDATION" << std::endl;

    Tensor srcTensor;
    Tensor dstTensor;
    srcTensor.allocator()->init(srcTensorInfo);
    dstTensor.allocator()->init(dstTensorInfo);

    NEMeanStdDevNormalizationLayer mvn;
    mvn.configure(&srcTensor, &dstTensor, epsValue_);
    std::cout << "PASSED CONFIGURATION" << std::endl;

    srcTensor.allocator()->allocate();
    dstTensor.allocator()->allocate();

    // Fill the source tensor directly with fp16 values in [-1000, 1000]
    std::mt19937 gen(std::random_device{}());
    std::uniform_real_distribution<float> distribution(-1000.0f, 1000.0f);
    Window window;
    window.use_tensor_dimensions(srcTensor.info()->tensor_shape());
    execute_window_loop(window,
                        [&](const Coordinates &id)
                        {
                            const auto value = static_cast<float16_t>(distribution(gen));
                            *reinterpret_cast<float16_t *>(srcTensor.ptr_to_element(id)) = float16_t(value);
                        });

    srcTensor.print(std::cout);
    mvn.run();
    std::cout << "PASSED RUN" << std::endl;
    dstTensor.print(std::cout);

    srcTensor.allocator()->free();
    dstTensor.allocator()->free();

    return 0;
}
What's the use case for the range of values [-2000.0f, 3000.0f]? Is there a model using this range?
Hope this helps.
The issue was reproduced on a style transfer model; that is where the [-2000, 3000] range comes from.
I was also able to reproduce the issue with the range [0, 1000]. Could you try that?
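For what it's worth, [0, 1000] failing is consistent with the fp16-overflow sketch above (this is only my reasoning, assuming the statistics end up in fp16 at some point): every input fits in fp16, but the squared mean (~500^2 = 250000) and the mean of squares (~1000^2/3 ≈ 333333) both exceed the fp16 maximum of 65504, so a variance computed as E[x^2] - (E[x])^2 again degenerates to inf - inf = NaN; with [-1000, 1000] the mean is close to 0, so only one of the two terms overflows and no NaN is produced. A quick check, again assuming __fp16 support:

#include <cstdio>

int main() {
    // Illustration only, not ACL kernel code: the statistics implied by
    // inputs in [0, 1000] overflow fp16 even though every input fits.
    __fp16 mean    = 500.0f;                    // ~E[x] for uniform [0, 1000]
    __fp16 mean_sq = mean * mean;               // 250000 -> +inf in fp16
    __fp16 e_x2    = 1000.0f * 1000.0f / 3.0f;  // ~333333 -> +inf in fp16
    __fp16 var     = e_x2 - mean_sq;            // inf - inf = NaN
    std::printf("mean^2=%f E[x^2]=%f var=%f\n", (float)mean_sq, (float)e_x2, (float)var);
    return 0;
}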
Hi @alvoron
Thank you for sharing the details. The following patch fixes the problem: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11311
This fix will be included in ACL 24.04.
Hope this helps.