NEMeanStdDevNormalizationLayer returns NaNs for F16 tensors
NEMeanStdDevNormalizationLayer returns NaNs if the src/dst tensors are F16. The issue was reproduced on ACL 23.08.
How ACL was built: scons neon=1 opencl=0 openmp=0 cppthreads=1 arch=armv8.6-a Werror=false validation_tests=1 --jobs=8 os=macos build=native --silent fixed_format_kernels=1 asserts=1 debug=1
How reproducer was built: clang++ -O2 -g -I./ComputeLibrary -I./ComputeLibrary/include mvn_bug.c -o bug -L./ComputeLibrary/build/ -L./ComputeLibrary/build/tests/ -L./ComputeLibrary/build/tests/framework/ -larm_compute -lAssetsLibrary.o -lRawTensor.o -lExceptions.o -std=c++17
The issue was reproduced on an Apple M1.
Reproducer:
#include "arm_compute/core/TensorShape.h"
#include "arm_compute/runtime/Tensor.h"
#include "arm_compute/runtime/NEON/functions/NEMeanStdDevNormalizationLayer.h"
#include "tests/Utils.h"
#include "tests/AssetsLibrary.h"
#include "tests/NEON/Accessor.h"
#include <iostream>
#include <random> // for std::random_device / std::uniform_real_distribution used below
#include <vector>
using namespace arm_compute;
using namespace arm_compute::test;
int main(int argc, char *argv[]) {
    size_t X = 128;
    size_t Y = 64;
    float epsValue_ = 0.00000999999974f;

    TensorInfo srcTensorInfo = TensorInfo(TensorShape(X, Y), 1, DataType::F16, DataLayout::NCHW);
    TensorInfo dstTensorInfo = TensorInfo(TensorShape(X, Y), 1, DataType::F16, DataLayout::NCHW);

    auto status = NEMeanStdDevNormalizationLayer::validate(&srcTensorInfo, &dstTensorInfo, epsValue_);
    if(status.error_code() != ErrorCode::OK) {
        std::cout << "ERROR: " << status.error_description().c_str() << std::endl;
        exit(1);
    }
    std::cout << "PASSED VALIDATION" << std::endl;

    Tensor srcTensor;
    Tensor dstTensor;
    srcTensor.allocator()->init(srcTensorInfo);
    dstTensor.allocator()->init(dstTensorInfo);

    NEMeanStdDevNormalizationLayer mvn;
    mvn.configure(&srcTensor, &dstTensor, epsValue_);
    std::cout << "PASSED CONFIGURATION" << std::endl;

    srcTensor.allocator()->allocate();
    dstTensor.allocator()->allocate();

    AssetsLibrary library(".", std::random_device()());
    std::uniform_real_distribution<> distribution{ -2000.0f, 3000.0f };
    library.fill(Accessor(srcTensor), distribution, 0);

    srcTensor.print(std::cout);
    mvn.run();
    std::cout << "PASSED RUN" << std::endl;
    dstTensor.print(std::cout);

    srcTensor.allocator()->free();
    dstTensor.allocator()->free();
    return 0;
}
Hi @alvoron
I managed to reproduce this; however, the range of input values in your test, [-2000.f, 3000.f], is not supported for float16_t in the NEMeanStdDevNormalizationLayer operator.
We only test values in the range [-1.f, 1.f]; see https://github.com/ARM-software/ComputeLibrary/blob/main/tests/validation/fixtures/MeanStdDevNormalizationLayerFixture.h#L61
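For context, a likely explanation (only a sketch of the failure mode on my side, not the kernel's actual implementation) is that fp16 cannot hold the intermediate statistics: values around 2000 are representable in fp16, but their squares (~4e6) exceed the fp16 maximum of 65504, so a sum of squares kept in float16_t saturates to +inf, and a variance of the form E[x^2] - (E[x])^2 then evaluates to inf - inf = NaN. A minimal standalone illustration, assuming a compiler/target with __fp16 support (e.g. Clang on AArch64, as on the Apple M1 used here):

#include <cstdio>

int main() {
    // Hypothetical fp16-style accumulation for illustration only; this is NOT the ACL kernel code.
    const int n = 128;
    __fp16 sum = 0, sum_sq = 0;
    for (int i = 0; i < n; ++i) {
        __fp16 x = 2000.0f;      // representable: fp16 max is 65504
        sum    = sum + x;        // the running sum passes 65504 -> +inf
        sum_sq = sum_sq + x * x; // x*x = 4e6 already exceeds 65504 -> +inf
    }
    __fp16 mean = sum / (__fp16)n;                  // +inf
    __fp16 var  = sum_sq / (__fp16)n - mean * mean; // inf - inf = NaN
    std::printf("mean=%f var=%f\n", (float)mean, (float)var);
    return 0;
}

For the [-1.f, 1.f] range used in the validation suite all of these intermediate terms stay far below 65504, which is why the existing tests never hit this.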
I've also modified the test to use [-1000.f, 1000.f] and I see no NaNs:
int main(int argc, char *argv[]) {
    size_t X = 128;
    size_t Y = 64;
    float epsValue_ = 0.00000999999974f;

    TensorInfo srcTensorInfo = TensorInfo(TensorShape(X, Y), 1, DataType::F16, DataLayout::NCHW);
    TensorInfo dstTensorInfo = TensorInfo(TensorShape(X, Y), 1, DataType::F16, DataLayout::NCHW);

    auto status = NEMeanStdDevNormalizationLayer::validate(&srcTensorInfo, &dstTensorInfo, epsValue_);
    if(status.error_code() != ErrorCode::OK) {
        std::cout << "ERROR: " << status.error_description().c_str() << std::endl;
        exit(1);
    }

    std::cout << "PASSED VALIDATION" << std::endl;

    Tensor srcTensor;
    Tensor dstTensor;
    srcTensor.allocator()->init(srcTensorInfo);
    dstTensor.allocator()->init(dstTensorInfo);

    NEMeanStdDevNormalizationLayer mvn;
    mvn.configure(&srcTensor, &dstTensor, epsValue_);
    std::cout << "PASSED CONFIGURATION" << std::endl;

    srcTensor.allocator()->allocate();
    dstTensor.allocator()->allocate();

    // Fill the source tensor directly with fp16 values in [-1000, 1000]
    std::mt19937 gen(std::random_device{}());
    std::uniform_real_distribution<float> distribution(-1000.0f, 1000.0f);
    Window window;
    window.use_tensor_dimensions(srcTensor.info()->tensor_shape());
    execute_window_loop(window,
                        [&](const Coordinates &id)
                        {
                            const auto value = static_cast<float16_t>(distribution(gen));
                            *reinterpret_cast<float16_t *>(srcTensor.ptr_to_element(id)) = float16_t(value);
                        });

    srcTensor.print(std::cout);
    mvn.run();
    std::cout << "PASSED RUN" << std::endl;
    dstTensor.print(std::cout);

    srcTensor.allocator()->free();
    dstTensor.allocator()->free();

    return 0;
}
What's the use case for the range of values [-2000.0f, 3000.0f]? Is there a model using this range?
Hope this helps.
The issue was reproduced on a style transfer model; that is where the [-2000, 3000] range comes from.
I was also able to reproduce the issue with the range [0, 1000]. Could you try that?
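For what it's worth, [0, 1000] failing is consistent with the fp16-overflow sketch above (this is only my reasoning, assuming the statistics end up in fp16 at some point): every input fits in fp16, but the squared mean (~500^2 = 250000) and the mean of squares (~1000^2/3 ≈ 333333) both exceed the fp16 maximum of 65504, so a variance computed as E[x^2] - (E[x])^2 again degenerates to inf - inf = NaN; with [-1000, 1000] the mean is close to 0, so only one of the two terms overflows and no NaN is produced. A quick check, again assuming __fp16 support:

#include <cstdio>

int main() {
    // Illustration only, not ACL kernel code: the statistics implied by
    // inputs in [0, 1000] overflow fp16 even though every input fits.
    __fp16 mean    = 500.0f;                    // ~E[x] for uniform [0, 1000]
    __fp16 mean_sq = mean * mean;               // 250000 -> +inf in fp16
    __fp16 e_x2    = 1000.0f * 1000.0f / 3.0f;  // ~333333 -> +inf in fp16
    __fp16 var     = e_x2 - mean_sq;            // inf - inf = NaN
    std::printf("mean^2=%f E[x^2]=%f var=%f\n", (float)mean_sq, (float)e_x2, (float)var);
    return 0;
}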
Hi @alvoron
Thank you for sharing the details. The following patch fixes the problem: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11311
This fix will be included in ACL 24.04.
Hope this helps.