ONE Compiler FE: Support ubuntu 22.04 (jammy)

What

Let's make the ONE compiler support Ubuntu 22.04.

Why

Ubuntu 22.04 has been released, and the number of users on it will gradually increase. So let's prepare to support it in advance! It may be a little early, but there is nothing wrong with preparing ahead of time.

Environment of ubuntu 22.04

default version

cmake: 3.22.1
python3: 3.10.4
gcc: 11.2.0
libboost: 1.74

To do

[x] Support cmake 3.22.1
[x] Support python3 3.10.4 (by https://github.com/Samsung/ONE/issues/9432#issuecomment-1185208025)
[ ] Support gcc 11.2.0 (internal only)
[ ] Create jammy docker file (no plan yet)
[ ] Docker build on CI (no plan yet)

Build Target Architectures

Build for x86_64

$ cd {one dir}
$ docker run -it --rm -v `pwd`:`pwd` -w `pwd` ubuntu:22.04 /bin/bash
apt update
apt install cmake libboost-all-dev g++ patch python3-pip python3-venv
python3 -m pip install --upgrade pip

./nncc configure
./nncc build
./nncc test

Build for arm32

$ sudo apt-get install qemu qemu-user-static binfmt-support debootstrap
$ cd {one dir}
$ ROOTFS_DIR=`pwd`/tools/cross/rootfs/arm-jammy sudo -E ./tools/cross/install_rootfs.sh arm jammy --skipunmount

$ cd {one dir}
$ docker run -it --rm -v `pwd`:`pwd` -w `pwd` ubuntu:22.04 /bin/bash
apt update
apt install cmake libboost-all-dev g++ patch python3-pip python3-venv
python3 -m pip install --upgrade pip

apt install g++-arm-linux-gnueabihf
ROOTFS_ARM=`pwd`/tools/cross/rootfs/arm-jammy make -f infra/nncc/Makefile.arm32 cfg
ROOTFS_ARM=`pwd`/tools/cross/rootfs/arm-jammy make -f infra/nncc/Makefile.arm32 debug

Jul 14 '22 12:07 ragmani

Problem of not finding `python` 3.10

See https://github.com/Samsung/ONE/pull/9429#issuecomment-1184193971. This is not an actual issue; it was limited to one individual environment.

Jul 14 '22 12:07 ragmani

Problem caused by `cmake` policy change when using `find_package(Boost ...)`

error message

CMake Error at /usr/lib/x86_64-linux-gnu/cmake/Boost-1.74.0/BoostConfig.cmake:240 (if):
  if given arguments:

    "ALL" "IN_LIST" "Boost_FIND_COMPONENTS"

  Unknown arguments specified
Call Stack (most recent call first):
  CMakeLists.txt:43 (find_package)
  /home/jang/git/ragmani/ONE/compiler/nnc/backends/soft_backend/CMakeLists.txt:1 (nnas_find_package)

cmake policy

$ cmake --help-policy CMP0057
CMP0057
-------

.. versionadded:: 3.3

Support new ``if()`` IN_LIST operator.

CMake 3.3 adds support for the new IN_LIST operator.

The ``OLD`` behavior for this policy is to ignore the IN_LIST operator.
The ``NEW`` behavior is to interpret the IN_LIST operator.

This policy was introduced in CMake version 3.3.
CMake version 3.22.1 warns when the policy is not set and uses
``OLD`` behavior.  Use the ``cmake_policy()`` command to set
it to ``OLD`` or ``NEW`` explicitly.

.. note::
  The ``OLD`` behavior of a policy is
  ``deprecated by definition``
  and may be removed in a future version of CMake.

solution

Add `cmake_policy(SET CMP0057 NEW)` in `macro(nnas_find_package PREFIX)`. But it requires `cmake_minimum_required(VERSION 3.3)`.

Jul 14 '22 12:07 ragmani

> But it requires cmake_minimum_required(VERSION 3.3)

IMO, it's better to raise the minimum required cmake version, because cmake 3.1 is an old version (Dec 2014: https://cmake.org/pipermail/cmake/2014-December/059418.html).

Jul 15 '22 03:07 hseok-oh

cmake version

Current requirement
- Runtime: 3.5.1
- Compiler: 3.1
Default version on linux
- Ubuntu 16.04: 3.5.1
- Ubuntu 18.04: 3.10.2
- Ubuntu 20.04: 3.16.3
- Ubuntu 22.04: 3.22.1
- Tizen 5.5/6.0: 3.9.4
- Tizen 6.5: 3.16.4
- Tizen 7.0: 3.21.3

Jul 15 '22 03:07 hseok-oh

Problem caused by not finding version 2.6.0 of tensorflow-cpu (pip3.10)

error message

ERROR: Could not find a version that satisfies the requirement tensorflow-cpu==2.6.0 (from versions: 2.8.0, 2.8.1, 2.8.2, 2.9.0rc0, 2.9.0rc1, 2.9.0rc2, 2.9.0, 2.9.1)
ERROR: No matching distribution found for tensorflow-cpu==2.6.0

I heard from @seanshpark that using tensorflow-cpu 2.6.0 will be removed soon. So, let's wait for it to be removed.

Jul 15 '22 05:07 ragmani

> using tensorflow-cpu 2.6.0 will be removed soon

--> #9433 , #9435

Jul 15 '22 06:07 seanshpark

Problem caused by gcc version upgrade to 11

Internal sources

#9437

External sources

error message

These are the same errors as https://github.com/Samsung/ONE/issues/9265#issue-1271802873

ONE/externals/ABSEIL/absl/synchronization/internal/graphcycles.cc:451:26: error: 'numeric_limits' is not a member of 'std'
  451 |   if (x->version == std::numeric_limits<uint32_t>::max()) {

ONE/externals/ABSEIL/absl/debugging/failure_signal_handler.cc:138:32: error: no matching function for call to 'max(long int, int)'
  138 |   size_t stack_size = (std::max(SIGSTKSZ, 65536) + page_mask) & ~page_mask;
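Both messages look like typical porting issues with GCC 11 and a newer glibc: `<limits>` is no longer pulled in transitively by other standard headers, and `SIGSTKSZ` appears to expand to an expression of type `long` rather than an `int` constant, so `std::max` cannot deduce a single argument type. The following is only a minimal sketch of the kind of change that resolves such errors, not the actual Abseil patch. The names `signal_stack_size_hint`, `is_max_version` and `aligned_stack_size` are hypothetical and exist only for this illustration.

```cpp
#include <algorithm>  // std::max
#include <cassert>    // assert
#include <cstddef>    // std::size_t
#include <cstdint>    // std::uint32_t
#include <limits>     // std::numeric_limits (must be included explicitly)

// Hypothetical stand-in for SIGSTKSZ: assumed to yield a `long` at run time
// instead of being an `int` compile-time constant.
long signal_stack_size_hint() { return 8192L; }

// Mirrors the comparison from the first error; it compiles once <limits> is included.
bool is_max_version(std::uint32_t version) {
  return version == std::numeric_limits<std::uint32_t>::max();
}

// Mirrors the expression from the second error. std::max needs both arguments
// to have the same type, so the type is spelled out explicitly.
std::size_t aligned_stack_size(std::size_t page_mask) {
  // page_mask is expected to be (page size - 1), i.e. all low bits set.
  assert((page_mask & (page_mask + 1)) == 0);
  const std::size_t wanted =
      std::max<std::size_t>(static_cast<std::size_t>(signal_stack_size_hint()), 65536);
  return (wanted + page_mask) & ~page_mask;
}
```

With a 4 KiB page (`page_mask == 4095`) the sketch rounds the 65536-byte minimum up to the next page boundary, which is 65536 itself. Since the failing files live under `externals/`, the real fix most likely has to come from a newer upstream version or a local patch rather than from code like this.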

Jul 15 '22 07:07 ragmani

I found an error where some onecc modules could not be found when cross-building onecc on my machine. The patch below solves it.

@@ -20,38 +20,38 @@ ARM32_INSTALL_FOLDER=$(CURRENT_DIR)/$(BUILDFOLDER)/$(ARM32_FOLDER).$(TYPE_FOLDER
 ARM32_INSTALL_HOST=$(CURRENT_DIR)/$(BUILDFOLDER)/$(ARM32_FOLDER).$(TYPE_FOLDER).host.install
 
 # ARM32 build
-ARM32_BUILD_ITEMS:=angkor;cwrap;pepper-str;pepper-strcast;pp
-ARM32_BUILD_ITEMS+=;pepper-csv2vec;crew
-ARM32_BUILD_ITEMS+=;oops;pepper-assert
-ARM32_BUILD_ITEMS+=;hermes;hermes-std
-ARM32_BUILD_ITEMS+=;loco;locop;logo-core;logo
-ARM32_BUILD_ITEMS+=;safemain;mio-circle04;mio-tflite280
-ARM32_BUILD_ITEMS+=;dio-hdf5
-ARM32_BUILD_ITEMS+=;foder;circle-verify;souschef;arser;vconone
-ARM32_BUILD_ITEMS+=;luci
-ARM32_BUILD_ITEMS+=;luci-interpreter
-ARM32_BUILD_ITEMS+=;tflite2circle
-ARM32_BUILD_ITEMS+=;tflchef;circlechef
-ARM32_BUILD_ITEMS+=;circle2circle;record-minmax;circle-quantizer
-ARM32_BUILD_ITEMS+=;luci-eval-driver;luci-value-test
+ARM32_BUILD_ITEMS:=angkor;cwrap;pepper-str;pepper-strcast;pp;
+ARM32_BUILD_ITEMS+=;pepper-csv2vec;crew;
+ARM32_BUILD_ITEMS+=;oops;pepper-assert;
+ARM32_BUILD_ITEMS+=;hermes;hermes-std;
+ARM32_BUILD_ITEMS+=;loco;locop;logo-core;logo;
+ARM32_BUILD_ITEMS+=;safemain;mio-tflite280;mio-circle04;
+ARM32_BUILD_ITEMS+=;dio-hdf5;
+ARM32_BUILD_ITEMS+=;foder;circle-verify;souschef;arser;vconone;
+ARM32_BUILD_ITEMS+=;luci;
+ARM32_BUILD_ITEMS+=;luci-interpreter;
+ARM32_BUILD_ITEMS+=;tflite2circle;
+ARM32_BUILD_ITEMS+=;tflchef;circlechef;
+ARM32_BUILD_ITEMS+=;circle2circle;record-minmax;circle-quantizer;
+ARM32_BUILD_ITEMS+=;luci-eval-driver;luci-value-test;
 
 ARM32_TOOLCHAIN_FILE=cmake/buildtool/cross/toolchain_armv7l-linux.cmake
 
-ARM32_HOST_ITEMS:=angkor;cwrap;pepper-str;pepper-strcast;pp
-ARM32_HOST_ITEMS+=;pepper-csv2vec
-ARM32_HOST_ITEMS+=;oops
-ARM32_HOST_ITEMS+=;hermes;hermes-std
-ARM32_HOST_ITEMS+=;loco;locop;logo-core;logo
-ARM32_HOST_ITEMS+=;safemain;mio-circle04;mio-tflite280
-ARM32_HOST_ITEMS+=;foder;circle-verify;souschef;arser;vconone
-ARM32_HOST_ITEMS+=;luci
-ARM32_HOST_ITEMS+=;luci-interpreter
-ARM32_HOST_ITEMS+=;tflite2circle
-ARM32_HOST_ITEMS+=;tflchef;circlechef
-ARM32_HOST_ITEMS+=;circle-tensordump
-ARM32_HOST_ITEMS+=;circle2circle
-ARM32_HOST_ITEMS+=;common-artifacts
-ARM32_HOST_ITEMS+=;luci-eval-driver;luci-value-test
+ARM32_HOST_ITEMS:=angkor;cwrap;pepper-str;pepper-strcast;pp;
+ARM32_HOST_ITEMS+=;pepper-csv2vec;
+ARM32_HOST_ITEMS+=;oops;
+ARM32_HOST_ITEMS+=;hermes;hermes-std;
+ARM32_HOST_ITEMS+=;loco;locop;logo-core;logo;
+ARM32_HOST_ITEMS+=;safemain;mio-tflite280;mio-circle04;
+ARM32_HOST_ITEMS+=;foder;circle-verify;souschef;arser;vconone;
+ARM32_HOST_ITEMS+=;luci;
+ARM32_HOST_ITEMS+=;luci-interpreter;
+ARM32_HOST_ITEMS+=;tflite2circle;
+ARM32_HOST_ITEMS+=;tflchef;circlechef;
+ARM32_HOST_ITEMS+=;circle-tensordump;
+ARM32_HOST_ITEMS+=;circle2circle;
+ARM32_HOST_ITEMS+=;common-artifacts;
+ARM32_HOST_ITEMS+=;luci-eval-driver;luci-value-test;
 
 
 _SPACE_:=

But I'm not sure if this way is correct.

Jul 27 '22 07:07 ragmani

I found an error when cross-building. It's hard for me to solve it.

error message

[ 93%] Building CXX object compiler/luci-interpreter/src/kernels/CMakeFiles/luci_interpreter_linux_pal.dir/home/jang/git/ragmani/ONE/externals/TENSORFLOW-2.6.0-RUY/ruy/pack_arm.cc.o
/home/jang/git/ragmani/ONE/externals/TENSORFLOW-2.6.0-RUY/ruy/pack_arm.cc: In function 'void ruy::Pack8bitColMajorForNeon4Cols(const ruy::PackParams8bit&)':
/home/jang/git/ragmani/ONE/externals/TENSORFLOW-2.6.0-RUY/ruy/pack_arm.cc:264:3: error: 'asm' operand has impossible constraints
  264 |   asm volatile(
      |   ^~~
gmake[3]: *** [compiler/luci-interpreter/src/kernels/CMakeFiles/luci_interpreter_linux_pal.dir/build.make:258: compiler/luci-interpreter/src/kernels/CMakeFiles/luci_interpreter_linux_pal.dir/home/jang/git/ragmani/ONE/externals/TENSORFLOW-2.6.0-RUY/ruy/pack_arm.cc.o] Error 1

source code

// No attempt made at making this code efficient on A55-ish cores yet.
void Pack8bitColMajorForNeon4Cols(const PackParams8bit& params) {
  CheckOffsetsInPackParams8bit(params);
  profiler::ScopeLabel label("Pack (kNeon)");
  const void* src_ptr0 = params.src_ptr0;
  const void* src_ptr1 = params.src_ptr1;
  const void* src_ptr2 = params.src_ptr2;
  const void* src_ptr3 = params.src_ptr3;
  const int src_inc0 = params.src_inc0;
  const int src_inc1 = params.src_inc1;
  const int src_inc2 = params.src_inc2;
  const int src_inc3 = params.src_inc3;
  const std::int8_t* packed_ptr = params.packed_ptr;

  asm volatile(  // <---------- line 264
      // clang-format off

          "ldr r2, [%[params], #" RUY_STR(RUY_OFFSET_INPUT_XOR) "]\n"
          "vdup.8 q11, r2\n"
          "mov r1, #0\n"
          // Zero-out the accumulators
          "vmov.i32 q12, #0\n"
          "vmov.i32 q13, #0\n"
          "vmov.i32 q14, #0\n"
          "vmov.i32 q15, #0\n"

          // Round down src_rows to nearest multiple of 16.
          "ldr r3, [%[params], #" RUY_STR(RUY_OFFSET_SRC_ROWS) "]\n"
          "and r2, r3, #-16\n"
          "cmp r1, r2\n"
          "beq 3f\n"

          "1:\n"
          "add r1, r1, #16\n"
          /* Load q0 */
          "vld1.8 {d0, d1}, [%[src_ptr0]]\n"
          "add %[src_ptr0], %[src_ptr0], %[src_inc0]\n"
          RUY_PREFETCH_LOAD("pld [%[src_ptr0]]\n")

          /* Load q1 */
          "vld1.8 {d2, d3}, [%[src_ptr1]]\n"
          "add %[src_ptr1], %[src_ptr1], %[src_inc1]\n"
          RUY_PREFETCH_LOAD("pld [%[src_ptr1]]\n")

          "veor.8 q4, q0, q11\n"
          "veor.8 q5, q1, q11\n"

          // Pairwise add in to 16b accumulators.
          "vpaddl.s8 q8, q4\n"
          "vpaddl.s8 q9, q5\n"

          "vst1.32 {q4}, [%[packed_ptr]]!\n"
          "vst1.32 {q5}, [%[packed_ptr]]!\n"

          // Pairwise add in to 16b accumulators.
          "vpaddl.s8 q8, q4\n"
          "vpaddl.s8 q9, q5\n"

          "vst1.32 {q4}, [%[packed_ptr]]!\n"
          "vst1.32 {q5}, [%[packed_ptr]]!\n"

          // Pairwise add accumulate into 32b accumulators.
          // q12 and q13 contain 4x32b accumulators
          "vpadal.s16 q12, q8\n"
          "vpadal.s16 q13, q9\n"

          // Now do the same for src_ptr2 and src_ptr3.
          "vld1.8 {d0, d1}, [%[src_ptr2]]\n"
          "add %[src_ptr2], %[src_ptr2], %[src_inc2]\n"
          RUY_PREFETCH_LOAD("pld [%[src_ptr2]]\n")

          "vld1.8 {d2, d3}, [%[src_ptr3]]\n"
          "add %[src_ptr3], %[src_ptr3], %[src_inc3]\n"
          RUY_PREFETCH_LOAD("pld [%[src_ptr3]]\n")

          "veor.8 q4, q0, q11\n"
          "veor.8 q5, q1, q11\n"

          "vpaddl.s8 q8, q4\n"
          "vpaddl.s8 q9, q5\n"

          "vst1.32 {q4}, [%[packed_ptr]]!\n"
          "vst1.32 {q5}, [%[packed_ptr]]!\n"

          // Pairwise add accumulate into 32b accumulators.
          // q14 and q15 contain 4x32b accumulators
          "vpadal.s16 q14, q8\n"
          "vpadal.s16 q15, q9\n"

          "cmp r1, r2\n"
          "bne 1b\n"

          "3:\n"

          // Now pack the last (num_rows % 16) rows.
          "ldr r3, [%[params], #" RUY_STR(RUY_OFFSET_SRC_ROWS) "]\n"
          "ands r2, r3, #15\n"
          "beq 4f\n"
          "ldr r3, [%[params], #" RUY_STR(RUY_OFFSET_SRC_ZERO_POINT) "]\n"
          "vdup.8 q0, r3\n"
          "vdup.8 q1, r3\n"

// First, read/accumulate/write for src_ptr0 and src_ptr1.
#define RUY_LOAD_ONE_ROW1(I, R)            \
  "cmp r2, #" #I "\n"                      \
  "beq 5f\n"                               \
  "vld1.8 { d0[" #R "]}, [%[src_ptr0]]!\n" \
  "vld1.8 { d2[" #R "]}, [%[src_ptr1]]!\n" \

          RUY_LOAD_ONE_ROW1(0, 0)
          RUY_LOAD_ONE_ROW1(1, 1)
          RUY_LOAD_ONE_ROW1(2, 2)
          RUY_LOAD_ONE_ROW1(3, 3)
          RUY_LOAD_ONE_ROW1(4, 4)
          RUY_LOAD_ONE_ROW1(5, 5)
          RUY_LOAD_ONE_ROW1(6, 6)
          RUY_LOAD_ONE_ROW1(7, 7)
#undef RUY_LOAD_ONE_ROW1

#define RUY_LOAD_ONE_ROW2(I, R)            \
  "cmp r2, #" #I "\n"                      \
  "beq 5f\n"                               \
  "vld1.8 { d1[" #R "]}, [%[src_ptr0]]!\n" \
  "vld1.8 { d3[" #R "]}, [%[src_ptr1]]!\n" \

          RUY_LOAD_ONE_ROW2(8, 0)
          RUY_LOAD_ONE_ROW2(9, 1)
          RUY_LOAD_ONE_ROW2(10, 2)
          RUY_LOAD_ONE_ROW2(11, 3)
          RUY_LOAD_ONE_ROW2(12, 4)
          RUY_LOAD_ONE_ROW2(13, 5)
          RUY_LOAD_ONE_ROW2(14, 6)
          RUY_LOAD_ONE_ROW2(15, 7)
#undef RUY_LOAD_ONE_ROW2

          "5:\n"

          "veor.16 q4, q0, q11\n"
          "veor.16 q5, q1, q11\n"

          "vpaddl.s8 q8, q4\n"
          "vpaddl.s8 q9, q5\n"

          // Pairwise add accumulate to 4x32b accumulators.
          "vpadal.s16 q12, q8\n"
          "vpadal.s16 q13, q9\n"

          "vst1.32 {q4}, [%[packed_ptr]]!\n"
          "vst1.32 {q5}, [%[packed_ptr]]!\n"

          // Reset to src_zero for src_ptr2 and src_ptr3.
          "vdup.8 q0, r3\n"
          "vdup.8 q1, r3\n"

// Next, read/accumulate/write for src_ptr2 and src_ptr3.
#define RUY_LOAD_ONE_ROW1(I, R)            \
  "cmp r2, #" #I "\n"                      \
  "beq 5f\n"                               \
  "vld1.8 { d0[" #R "]}, [%[src_ptr2]]!\n" \
  "vld1.8 { d2[" #R "]}, [%[src_ptr3]]!\n" \

          RUY_LOAD_ONE_ROW1(0, 0)
          RUY_LOAD_ONE_ROW1(1, 1)
          RUY_LOAD_ONE_ROW1(2, 2)
          RUY_LOAD_ONE_ROW1(3, 3)
          RUY_LOAD_ONE_ROW1(4, 4)
          RUY_LOAD_ONE_ROW1(5, 5)
          RUY_LOAD_ONE_ROW1(6, 6)
          RUY_LOAD_ONE_ROW1(7, 7)
#undef RUY_LOAD_ONE_ROW1

#define RUY_LOAD_ONE_ROW2(I, R)            \
  "cmp r2, #" #I "\n"                      \
  "beq 5f\n"                               \
  "vld1.8 { d1[" #R "]}, [%[src_ptr2]]!\n" \
  "vld1.8 { d3[" #R "]}, [%[src_ptr3]]!\n" \

          RUY_LOAD_ONE_ROW2(8, 0)
          RUY_LOAD_ONE_ROW2(9, 1)
          RUY_LOAD_ONE_ROW2(10, 2)
          RUY_LOAD_ONE_ROW2(11, 3)
          RUY_LOAD_ONE_ROW2(12, 4)
          RUY_LOAD_ONE_ROW2(13, 5)
          RUY_LOAD_ONE_ROW2(14, 6)
          RUY_LOAD_ONE_ROW2(15, 7)
#undef RUY_LOAD_ONE_ROW2

          "5:\n"

          "veor.16 q4, q0, q11\n"
          "veor.16 q5, q1, q11\n"

          "vpaddl.s8 q8, q4\n"
          "vpaddl.s8 q9, q5\n"

          // Pairwise add accumulate to 4x32b accumulators.
          "vpadal.s16 q14, q8\n"
          "vpadal.s16 q15, q9\n"

          "vst1.32 {q4}, [%[packed_ptr]]!\n"
          "vst1.32 {q5}, [%[packed_ptr]]!\n"

          "4:\n"
          // Pairwise add 32-bit accumulators
          "vpadd.i32 d24, d24, d25\n"
          "vpadd.i32 d26, d26, d27\n"
          "vpadd.i32 d28, d28, d29\n"
          "vpadd.i32 d30, d30, d31\n"
          // Final 32-bit values per row
          "vpadd.i32 d25, d24, d26\n"
          "vpadd.i32 d27, d28, d30\n"

          "ldr r3, [%[params], #" RUY_STR(RUY_OFFSET_SUMS_PTR) "]\n"
          "cmp r3, #0\n"
          "beq 6f\n"
          "vst1.32 {d25}, [r3]!\n"
          "vst1.32 {d27}, [r3]!\n"
          "6:\n"
      // clang-format on

      : [ src_ptr0 ] "+r"(src_ptr0), [ src_ptr1 ] "+r"(src_ptr1),
        [ src_ptr2 ] "+r"(src_ptr2), [ src_ptr3 ] "+r"(src_ptr3)
      : [ src_inc0 ] "r"(src_inc0), [ src_inc1 ] "r"(src_inc1),
        [ src_inc2 ] "r"(src_inc2), [ src_inc3 ] "r"(src_inc3),
        [ packed_ptr ] "r"(packed_ptr), [ params ] "r"(&params)
      : "cc", "memory", "r1", "r2", "r3", "q0", "q1", "q2", "q3",
        "q4", "q5", "q6", "q7", "q8", "q9", "q10", "q11", "q12", "q13");
}

Jul 27 '22 07:07 ragmani

> But I'm not sure if this way is correct.

@ragmani , I can't distinguish by the diff, what module has changed?

Jul 27 '22 21:07 seanshpark

> @ragmani , I can't distinguish by the diff, what module has changed?

I haven't changed any modules. I just added ; at the end of each line.

Jul 28 '22 03:07 ragmani

> @ragmani , I can't distinguish by the diff, what module has changed?
>
> I haven't changed any modules. I just added ; at the end of each line.

FYI, AFAIR, when trying to enable tizen build, the target without ';' is not added into build target.

Jul 28 '22 12:07 chunseoklee

> FYI, AFAIR, when trying to enable tizen build, the target without ';' is not added into build target.

Sorry, I didn't understand what you said. What are the "target" and "build target" you mentioned?

Jul 29 '22 00:07 ragmani

@jinevening You can find the discussion about the cmake version at https://github.com/Samsung/ONE/issues/9432#issuecomment-1184412692

Sep 15 '22 05:09 hseok-oh

#9916 is a way to fix the error reported in https://github.com/Samsung/ONE/issues/9432#issuecomment-1196362702

Oct 24 '22 09:10 ragmani
