ONE
Compiler FE: Support ubuntu 22.04 (jammy)
What
Let's make the ONE compiler support Ubuntu 22.04.
Why
Ubuntu 22.04 has been released, and the number of users on it will gradually increase. So, let's prepare to support it in advance! It may be a little early, but there is nothing wrong with preparing ahead of time.
Environment of ubuntu 22.04
default version
- cmake: 3.22.1
- python3: 3.10.4
- gcc: 11.2.0
- libboost: 1.74
To do
- [x] Support cmake 3.22.1
- [x] Support python3 3.10.4 (by https://github.com/Samsung/ONE/issues/9432#issuecomment-1185208025)
- [ ] Support gcc 11.2.0 (internal only)
- [ ] Create jammy docker file (no plan yet)
- [ ] Docker build on CI (no plan yet)
Build Target Architectures
Build for x86_64
$ cd {one dir}
$ docker run -it --rm -v `pwd`:`pwd` -w `pwd` ubuntu:22.04 /bin/bash
apt update
apt install cmake libboost-all-dev g++ patch python3-pip python3-venv
python3 -m pip install --upgrade pip
./nncc configure
./nncc build
./nncc test
Build for arm32
$ sudo apt-get install qemu qemu-user-static binfmt-support debootstrap
$ cd {one dir}
$ ROOTFS_DIR=`pwd`/tools/cross/rootfs/arm-jammy sudo -E ./tools/cross/install_rootfs.sh arm jammy --skipunmount
$ cd {one dir}
$ docker run -it --rm -v `pwd`:`pwd` -w `pwd` ubuntu:22.04 /bin/bash
apt update
apt install cmake libboost-all-dev g++ patch python3-pip python3-venv
python3 -m pip install --upgrade pip
apt install g++-arm-linux-gnueabihf
ROOTFS_ARM=`pwd`/tools/cross/rootfs/arm-jammy make -f infra/nncc/Makefile.arm32 cfg
ROOTFS_ARM=`pwd`/tools/cross/rootfs/arm-jammy make -f infra/nncc/Makefile.arm32 debug
Problem of not finding python 3.10
See https://github.com/Samsung/ONE/pull/9429#issuecomment-1184193971. This is not a real issue; it was limited to one individual environment.
Problem caused by cmake policy change when using find_package(Boost ...)
- error message
CMake Error at /usr/lib/x86_64-linux-gnu/cmake/Boost-1.74.0/BoostConfig.cmake:240 (if):
if given arguments:
"ALL" "IN_LIST" "Boost_FIND_COMPONENTS"
Unknown arguments specified
Call Stack (most recent call first):
CMakeLists.txt:43 (find_package)
/home/jang/git/ragmani/ONE/compiler/nnc/backends/soft_backend/CMakeLists.txt:1 (nnas_find_package)
- cmake policy
$ cmake --help-policy CMP0057
CMP0057
-------
.. versionadded:: 3.3
Support new ``if()`` IN_LIST operator.
CMake 3.3 adds support for the new IN_LIST operator.
The ``OLD`` behavior for this policy is to ignore the IN_LIST operator.
The ``NEW`` behavior is to interpret the IN_LIST operator.
This policy was introduced in CMake version 3.3.
CMake version 3.22.1 warns when the policy is not set and uses
``OLD`` behavior. Use the ``cmake_policy()`` command to set
it to ``OLD`` or ``NEW`` explicitly.
.. note::
The ``OLD`` behavior of a policy is
``deprecated by definition``
and may be removed in a future version of CMake.
- solution
Add
cmake_policy(SET CMP0057 NEW)
in macro(nnas_find_package PREFIX).
But it requires
cmake_minimum_required(VERSION 3.3)
IMO, it's better to raise the minimum required cmake version, because cmake 3.1 is an old version (Dec 2014: https://cmake.org/pipermail/cmake/2014-December/059418.html).
cmake version
- Current requirement
  - Runtime: 3.5.1
  - Compiler: 3.1
- Default version on linux
  - Ubuntu 16.04: 3.5.1
  - Ubuntu 18.04: 3.10.2
  - Ubuntu 20.04: 3.16.3
  - Ubuntu 22.04: 3.22.1
  - Tizen 5.5/6.0: 3.9.4
  - Tizen 6.5: 3.16.4
  - Tizen 7.0: 3.21.3
Problem caused by not finding version 2.6.0 of tensorflow-cpu (pip3.10)
- error message
ERROR: Could not find a version that satisfies the requirement tensorflow-cpu==2.6.0 (from versions: 2.8.0, 2.8.1, 2.8.2, 2.9.0rc0, 2.9.0rc1, 2.9.0rc2, 2.9.0, 2.9.1)
ERROR: No matching distribution found for tensorflow-cpu==2.6.0
I heard from @seanshpark that the use of tensorflow-cpu 2.6.0 will be removed soon. So, let's wait for it to be removed.
--> #9433, #9435
Problem caused by gcc version upgrade to 11
Internal sources
#9437
External sources
- error message (these are the same errors as https://github.com/Samsung/ONE/issues/9265#issue-1271802873)
ONE/externals/ABSEIL/absl/synchronization/internal/graphcycles.cc:451:26: error: 'numeric_limits' is not a member of 'std'
451 | if (x->version == std::numeric_limits<uint32_t>::max()) {
ONE/externals/ABSEIL/absl/debugging/failure_signal_handler.cc:138:32: error: no matching function for call to 'max(long int, int)'
138 | size_t stack_size = (std::max(SIGSTKSZ, 65536) + page_mask) & ~page_mask;
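Both errors are typical of moving to gcc 11 with a newer glibc: the gcc 11 standard library headers no longer pull in <limits> transitively, and since glibc 2.34 SIGSTKSZ may expand to a sysconf() call returning long instead of an integer constant, so std::max(SIGSTKSZ, 65536) sees two different argument types. The following is only a minimal sketch of one possible fix for each, not the actual upstream Abseil patch; the helper names is_max_version and alt_stack_size are hypothetical and exist only for illustration.

```cpp
#include <algorithm> // std::max
#include <cassert>   // assert
#include <cstddef>   // size_t
#include <cstdint>   // uint32_t
#include <limits>    // std::numeric_limits (must be included explicitly with gcc 11)
#include <signal.h>  // SIGSTKSZ

// Hypothetical helper mirroring the comparison in graphcycles.cc.
// It compiles only because <limits> is included above.
bool is_max_version(uint32_t version)
{
  return version == std::numeric_limits<uint32_t>::max();
}

// Hypothetical helper mirroring the computation in failure_signal_handler.cc.
// page_size is assumed to be a power of two.
size_t alt_stack_size(size_t page_size)
{
  assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
  const size_t page_mask = page_size - 1;
  // Naming the template argument converts both operands to size_t, so the
  // call no longer depends on whether SIGSTKSZ is an int or a long.
  const size_t stack_size = std::max<size_t>(SIGSTKSZ, 65536);
  // Round up to a multiple of the page size, as the original line does.
  return (stack_size + page_mask) & ~page_mask;
}
```

With a 4096-byte page, alt_stack_size(4096) yields a page-aligned size of at least 65536 bytes regardless of how SIGSTKSZ is defined on the build machine.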
I found an error where some onecc modules could not be found when cross-building onecc on my machine.
The patch below solves this error.
@@ -20,38 +20,38 @@ ARM32_INSTALL_FOLDER=$(CURRENT_DIR)/$(BUILDFOLDER)/$(ARM32_FOLDER).$(TYPE_FOLDER
ARM32_INSTALL_HOST=$(CURRENT_DIR)/$(BUILDFOLDER)/$(ARM32_FOLDER).$(TYPE_FOLDER).host.install
# ARM32 build
-ARM32_BUILD_ITEMS:=angkor;cwrap;pepper-str;pepper-strcast;pp
-ARM32_BUILD_ITEMS+=;pepper-csv2vec;crew
-ARM32_BUILD_ITEMS+=;oops;pepper-assert
-ARM32_BUILD_ITEMS+=;hermes;hermes-std
-ARM32_BUILD_ITEMS+=;loco;locop;logo-core;logo
-ARM32_BUILD_ITEMS+=;safemain;mio-circle04;mio-tflite280
-ARM32_BUILD_ITEMS+=;dio-hdf5
-ARM32_BUILD_ITEMS+=;foder;circle-verify;souschef;arser;vconone
-ARM32_BUILD_ITEMS+=;luci
-ARM32_BUILD_ITEMS+=;luci-interpreter
-ARM32_BUILD_ITEMS+=;tflite2circle
-ARM32_BUILD_ITEMS+=;tflchef;circlechef
-ARM32_BUILD_ITEMS+=;circle2circle;record-minmax;circle-quantizer
-ARM32_BUILD_ITEMS+=;luci-eval-driver;luci-value-test
+ARM32_BUILD_ITEMS:=angkor;cwrap;pepper-str;pepper-strcast;pp;
+ARM32_BUILD_ITEMS+=;pepper-csv2vec;crew;
+ARM32_BUILD_ITEMS+=;oops;pepper-assert;
+ARM32_BUILD_ITEMS+=;hermes;hermes-std;
+ARM32_BUILD_ITEMS+=;loco;locop;logo-core;logo;
+ARM32_BUILD_ITEMS+=;safemain;mio-tflite280;mio-circle04;
+ARM32_BUILD_ITEMS+=;dio-hdf5;
+ARM32_BUILD_ITEMS+=;foder;circle-verify;souschef;arser;vconone;
+ARM32_BUILD_ITEMS+=;luci;
+ARM32_BUILD_ITEMS+=;luci-interpreter;
+ARM32_BUILD_ITEMS+=;tflite2circle;
+ARM32_BUILD_ITEMS+=;tflchef;circlechef;
+ARM32_BUILD_ITEMS+=;circle2circle;record-minmax;circle-quantizer;
+ARM32_BUILD_ITEMS+=;luci-eval-driver;luci-value-test;
ARM32_TOOLCHAIN_FILE=cmake/buildtool/cross/toolchain_armv7l-linux.cmake
-ARM32_HOST_ITEMS:=angkor;cwrap;pepper-str;pepper-strcast;pp
-ARM32_HOST_ITEMS+=;pepper-csv2vec
-ARM32_HOST_ITEMS+=;oops
-ARM32_HOST_ITEMS+=;hermes;hermes-std
-ARM32_HOST_ITEMS+=;loco;locop;logo-core;logo
-ARM32_HOST_ITEMS+=;safemain;mio-circle04;mio-tflite280
-ARM32_HOST_ITEMS+=;foder;circle-verify;souschef;arser;vconone
-ARM32_HOST_ITEMS+=;luci
-ARM32_HOST_ITEMS+=;luci-interpreter
-ARM32_HOST_ITEMS+=;tflite2circle
-ARM32_HOST_ITEMS+=;tflchef;circlechef
-ARM32_HOST_ITEMS+=;circle-tensordump
-ARM32_HOST_ITEMS+=;circle2circle
-ARM32_HOST_ITEMS+=;common-artifacts
-ARM32_HOST_ITEMS+=;luci-eval-driver;luci-value-test
+ARM32_HOST_ITEMS:=angkor;cwrap;pepper-str;pepper-strcast;pp;
+ARM32_HOST_ITEMS+=;pepper-csv2vec;
+ARM32_HOST_ITEMS+=;oops;
+ARM32_HOST_ITEMS+=;hermes;hermes-std;
+ARM32_HOST_ITEMS+=;loco;locop;logo-core;logo;
+ARM32_HOST_ITEMS+=;safemain;mio-tflite280;mio-circle04;
+ARM32_HOST_ITEMS+=;foder;circle-verify;souschef;arser;vconone;
+ARM32_HOST_ITEMS+=;luci;
+ARM32_HOST_ITEMS+=;luci-interpreter;
+ARM32_HOST_ITEMS+=;tflite2circle;
+ARM32_HOST_ITEMS+=;tflchef;circlechef;
+ARM32_HOST_ITEMS+=;circle-tensordump;
+ARM32_HOST_ITEMS+=;circle2circle;
+ARM32_HOST_ITEMS+=;common-artifacts;
+ARM32_HOST_ITEMS+=;luci-eval-driver;luci-value-test;
_SPACE_:=
But I'm not sure if this way is correct.
I found an error when cross-building. It's hard for me to solve it.
- error message
[ 93%] Building CXX object compiler/luci-interpreter/src/kernels/CMakeFiles/luci_interpreter_linux_pal.dir/home/jang/git/ragmani/ONE/externals/TENSORFLOW-2.6.0-RUY/ruy/pack_arm.cc.o
/home/jang/git/ragmani/ONE/externals/TENSORFLOW-2.6.0-RUY/ruy/pack_arm.cc: In function 'void ruy::Pack8bitColMajorForNeon4Cols(const ruy::PackParams8bit&)':
/home/jang/git/ragmani/ONE/externals/TENSORFLOW-2.6.0-RUY/ruy/pack_arm.cc:264:3: error: 'asm' operand has impossible constraints
264 | asm volatile(
| ^~~
gmake[3]: *** [compiler/luci-interpreter/src/kernels/CMakeFiles/luci_interpreter_linux_pal.dir/build.make:258: compiler/luci-interpreter/src/kernels/CMakeFiles/luci_interpreter_linux_pal.dir/home/jang/git/ragmani/ONE/externals/TENSORFLOW-2.6.0-RUY/ruy/pack_arm.cc.o] Error 1
- source code
// No attempt made at making this code efficient on A55-ish cores yet.
void Pack8bitColMajorForNeon4Cols(const PackParams8bit& params) {
CheckOffsetsInPackParams8bit(params);
profiler::ScopeLabel label("Pack (kNeon)");
const void* src_ptr0 = params.src_ptr0;
const void* src_ptr1 = params.src_ptr1;
const void* src_ptr2 = params.src_ptr2;
const void* src_ptr3 = params.src_ptr3;
const int src_inc0 = params.src_inc0;
const int src_inc1 = params.src_inc1;
const int src_inc2 = params.src_inc2;
const int src_inc3 = params.src_inc3;
const std::int8_t* packed_ptr = params.packed_ptr;
asm volatile( <---------- line 264
// clang-format off
"ldr r2, [%[params], #" RUY_STR(RUY_OFFSET_INPUT_XOR) "]\n"
"vdup.8 q11, r2\n"
"mov r1, #0\n"
// Zero-out the accumulators
"vmov.i32 q12, #0\n"
"vmov.i32 q13, #0\n"
"vmov.i32 q14, #0\n"
"vmov.i32 q15, #0\n"
// Round down src_rows to nearest multiple of 16.
"ldr r3, [%[params], #" RUY_STR(RUY_OFFSET_SRC_ROWS) "]\n"
"and r2, r3, #-16\n"
"cmp r1, r2\n"
"beq 3f\n"
"1:\n"
"add r1, r1, #16\n"
/* Load q0 */
"vld1.8 {d0, d1}, [%[src_ptr0]]\n"
"add %[src_ptr0], %[src_ptr0], %[src_inc0]\n"
RUY_PREFETCH_LOAD("pld [%[src_ptr0]]\n")
/* Load q1 */
"vld1.8 {d2, d3}, [%[src_ptr1]]\n"
"add %[src_ptr1], %[src_ptr1], %[src_inc1]\n"
RUY_PREFETCH_LOAD("pld [%[src_ptr1]]\n")
"veor.8 q4, q0, q11\n"
"veor.8 q5, q1, q11\n"
// Pairwise add in to 16b accumulators.
"vpaddl.s8 q8, q4\n"
"vpaddl.s8 q9, q5\n"
"vst1.32 {q4}, [%[packed_ptr]]!\n"
"vst1.32 {q5}, [%[packed_ptr]]!\n"
// Pairwise add in to 16b accumulators.
"vpaddl.s8 q8, q4\n"
"vpaddl.s8 q9, q5\n"
"vst1.32 {q4}, [%[packed_ptr]]!\n"
"vst1.32 {q5}, [%[packed_ptr]]!\n"
// Pairwise add accumulate into 32b accumulators.
// q12 and q13 contain 4x32b accumulators
"vpadal.s16 q12, q8\n"
"vpadal.s16 q13, q9\n"
// Now do the same for src_ptr2 and src_ptr3.
"vld1.8 {d0, d1}, [%[src_ptr2]]\n"
"add %[src_ptr2], %[src_ptr2], %[src_inc2]\n"
RUY_PREFETCH_LOAD("pld [%[src_ptr2]]\n")
"vld1.8 {d2, d3}, [%[src_ptr3]]\n"
"add %[src_ptr3], %[src_ptr3], %[src_inc3]\n"
RUY_PREFETCH_LOAD("pld [%[src_ptr3]]\n")
"veor.8 q4, q0, q11\n"
"veor.8 q5, q1, q11\n"
"vpaddl.s8 q8, q4\n"
"vpaddl.s8 q9, q5\n"
"vst1.32 {q4}, [%[packed_ptr]]!\n"
"vst1.32 {q5}, [%[packed_ptr]]!\n"
// Pairwise add accumulate into 32b accumulators.
// q14 and q15 contain 4x32b accumulators
"vpadal.s16 q14, q8\n"
"vpadal.s16 q15, q9\n"
"cmp r1, r2\n"
"bne 1b\n"
"3:\n"
// Now pack the last (num_rows % 16) rows.
"ldr r3, [%[params], #" RUY_STR(RUY_OFFSET_SRC_ROWS) "]\n"
"ands r2, r3, #15\n"
"beq 4f\n"
"ldr r3, [%[params], #" RUY_STR(RUY_OFFSET_SRC_ZERO_POINT) "]\n"
"vdup.8 q0, r3\n"
"vdup.8 q1, r3\n"
// First, read/accumulate/write for src_ptr0 and src_ptr1.
#define RUY_LOAD_ONE_ROW1(I, R) \
"cmp r2, #" #I "\n" \
"beq 5f\n" \
"vld1.8 { d0[" #R "]}, [%[src_ptr0]]!\n" \
"vld1.8 { d2[" #R "]}, [%[src_ptr1]]!\n" \
RUY_LOAD_ONE_ROW1(0, 0)
RUY_LOAD_ONE_ROW1(1, 1)
RUY_LOAD_ONE_ROW1(2, 2)
RUY_LOAD_ONE_ROW1(3, 3)
RUY_LOAD_ONE_ROW1(4, 4)
RUY_LOAD_ONE_ROW1(5, 5)
RUY_LOAD_ONE_ROW1(6, 6)
RUY_LOAD_ONE_ROW1(7, 7)
#undef RUY_LOAD_ONE_ROW1
#define RUY_LOAD_ONE_ROW2(I, R) \
"cmp r2, #" #I "\n" \
"beq 5f\n" \
"vld1.8 { d1[" #R "]}, [%[src_ptr0]]!\n" \
"vld1.8 { d3[" #R "]}, [%[src_ptr1]]!\n" \
RUY_LOAD_ONE_ROW2(8, 0)
RUY_LOAD_ONE_ROW2(9, 1)
RUY_LOAD_ONE_ROW2(10, 2)
RUY_LOAD_ONE_ROW2(11, 3)
RUY_LOAD_ONE_ROW2(12, 4)
RUY_LOAD_ONE_ROW2(13, 5)
RUY_LOAD_ONE_ROW2(14, 6)
RUY_LOAD_ONE_ROW2(15, 7)
#undef RUY_LOAD_ONE_ROW2
"5:\n"
"veor.16 q4, q0, q11\n"
"veor.16 q5, q1, q11\n"
"vpaddl.s8 q8, q4\n"
"vpaddl.s8 q9, q5\n"
// Pairwise add accumulate to 4x32b accumulators.
"vpadal.s16 q12, q8\n"
"vpadal.s16 q13, q9\n"
"vst1.32 {q4}, [%[packed_ptr]]!\n"
"vst1.32 {q5}, [%[packed_ptr]]!\n"
// Reset to src_zero for src_ptr2 and src_ptr3.
"vdup.8 q0, r3\n"
"vdup.8 q1, r3\n"
// Next, read/accumulate/write for src_ptr2 and src_ptr3.
#define RUY_LOAD_ONE_ROW1(I, R) \
"cmp r2, #" #I "\n" \
"beq 5f\n" \
"vld1.8 { d0[" #R "]}, [%[src_ptr2]]!\n" \
"vld1.8 { d2[" #R "]}, [%[src_ptr3]]!\n" \
RUY_LOAD_ONE_ROW1(0, 0)
RUY_LOAD_ONE_ROW1(1, 1)
RUY_LOAD_ONE_ROW1(2, 2)
RUY_LOAD_ONE_ROW1(3, 3)
RUY_LOAD_ONE_ROW1(4, 4)
RUY_LOAD_ONE_ROW1(5, 5)
RUY_LOAD_ONE_ROW1(6, 6)
RUY_LOAD_ONE_ROW1(7, 7)
#undef RUY_LOAD_ONE_ROW1
#define RUY_LOAD_ONE_ROW2(I, R) \
"cmp r2, #" #I "\n" \
"beq 5f\n" \
"vld1.8 { d1[" #R "]}, [%[src_ptr2]]!\n" \
"vld1.8 { d3[" #R "]}, [%[src_ptr3]]!\n" \
RUY_LOAD_ONE_ROW2(8, 0)
RUY_LOAD_ONE_ROW2(9, 1)
RUY_LOAD_ONE_ROW2(10, 2)
RUY_LOAD_ONE_ROW2(11, 3)
RUY_LOAD_ONE_ROW2(12, 4)
RUY_LOAD_ONE_ROW2(13, 5)
RUY_LOAD_ONE_ROW2(14, 6)
RUY_LOAD_ONE_ROW2(15, 7)
#undef RUY_LOAD_ONE_ROW2
"5:\n"
"veor.16 q4, q0, q11\n"
"veor.16 q5, q1, q11\n"
"vpaddl.s8 q8, q4\n"
"vpaddl.s8 q9, q5\n"
// Pairwise add accumulate to 4x32b accumulators.
"vpadal.s16 q14, q8\n"
"vpadal.s16 q15, q9\n"
"vst1.32 {q4}, [%[packed_ptr]]!\n"
"vst1.32 {q5}, [%[packed_ptr]]!\n"
"4:\n"
// Pairwise add 32-bit accumulators
"vpadd.i32 d24, d24, d25\n"
"vpadd.i32 d26, d26, d27\n"
"vpadd.i32 d28, d28, d29\n"
"vpadd.i32 d30, d30, d31\n"
// Final 32-bit values per row
"vpadd.i32 d25, d24, d26\n"
"vpadd.i32 d27, d28, d30\n"
"ldr r3, [%[params], #" RUY_STR(RUY_OFFSET_SUMS_PTR) "]\n"
"cmp r3, #0\n"
"beq 6f\n"
"vst1.32 {d25}, [r3]!\n"
"vst1.32 {d27}, [r3]!\n"
"6:\n"
// clang-format on
: [ src_ptr0 ] "+r"(src_ptr0), [ src_ptr1 ] "+r"(src_ptr1),
[ src_ptr2 ] "+r"(src_ptr2), [ src_ptr3 ] "+r"(src_ptr3)
: [ src_inc0 ] "r"(src_inc0), [ src_inc1 ] "r"(src_inc1),
[ src_inc2 ] "r"(src_inc2), [ src_inc3 ] "r"(src_inc3),
[ packed_ptr ] "r"(packed_ptr), [ params ] "r"(&params)
: "cc", "memory", "r1", "r2", "r3", "q0", "q1", "q2", "q3",
"q4", "q5", "q6", "q7", "q8", "q9", "q10", "q11", "q12", "q13");
}
@ragmani, I can't tell from the diff: which module has changed?
I haven't changed any modules. I just added ; at the end of each line.
FYI, AFAIR, when trying to enable the tizen build, a target without ';' is not added to the build targets.
Sorry, I didn't understand what you said. What do the "target" and "build target" you mentioned refer to?
@jinevening You can find discussion about cmake version under https://github.com/Samsung/ONE/issues/9432#issuecomment-1184412692
#9916 is a way to fix the error commented at https://github.com/Samsung/ONE/issues/9432#issuecomment-1196362702