nntrainer
[ Wait for #2500 ] [ BLAS ] Refactor blas/math related files into cpu backend considering arch-dep
While implementing additional features in NEON, I found myself writing unnecessary code blocks. This is a draft proposal for refactoring the current blas/math related files.
DONE
- build on x86 & unittest (w/ & w/o fp16)
- build on android & unittest
- for all fallbacks
TODO
- For now, `cpu_backend.h` itself is done.
- The next step is to sync this work with `TensorV2`: #2500
- Syncing means the following code changes OUTSIDE of the `cpu_backend` directory:
  - Remove CBLAS params in Tensor to get rid of the unnecessary dependency
  - Use valid params for functions like `sgemv`, `sgemm` (StorageOrder, data addr, ...)
  - This is already done for `tensor.cpp` in this PR, while intentionally eliminating ALL `TensorV2` related files
  - Replace all `#include <blas_interface.h>` with `#include <cpu_backend.h>`
  - Replace all `#include <blas_neon.h>` with `#include <neon_single.h>`
- As the final step, rename functions properly (e.g. `sgemm` -> `gemm`, or `hgemm`, ...)
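The signature change described in the TODO can be sketched as follows. This is only an illustration and not the code in this PR: the `sketch` namespace, the use of plain `bool` transpose flags, and the exact parameter order are assumptions. The point is that callers such as `tensor.cpp` no longer need any CBLAS type, and that a precision-generic `gemm` can sit behind the old `sgemm` name during the transition.

```cpp
#include <cassert>

// Hypothetical stand-ins; the real nntrainer signatures may differ.
namespace sketch {

// Row-major reference GEMM: C = alpha * op(A) * op(B) + beta * C,
// where op(A) is M x K and op(B) is K x N.
// TransA / TransB are plain bools instead of CBLAS_TRANSPOSE, so the
// caller does not have to include <cblas.h>.
template <typename T>
void gemm(bool TransA, bool TransB, unsigned int M, unsigned int N,
          unsigned int K, T alpha, const T *A, unsigned int lda, const T *B,
          unsigned int ldb, T beta, T *C, unsigned int ldc) {
  assert(A != nullptr && B != nullptr && C != nullptr);
  for (unsigned int m = 0; m < M; ++m) {
    for (unsigned int n = 0; n < N; ++n) {
      T acc = T(0);
      for (unsigned int k = 0; k < K; ++k) {
        // Pick the element of op(A) and op(B) according to the flags.
        const T a = TransA ? A[k * lda + m] : A[m * lda + k];
        const T b = TransB ? B[n * ldb + k] : B[k * ldb + n];
        acc += a * b;
      }
      C[m * ldc + n] = alpha * acc + beta * C[m * ldc + n];
    }
  }
}

// Thin wrapper that keeps the old single-precision name alive until
// all call sites have moved to the generic name.
inline void sgemm(bool TransA, bool TransB, unsigned int M, unsigned int N,
                  unsigned int K, float alpha, const float *A,
                  unsigned int lda, const float *B, unsigned int ldb,
                  float beta, float *C, unsigned int ldc) {
  gemm<float>(TransA, TransB, M, N, K, alpha, A, lda, B, ldb, beta, C, ldc);
}

} // namespace sketch
```

In this sketch an architecture-specific backend would only have to provide a faster body for the same signature, so call sites stay unchanged when a NEON or AVX2 kernel is added.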
The final form of this PR would look like:
...
tensor
├─── cpu_backend
│    └─── cpu_backend.h (has all external functions from previous `blas_interface.h`)
│    └─── aarch64
│         └─── aarch64_compute_backend
│         └─── neon_half (For armv8.2+)
│         └─── neon_single
│         ...
│    └─── x86_64
│         └─── x86_64_compute_backend
│         └─── AVX2_implemented (vcvt, cblas)
│         └─── AVX2_not_yet_implemented (go to fallback_internal)
│         ...
│    └─── fallback
│         └─── fallback ( !x86_64 & !aarch64 )
│         └─── fallback_internal
...
and `blas_interface.h` would be removed.
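A minimal sketch of how a single header could route one call to the per-architecture directories in the layout above, assuming selection happens at compile time through the compiler's predefined architecture macros. All names in the `sketch` namespace are hypothetical, `saxpy` is used only as a small example function, and the x86_64 branch deliberately goes to the portable loop to mirror the "not yet implemented, go to fallback_internal" case.

```cpp
#include <cassert>

#if defined(__aarch64__)
#include <arm_neon.h>
#endif

// Hypothetical names; this is not the actual cpu_backend.h.
namespace sketch {

// Portable loop, standing in for fallback_internal: Y = alpha * X + Y.
inline void saxpy_fallback(unsigned int N, float alpha, const float *X,
                           float *Y) {
  for (unsigned int i = 0; i < N; ++i)
    Y[i] += alpha * X[i];
}

#if defined(__aarch64__)
// NEON single-precision kernel, standing in for neon_single.
inline void saxpy_neon(unsigned int N, float alpha, const float *X,
                       float *Y) {
  unsigned int i = 0;
  for (; i + 4 <= N; i += 4) {
    const float32x4_t x = vld1q_f32(X + i);
    const float32x4_t y = vld1q_f32(Y + i);
    vst1q_f32(Y + i, vmlaq_n_f32(y, x, alpha)); // y + x * alpha
  }
  for (; i < N; ++i) // scalar tail for the remaining elements
    Y[i] += alpha * X[i];
}
#endif

// Single entry point that the rest of the code base would call.
inline void saxpy(unsigned int N, float alpha, const float *X, float *Y) {
  assert(X != nullptr && Y != nullptr);
#if defined(__aarch64__)
  saxpy_neon(N, alpha, X, Y);
#elif defined(__x86_64__)
  // No dedicated x86_64 kernel in this sketch: reuse the portable loop.
  saxpy_fallback(N, alpha, X, Y);
#else
  saxpy_fallback(N, alpha, X, Y);
#endif
}

// Reports which branch was compiled in; handy for logging in unit tests.
inline const char *backend_name() {
#if defined(__aarch64__)
  return "aarch64";
#elif defined(__x86_64__)
  return "x86_64";
#else
  return "fallback";
#endif
}

} // namespace sketch
```

With this arrangement, code outside the backend directory includes only the one dispatching header and never refers to an architecture-specific file directly.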