nntrainer

[ Wait for #2500 ] [ BLAS ] Refactor blas/math related files into cpu backend considering arch-dep

Open skykongkong8 opened this issue 2 months ago • 9 comments

While implementing additional NEON features, I found myself writing redundant code blocks. This is a draft proposal for refactoring the current blas/math-related files.

DONE

  • build on x86 & unittest (w/ & w/o fp16)
  • build on android & unittest
  • for all fallbacks

TODO

  • For now, cpu_backend.h itself is done.

  • The next step is to sync this work with TensorV2: #2500

  • This syncing work requires the following code changes OUTSIDE of the cpu_backend directory:

    • Remove CBLAS params from Tensor to get rid of the unnecessary dependency
    • Use valid params for functions like sgemv and sgemm (StorageOrder, data addr, ...)
    • This is already done for tensor.cpp in this PR, while intentionally eliminating ALL TensorV2-related files
    • Replace all #include <blas_interface.h> with #include <cpu_backend.h>
    • Replace all #include <blas_neon.h> with #include <neon_single.h>
  • As the final step, rename functions properly (e.g. sgemm -> gemm, or hgemm, ...)
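
To illustrate what "removing CBLAS params" could look like from the caller's side, here is a minimal sketch. It assumes a plain `StorageOrder` enum and boolean transpose flags in place of the CBLAS enums. The enum values, the `at` helper, and the exact `sgemm` signature are hypothetical and not taken from the PR; the body is a naive fallback loop, not an optimized kernel.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical CBLAS-free layout parameter (assumption, not the PR's definition).
enum class StorageOrder { ROW_MAJOR, COL_MAJOR };

// Linear index of logical element (r, c) of op(X) with leading dimension ld.
// Transposing swaps row/column once; column-major storage swaps them again.
static inline std::size_t at(StorageOrder order, bool trans, std::size_t r,
                             std::size_t c, std::size_t ld) {
  const bool row_first = (order == StorageOrder::ROW_MAJOR) != trans;
  return row_first ? r * ld + c : c * ld + r;
}

// Naive fallback: C = alpha * op(A) * op(B) + beta * C,
// where op(A) is M x K, op(B) is K x N and C is M x N.
void sgemm(StorageOrder order, bool trans_a, bool trans_b, std::size_t M,
           std::size_t N, std::size_t K, float alpha, const float *A,
           std::size_t lda, const float *B, std::size_t ldb, float beta,
           float *C, std::size_t ldc) {
  assert(A != nullptr && B != nullptr && C != nullptr);
  for (std::size_t i = 0; i < M; ++i) {
    for (std::size_t j = 0; j < N; ++j) {
      float acc = 0.0f;
      for (std::size_t k = 0; k < K; ++k) {
        acc += A[at(order, trans_a, i, k, lda)] *
               B[at(order, trans_b, k, j, ldb)];
      }
      float &c = C[at(order, false, i, j, ldc)];
      c = alpha * acc + beta * c;
    }
  }
}
```

With a signature of this shape, Tensor code would only pass its own layout information and raw data pointers, so no `cblas.h` type would need to leak outside the backend directory.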

The final form of this PR would look like:

...
tensor
├─── cpu_backend
│   └─── cpu_backend.h (has all external functions from previous `blas_interface.h`)
│   └─── aarch64
│      └─── aarch64_compute_backend
│      └─── neon_half (For armv8.2+)
│      └─── neon_single
│           ...
│   └─── x86_64
│      └─── x86_64_compute_backend
│      └─── AVX2_implemented (vcvt, cblas)
│      └─── AVX2_not_yet_implemented (go to fallback_internal)
│           ...
│   └─── fallback
│      └─── fallback ( !x86_64 & !aarch64 )
│      └─── fallback_internal
...

with blas_interface.h removed.
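
The layout above could be wired together roughly as follows. This is only a single-file sketch of compile-time dispatch: the namespaces stand in for the separate headers in the tree, the `cpu_backend::saxpy` wrapper and its simplified signature (no strides) are hypothetical, and `__aarch64__` / `__x86_64__` are the predefined GCC/Clang macros (other compilers may need different checks).

```cpp
#include <cassert>
#include <cstddef>

#if defined(__aarch64__)
#include <arm_neon.h>
#endif

namespace fallback_internal {
// Portable scalar kernel: y = alpha * x + y
inline void saxpy(std::size_t n, float alpha, const float *x, float *y) {
  for (std::size_t i = 0; i < n; ++i) {
    y[i] += alpha * x[i];
  }
}
} // namespace fallback_internal

#if defined(__aarch64__)
namespace neon_single {
// NEON kernel: processes 4 floats per iteration, scalar code for the tail.
inline void saxpy(std::size_t n, float alpha, const float *x, float *y) {
  std::size_t i = 0;
  for (; i + 4 <= n; i += 4) {
    float32x4_t vx = vld1q_f32(x + i);
    float32x4_t vy = vld1q_f32(y + i);
    vst1q_f32(y + i, vmlaq_n_f32(vy, vx, alpha)); // vy + vx * alpha
  }
  fallback_internal::saxpy(n - i, alpha, x + i, y + i);
}
} // namespace neon_single
#endif

namespace cpu_backend {
// Single external entry point; the architecture is chosen at compile time.
inline void saxpy(std::size_t n, float alpha, const float *x, float *y) {
  assert(x != nullptr && y != nullptr);
#if defined(__aarch64__)
  neon_single::saxpy(n, alpha, x, y);
#elif defined(__x86_64__)
  // No AVX2 kernel in this sketch: go to fallback_internal.
  fallback_internal::saxpy(n, alpha, x, y);
#else
  // Neither x86_64 nor aarch64.
  fallback_internal::saxpy(n, alpha, x, y);
#endif
}
} // namespace cpu_backend
```

Callers would include only the top-level header and call `cpu_backend::saxpy(...)`, so adding a new architecture-specific kernel later should not require touching Tensor code.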

skykongkong8, Apr 18 '24 07:04