CNN-on-flash
CNN functions for dense matrices resident in flash storage
The goal of this project is to run Convolutional Neural Network layers on flash-resident matrices. Currently, GEMM using NEON on ARM CPUs is implemented.
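The core idea behind a flash-resident GEMM is blocking: the large operands are split into square submatrices (GEMM_BLK_SIZE on a side), and only a few blocks need to be in memory at any time. The sketch below illustrates only the blocking scheme, with the matrices held in RAM for simplicity; in the actual project the block reads come from flash and the inner block multiply is dispatched to NEON kernels.

```cpp
#include <cstddef>
#include <vector>

// Multiply A (n x n) by B (n x n) into C (n x n), all row-major,
// one BLK x BLK block pair at a time. BLK plays the role of
// GEMM_BLK_SIZE; n is assumed to be a multiple of BLK.
void blocked_gemm(const std::vector<float>& A,
                  const std::vector<float>& B,
                  std::vector<float>& C,
                  std::size_t n, std::size_t BLK) {
    for (std::size_t bi = 0; bi < n; bi += BLK)
        for (std::size_t bj = 0; bj < n; bj += BLK)
            for (std::size_t bk = 0; bk < n; bk += BLK)
                // Block update: C[bi:,bj:] += A[bi:,bk:] * B[bk:,bj:].
                // In CNN-on-flash these two source blocks would be
                // fetched from flash by the IO threads.
                for (std::size_t i = bi; i < bi + BLK; ++i)
                    for (std::size_t k = bk; k < bk + BLK; ++k) {
                        float a = A[i * n + k];
                        for (std::size_t j = bj; j < bj + BLK; ++j)
                            C[i * n + j] += a * B[k * n + j];
                    }
}
```

Because each block triple is independent apart from the accumulation into C, block fetches and block multiplies can be overlapped, which is why the build exposes separate IO and compute thread counts.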
References
This project is based on BLAS-on-flash and runs on top of the Arm Compute Library.
- BLAS-on-flash https://github.com/microsoft/BLAS-on-flash
- Arm Compute Library https://github.com/ARM-software/ComputeLibrary
Requirements
- Ubuntu 16.04
- Arm Compute Library 19.02
- built with the NEON option enabled
Setting options
Set the options in CMakeLists.txt as you want.
vim CMakeLists.txt
- PROGRAM_BUDGET: memory budget of the GEMM, in bytes
- GEMM_BLK_SIZE: number of rows and columns of each submatrix
- N_IO_THR: number of IO threads
- N_COMPUTE_THR: number of compute threads
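As a sketch, the corresponding lines in CMakeLists.txt might look as follows; the values shown here are illustrative, not recommendations, and should be tuned for your board.

```cmake
# Illustrative values only -- tune for your device.
set (PROGRAM_BUDGET 1073741824)  # memory budget for the GEMM, in bytes (1 GiB here)
set (GEMM_BLK_SIZE 512)          # rows/cols of each submatrix block
set (N_IO_THR 4)                 # number of IO threads
set (N_COMPUTE_THR 4)            # number of compute threads
```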
Build instructions
git clone
vim CMakeLists.txt
- modify set (ACL_ROOT [arm_compute_library_path])
mkdir bin && cd bin
cmake ..
make
cd ..
Execution
gemm execution
cd misc
chmod +x gemm.sh
./exec.sh [A_row] [B_row] [B_col]
Example experiment result
Example case with
- size of input and output matrices = 4096x4096
- GEMM_BLK_SIZE = 512
- various memory budgets
- run on an Odroid-XU4 (Exynos 5422)

Inference time and maximum memory usage are shown in the following graph.

A more detailed explanation of the method and results can be found in the BLAS-on-flash paper and this paper.
License
CNN-on-flash is open-source software licensed under the MIT license.