stencilflow
Introduction
This repository implements an end-to-end stack that compiles a high-level description of a stencil program to hardware. Dependencies between stencil operators are resolved by streaming fine-grained results directly between processing elements on the chip.
Prerequisites
To run the code, the following software must be available:
- Python 3.6.x or newer.
- The virtualenv module (installed with pip install virtualenv).
- A C++17-capable compiler (e.g., GCC 7.x or Clang 6.x).
- One or both FPGA compilers:
- Intel FPGA OpenCL SDK (tested with 18.1.1 and 19.1)
- Xilinx Vitis (tested with 2020.2)
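The list above can be checked up front with a short script. This is a minimal sketch, not part of the repository: the executable names (g++ and clang++ for the C++ compiler, aoc for the Intel FPGA OpenCL SDK, v++ for Xilinx Vitis) are assumptions about a typical installation, and the check only looks for the tools on PATH without verifying their versions or C++17 support.

```python
import importlib.util
import shutil
import sys

# Assumed executable names; adjust them to match the local installation.
CXX_COMPILERS = ("g++", "clang++")
FPGA_COMPILERS = ("aoc", "v++")


def check_prerequisites(which=shutil.which):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if sys.version_info < (3, 6):
        problems.append("Python 3.6 or newer is required")
    if importlib.util.find_spec("virtualenv") is None:
        problems.append("virtualenv module not found (pip install virtualenv)")
    # One C++ compiler is enough; tool versions are not inspected here.
    if not any(which(name) for name in CXX_COMPILERS):
        problems.append("no C++ compiler found on PATH")
    # One or both FPGA compilers may be present.
    if not any(which(name) for name in FPGA_COMPILERS):
        problems.append("no FPGA compiler found on PATH")
    return problems
```

Calling check_prerequisites() and printing the returned list gives a quick overview of what still needs to be installed.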
Setup
Sourcing the script setup_virtualenv.sh sets up a virtualenv with all the
modules required to run StencilFlow, including the relevant version of DaCe:
source setup_virtualenv.sh
Running
To run the end-to-end flow on an input JSON file, the executable
bin/run_program.py can be used. Example usage:
bin/run_program.py test/stencils/jacobi3d_32x32x32_8itr_8vec.json emulation -compare-to-reference
This compiles the FPGA kernel for Intel's emulation flow, builds a reference CPU program, runs both, and verifies that the results match.
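When the flow is driven from another script, the invocation above can be assembled programmatically. The helper names build_command and run_program are hypothetical, and the sketch assumes it is run from the repository root. Only the "emulation" mode and the -compare-to-reference flag are taken from the example; this README does not state whether a result mismatch is reported through the exit status, so the return value is passed through without interpretation.

```python
import subprocess
import sys


def build_command(program_json, mode="emulation", compare=True):
    """Assemble the bin/run_program.py invocation used in the example."""
    command = [sys.executable, "bin/run_program.py", program_json, mode]
    if compare:
        command.append("-compare-to-reference")
    return command


def run_program(program_json, **kwargs):
    """Run the flow from the repository root and return its exit code."""
    return subprocess.run(build_command(program_json, **kwargs)).returncode
```

For example, run_program("test/stencils/jacobi3d_32x32x32_8itr_8vec.json") issues the same command as the example line.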
The generated program will be located in .dacecache/<kernel name>, with the
kernel source files themselves in:
.dacecache/<kernel name>/src/intel_fpga/device
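To inspect the generated kernel sources from a script, the path above can be built from the kernel name. This is a small sketch with hypothetical helper names; the kernel name has to be supplied by the caller, since this README does not describe how it is derived from the input file.

```python
from pathlib import Path


def device_source_dir(kernel_name, root="."):
    """Return the directory holding the generated Intel FPGA kernel sources."""
    return Path(root) / ".dacecache" / kernel_name / "src" / "intel_fpga" / "device"


def list_device_sources(kernel_name, root="."):
    """List generated kernel source files, or [] if nothing was generated."""
    directory = device_source_dir(kernel_name, root)
    if not directory.is_dir():
        return []
    return sorted(p for p in directory.iterdir() if p.is_file())
```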
Verification
For programs using the "shrink" boundary conditions, the borders
intentionally contain invalid results. To validate such programs, use the
-halo flag (e.g., -halo=3) to specify the width of the halo that validation
should ignore in each dimension.
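The idea behind ignoring a halo can be illustrated with a short sketch. This is not the repository's validation code: the function names and the tolerances are assumptions, and plain nested lists are used to keep the example dependency-free (with NumPy arrays, the same effect is usually achieved by slicing each dimension).

```python
import math


def strip_halo(data, halo):
    """Drop `halo` cells from both ends of every dimension of a nested list."""
    inner = data[halo:len(data) - halo]
    if inner and isinstance(inner[0], list):
        return [strip_halo(row, halo) for row in inner]
    return list(inner)


def flatten(data):
    """Yield the scalar values of a nested list in row-major order."""
    for item in data:
        if isinstance(item, list):
            yield from flatten(item)
        else:
            yield item


def matches_ignoring_halo(result, reference, halo, rel_tol=1e-5):
    """Compare only the interior cells; the tolerances are assumed values."""
    a = list(flatten(strip_halo(result, halo)))
    b = list(flatten(strip_halo(reference, halo)))
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=rel_tol, abs_tol=1e-8) for x, y in zip(a, b))
```

With halo=1 on a 4x4 domain, only the inner 2x2 block is compared, so invalid border values do not cause the comparison to fail.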
Program description
Examples of program descriptions are located in test/stencils. They cover
2D and 3D stencils, vectorization, and lower-dimensional inputs.
Executables
All executables are included in the bin subfolder, and have documented command
line interfaces.
Tests
The repository ships with a number of tests that verify various aspects of functionality. These can be run with:
test/test_stencil.py
Known issue: launching multiple Intel FPGA kernels in quick succession (as the tests do) can fail sporadically, seemingly due to file I/O issues. Running individual programs should never fail.
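One possible workaround for the sporadic failures is to retry a failed run after a short pause. The sketch below is an assumption-laden illustration, not something the repository provides: run_with_retries is a hypothetical helper, and it assumes the command signals failure through a nonzero exit status. Retrying can also mask genuine failures, so it is best limited to a small number of attempts.

```python
import subprocess
import time


def run_with_retries(command, attempts=3, delay_seconds=1.0,
                     runner=subprocess.call):
    """Run `command` until it exits with status 0 or attempts run out.

    Returns the attempt number that succeeded, or None if all failed.
    """
    for attempt in range(1, attempts + 1):
        if runner(command) == 0:
            return attempt
        if attempt < attempts:
            # Pause briefly before the next try.
            time.sleep(delay_seconds)
    return None
```

For example, run_with_retries(["python", "test/test_stencil.py"]) reruns the test script up to three times.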