TensorComprehensions
Simple Example Walkthrough
Existing tests run certain scripts to test certain aspects of TC. However, there is no simple example that starts with a very simple kernel such as GEMM or convolution and follows it through the toolchain.
The example folder also has a shell script that performs a genetic algorithm search, but (1) it is too complicated and (2) it fails to build and compile without CUDNN.
What is missing is a simple example that is independent of CUDA and lets the user observe how the code evolves through the toolset, with the intermediate representations (IRs) exposed.
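For concreteness, the kind of starting kernel meant here could be as small as a naive GEMM. The sketch below is plain C++ and uses no TC API at all; `naive_gemm` and its row-major layout are assumptions made for illustration, not something taken from the TC repository.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical reference kernel (not part of TC): naive row-major GEMM.
// A is M x K, B is K x N; returns C = A * B as an M x N matrix.
std::vector<float> naive_gemm(const std::vector<float>& A,
                              const std::vector<float>& B,
                              std::size_t M, std::size_t K, std::size_t N) {
  std::vector<float> C(M * N, 0.0f);
  for (std::size_t m = 0; m < M; ++m) {
    for (std::size_t n = 0; n < N; ++n) {
      // Reduce over the shared dimension K.
      for (std::size_t k = 0; k < K; ++k) {
        C[m * N + n] += A[m * K + k] * B[k * N + n];
      }
    }
  }
  return C;
}
```

A walkthrough would ideally start from a kernel of this size and show each IR it passes through on the way to generated code.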
Hi @perdavan, thanks a lot for creating the issue. To be clear, would you like to see some examples or a walkthrough of the C++ usage or the Python usage of TC?
cc @abadams / @ftynse for the halide / polyhedral IR
@perdavan for CPU-only tests there is test_mapper.cc, test_mapper_llvm.cc and test_mapper_memory_promotion.cc, which are unit tests that allow you to see what happens in the context of polyhedral mapping. test_core.cc shows what happens in the context of translation from TC to Halide to ISL including inference. test_tc_mapper.cc and test_tc_mapper_bugs.cc require both the CUDA SDK and a physical GPU to run.
Alternatively, #225 adds a minimal C++ example for tuning that can be copy-pasted and modified to experiment with various other TCs. It does however require CUDA + GPUs as this is currently still the only fully functional backend. It does not require CUDNN or CUBLAS :) Hope this helps :D
However, from parsing your request it seems you would like a tutorial on the C++ IR + API. This is more of a core developer thing at the moment, and while you're most welcome to contribute, it will require following the mapper flow from the high-level compile call and digging around to see what happens. The best way to get started with that is to call compile with some option (see the example in #225), set a breakpoint in gdb and poke around. As more people express interest we can certainly start exposing the IR C++ API more, but it is still not going to be an easy ramp-up.
Please let me know if #225 starts to address your request. As we make a CPU mapper available, it will become easier to use without CUDA.