Need docs/faq for new Halide contributors
It's not clear to new contributors how to, e.g., debug the compiler vs. debug the runtime, what the overall compilation flow looks like, etc. We should have something like a README_contributing.md.
Tagging for visibility: @mcourteaux
Things I have struggled with or wish I knew earlier:
Architecture:
- [x] What the `.ll` files are that are generated by clang.
- [ ] What the `.ll` files are that sit in the repo.
- [x] Understanding the LLVM_Runtime_Linker.
- [ ] When using AOT generators, the runtime should be compiled once, instead of once per pipeline, which means you should use `no_runtime` all the time for serious projects (see the sketch after this list).
- [ ] The `debug` feature flag can influence both the runtime and the pipeline, which is a struggle if you have separated the runtime compilation from the pipelines.
- [ ] The `void *user_context` in the runtime is implicit in the IR, and it is inserted somewhere very late, in `CodeGen_Internal`.
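A minimal sketch of the split-runtime AOT flow, assuming you drive compilation from C++ rather than the generator CLI (the pipeline and file names here are hypothetical):

```cpp
#include "Halide.h"
using namespace Halide;

int main() {
    // Hypothetical pipeline: brighten an 8-bit image.
    ImageParam in(UInt(8), 2, "in");
    Func f("brighten");
    Var x("x"), y("y");
    f(x, y) = in(x, y) + 1;

    // Compile each pipeline with the NoRuntime feature (the C++
    // equivalent of the "no_runtime" target-string feature)...
    Target t = get_host_target().with_feature(Target::NoRuntime);
    f.compile_to_static_library("brighten", {in}, "brighten", t);

    // ...and emit the runtime exactly once, for a target whose
    // feature set covers every pipeline that links against it.
    compile_standalone_runtime("halide_runtime.o", get_host_target());
    return 0;
}
```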
Compiling:
- [ ] The `Makefile` is broken for installing autoschedulers, and you have to run `make autoschedulers` to trigger the final copy to the `bin/` folder in order to be able to compile the apps (this should probably be an issue).
- [ ] All the `apps/` use some default Makefile behavior where you have to set the `HL_TARGET` environment variable to something like `host-cuda` to experiment with CUDA (see the sketch after this list).
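For reference, the apps' Makefiles feed `HL_TARGET` into `get_target_from_environment()`; a minimal sketch of what that looks like on the C++ side:

```cpp
#include "Halide.h"
#include <cstdio>
using namespace Halide;

int main() {
    // Parses the HL_TARGET environment variable (e.g. "host-cuda"),
    // falling back to the host target when it is unset.
    Target t = get_target_from_environment();
    printf("Compiling for: %s\n", t.to_string().c_str());

    // Anything compiled with this target picks up CUDA when the
    // program is run with HL_TARGET=host-cuda.
    return 0;
}
```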
Source code subtleties:
- [ ] That the global debug variable in Halide is `HL_DEBUG_CODEGEN`, which gave me the impression that there should be another `HL_DEBUG` or `HL_DEBUG_xxx` variable for other stuff, but that's not the case.
- [ ] There is a `runtime/printer.h` which is also accessible through `debug(...)`, but it takes a context pointer instead of a log level. This runtime printer is the one that gets enabled by the `debug` feature on a generator when enabled on the runtime (!) and not on the pipeline (see the sketch after this list). I'd consider renaming it to `rdebug()` or something to make things a little clearer for newcomers. @abadams What about a rename?
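To make the two debug facilities concrete, here is a hedged sketch (assuming a POSIX `setenv`) of which knob controls which log stream:

```cpp
#include "Halide.h"
#include <cstdlib>
using namespace Halide;

int main() {
    // Compiler-side logging (src/Debug.h): the debug(level) stream used
    // throughout lowering/codegen, gated by the integer verbosity in
    // the HL_DEBUG_CODEGEN environment variable.
    setenv("HL_DEBUG_CODEGEN", "1", /*overwrite=*/1);

    Func f("f");
    Var x("x");
    f(x) = x * 2;

    // Runtime-side logging (src/runtime/printer.h): a debug(user_context)
    // stream that only prints when the runtime itself carries the Debug
    // target feature; enabling it on the pipeline alone is not enough.
    Target t = get_host_target().with_feature(Target::Debug);
    f.realize({16}, t);  // prints halide_malloc traces and friends
    return 0;
}
```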
Things I still don't know (also as a user of Halide, rather than a contributor):
- [ ] How to properly compile the latest version of Halide and/or run the generator such that, when I have an AOT pipeline statically linked into my program, I can debug through it with decent debug symbols for inspecting the call stack and stepping through the code.
- [ ] What the `registration` and `featurization` generator options are. `registration` is even enabled by default, but I don't think I ever use it: I just generate a header and a static library to link with. Additionally, I don't know what you'd use `cpp_stub` for.
- [ ] Whether there is a faster way to iterate on Halide development than to `make -j20` in the Halide repository and wait until the `halide.a` and `halide.so` files are ready before recompiling my program that uses Halide.
Things I wish I knew earlier as a user of Halide:
- [ ] Stmt files are useful (and, since recently, the `conceptual_stmt` file is even more useful in certain scenarios). The HTML version is really useful right now, IMO.
- [ ] The `gpu_thread` loops (and `gpu_block` loops, for all practical purposes) are not loops in practice.
- [ ] The device-dirty and host-dirty names are backwards to intuition. My initial way of thinking about this was: if the device is writing to a buffer, then the device has the most recent version, and the host version is dirty. I.e., backwards.
- [ ] Your code can get a lot cleaner if you specify bounds AND strides of buffers in your schedule (especially setting min=0 and stride=extent*(whatever is needed), for example).
- [ ] Names of variables (in Stmt) are replicated in the `parallel` functions: at the call site of a `halide_launch_parallel_for()`, the captured variables share the same name in the function body.
- [ ] The trick to move the zero-init of a Func with a reduction to the actual computation (see the sketch after this list):
  ```cpp
  f.compute_at(f.in(), innermost_var_of_f);
  f.in();  // actual schedule you had in mind for f
  ```
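A fuller, runnable version of that last trick, with hypothetical names (`g` is the wrapper; the reduction domain and vector width are arbitrary):

```cpp
#include "Halide.h"
using namespace Halide;

int main() {
    Func f("f");
    Var x("x"), y("y");
    RDom r(0, 16, "r");

    f(x, y) = 0;      // zero-init (pure step)
    f(x, y) += r.x;   // reduction (update step)

    // f.in() always returns the same global wrapper Func.
    Func g = f.in();

    // Give the wrapper the schedule you originally had in mind for f...
    g.compute_root().vectorize(x, 8);
    // ...and compute f at the wrapper's innermost var, so the zero-init
    // lands right next to where the values are consumed.
    f.compute_at(g, x);

    g.realize({64, 64});
    return 0;
}
```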
In light of addressing these and other topics in the Wiki: what's up with the GSoC pages? Can they go, or be moved to a subdirectory in the wiki specifically for GSoC? I couldn't find any up-to-date info on the official GSoC website. @steven-johnson @vksnk
Note ... I've started adding some general discussion on the Halide system architecture here: https://github.com/halide/Halide/wiki/System-Architecture
I'll try and add explanations for all the specific topics you've listed as I go.
@derek-gerstmann Nice! For keeping track of things a little bit better, I turned my list into a checklist.
Excuse me, is this the right place to add to the tutorial wishlist? I have a few vendor-specific tutorials I wish were included in the documentation:
- [ ] #7148 To copy-and-paste the auto-generated `my_pipeline.schedule.h` output back into `Generator::schedule()` as a manual schedule. Use case: version control of the Halide schedule; schedule fine-tuning by domain experts.
- [x] Syntax to set auto-scheduler parameters in the generator command-line interface: https://github.com/halide/Halide/blob/083927020a594a66f6e45ccd407ad922d867906b/src/autoschedulers/anderson2021/CostModel.h#L18
- [ ] (Nvidia GPU target only): The ability to retrieve the `CUcontext` as well as the `cudaStream_t` from the Halide runtime library. Use case: calling Nvidia runtime algorithms (cuFFT, `opencv::gpu::path_NLM`) asynchronously in the middle of the Halide pipeline, likely via `define_extern`. https://github.com/halide/Halide/blob/97573c6e6f803a234be18e37648c3399537808fd/src/runtime/cuda.cpp#L230
- [ ] Similarly, the ability to acquire the `CUcontext` in a multi-threaded application. Use case: a multi-threaded data pipeline with the data dependency `read from filesystem -> halide_pipeline -> write to filesystem`, running with pipeline parallelism of 3 concurrent threads, the three `halide_pipeline()` calls sharing the same `CUcontext` (see the sketch after this list). https://github.com/halide/Halide/blob/97573c6e6f803a234be18e37648c3399537808fd/test/generator/acquire_release_aottest.cpp#L66-L74