
Need docs/faq for new Halide contributors

Open · abadams opened this issue 1 year ago · 6 comments

It's not clear to new contributors how to, e.g., debug the compiler vs. the runtime, what the overall compilation flow looks like, etc. We should have something like a README_contributing.md

abadams · Oct 13 '23

Tagging for visibility: @mcourteaux

abadams · Oct 13 '23

Things I have struggled with or wish I knew earlier:

Architecture:

  • [x] What the .ll files are, generated by clang.
  • [ ] What the .ll files are, sitting in the repo.
  • [x] Understanding the LLVM_Runtime_Linker.
  • [ ] When using AOT generators, the runtime should be compiled once instead of once per pipeline, which means you should use the no_runtime target feature all the time for serious projects.
  • [ ] The debug feature flag can influence both the runtime and the pipeline, which is confusing if you have separated runtime compilation from pipeline compilation.
  • [ ] The void *user_context in the runtime functions is implicit in the IR, and it is inserted somewhere very late, in CodeGen_Internal.
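A minimal sketch of the compile-the-runtime-once setup, using Halide's CMake helpers (the target names my_generators, my_pipeline, and my_runtime are hypothetical; check the CMake package docs for the exact signatures):

```cmake
find_package(Halide REQUIRED)

# Compile the Halide runtime exactly once for the whole project.
add_halide_runtime(my_runtime)

# Each pipeline is then built without its own runtime copy
# (USE_RUNTIME implies the no_runtime feature on the pipeline target).
add_halide_library(my_pipeline
                   FROM my_generators
                   GENERATOR my_pipeline
                   USE_RUNTIME my_runtime)
```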

Compiling:

  • [ ] The Makefile is broken for installing autoschedulers: you have to run make autoschedulers to trigger the final copy to the bin/ folder before you can compile the apps (this should probably be filed as an issue).
  • [ ] All the apps/ rely on some default Makefile behavior where you have to set the HL_TARGET environment variable to something like host-cuda to experiment with CUDA.
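For example, experimenting with CUDA in one of the apps looks roughly like this (assuming a CUDA-capable machine; the app is just an illustration):

```shell
cd apps/local_laplacian
HL_TARGET=host-cuda make
```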

Source code subtleties:

  • [ ] That the global debug environment variable in Halide is HL_DEBUG_CODEGEN. The name gave me the impression that there should be other HL_DEBUG or HL_DEBUG_xxx variables for other stuff, but that's not the case.
  • [ ] There is a runtime/printer.h which is also accessible through debug(...), but it takes a context pointer instead of a log level. This runtime printer is the one that gets enabled by the debug feature when it is set on the runtime (!) rather than on the pipeline. I'd consider renaming it to rdebug() or something to make things a little clearer for newcomers. @abadams What about a rename?
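For instance, compiler-side logging is turned on with an integer verbosity level in the environment when running a generator (the binary and pipeline names here are hypothetical):

```shell
HL_DEBUG_CODEGEN=1 ./my_pipeline.generator -g my_pipeline -o build target=host
```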

Things I still don't know (also as a user of Halide, rather than a contributor):

  • [ ] How to properly compile the latest version of Halide and/or run the generator such that, when I have an AOT pipeline statically linked into my program, I can debug through it with decent debug symbols for inspecting the call stack and stepping through the code.
  • [ ] What the registration and featurization generator options are. registration is even enabled by default, but I don't think I ever use it: I just generate a header and a static library to link against. Additionally, I don't know what you'd use cpp_stub for.
  • [ ] Whether there is a faster way to iterate on Halide development than running make -j20 in the Halide repository and waiting until the halide.a and halide.so files are ready before recompiling my program that uses Halide.
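On the registration point: the generator's -e flag controls which outputs are emitted, so a sketch of an invocation that produces only the header and static library might look like the following (binary and pipeline names are hypothetical; check ./my_pipeline.generator -h for the exact option list in your version):

```shell
./my_pipeline.generator -g my_pipeline -e static_library,h -o build target=host
```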

Things I wish I knew earlier as a user of Halide:

  • [ ] stmt files are useful (and, as of recently, the conceptual_stmt file is even more useful in certain scenarios). The HTML version is especially useful, IMO.
  • [ ] The gpu_thread loops (and gpu_block loops for all practical purposes) are not loops in practice.
  • [ ] The device-dirty and host-dirty names are backwards from intuition. My initial way of thinking about it was: if the device is writing to a buffer, then the device has the most recent version, so the host version is the dirty one. I.e.: backwards.
  • [ ] Your code can get a lot cleaner if you specify bounds AND strides of buffers in your schedule (for example, setting min=0 and stride=extent*(whateverisneeded)).
  • [ ] Names of variables (in Stmt) are replicated in the parallel functions: at the call site of a halide_launch_parallel_for(), the captured variables share the same names in the function body.
  • [ ] The trick to move the zero-init of a Func with a reduction next to the actual computation:
    f.compute_at(f.in(), innermost_var_of_f);
    f.in()
     // actual schedule you had in mind for f
     ;
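On the bounds-and-strides point above, a sketch of what such constraints look like in generator code, assuming a two-dimensional output and known width/height (verify against the OutputImageParam::dim() API in your Halide version):

```cpp
// Constrain the output layout: zero-based mins, dense rows.
output.dim(0).set_bounds(0, width).set_stride(1);
output.dim(1).set_bounds(0, height).set_stride(width);
```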
    

mcourteaux · Oct 16 '23

In light of addressing these and others in the Wiki, what's up with the GSoC pages? Can they go, or be moved to a subdirectory in the wiki specifically for GSoC? I couldn't find any up-to-date info on the official GSoC website. @steven-johnson @vksnk

mcourteaux · Oct 27 '23

Note ... I've started adding some general discussion on the Halide system architecture here: https://github.com/halide/Halide/wiki/System-Architecture

I'll try and add explanations for all the specific topics you've listed as I go.

derek-gerstmann · Oct 27 '23

@derek-gerstmann Nice! For keeping track of things a little bit better, I turned my list into a checklist.

mcourteaux · Oct 28 '23

Excuse me, is this the right place to add to the tutorial wishlist? I have a few vendor-specific tutorials I'd like to see included in the documentation:

  • [ ] #7148 To copy-and-paste the auto-scheduler output my_pipeline.schedule.h back into Generator::schedule() as a manual schedule. Use case: version control of the Halide schedule; schedule fine-tuning by domain experts.

  • [x] Syntax to set auto-scheduler parameters in the generator command-line interface: https://github.com/halide/Halide/blob/083927020a594a66f6e45ccd407ad922d867906b/src/autoschedulers/anderson2021/CostModel.h#L18

  • [ ] (Nvidia GPU target only): The ability to retrieve the cuContext as well as the cudaStream_t from the Halide runtime library. Use case: calling Nvidia runtime algorithms (cuFFT, opencv::gpu::path_NLM) asynchronously in the middle of the Halide pipeline, likely via define_extern. https://github.com/halide/Halide/blob/97573c6e6f803a234be18e37648c3399537808fd/src/runtime/cuda.cpp#L230

  • [ ] Similarly, the ability to acquire the cuContext for multi-threaded applications. Use case: a multi-threaded data pipeline with the data dependency read from filesystem -> halide_pipeline -> write to filesystem, with pipeline parallelism of 3 concurrent threads. The three halide_pipeline() calls share the same cuContext. https://github.com/halide/Halide/blob/97573c6e6f803a234be18e37648c3399537808fd/test/generator/acquire_release_aottest.cpp#L66-L74
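For the shared-context use case, the runtime's CUDA context hooks are weakly linked and can be overridden in the application, roughly like this sketch based on the linked acquire_release test (my_shared_context is a hypothetical global created once with the CUDA driver API; verify the signatures against HalideRuntimeCuda.h in your Halide version):

```cpp
#include <cuda.h>

static CUcontext my_shared_context;  // created once at startup (hypothetical)

extern "C" int halide_cuda_acquire_context(void *user_context,
                                           CUcontext *ctx, bool create) {
    *ctx = my_shared_context;  // hand every pipeline the same context
    return 0;
}

extern "C" int halide_cuda_release_context(void *user_context) {
    return 0;  // don't destroy; the next pipeline call reuses the context
}
```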

antonysigma · Oct 28 '23