AutoKernel
AutoKernel copied to clipboard
支持多核吗?怎么支持
如题,如果要编译支持多核(多core)的算子,应该怎么做?
可以通过parallel()调度原语来调用: https://autokernel-docs-en.readthedocs.io/en/latest/tutorials/halide/halide_schedule.html#schedules-within-stages-domain-order
parallel (x,size) | Splits the dimension by size and parallelizes the outer dimension.
如果要控制具体的线程数,可以通后环境变量HL_NUM_THREADS:
HL_NUM_THREADS=... specifies the number of threads to create for the thread pool. When the async scheduling directive is used, more threads than this number may be required and thus allocated. A maximum of 256 threads is allowed. (By default, the number of cores on the host is used.)