TensorComprehensions
Less strict TC + JIT requirements
Currently, TC requires compiling a new fully-specialized kernel for each new tensor size.
#327 has some context about usage in the C2 case:
> What if dimensions changed? You might need to store a cache of compiled kernels and compare dimensions. Or have a flag that tells TcOp that dimensions are fixed. Do you think we can do dimension checks fast enough that we can actually do them all the time?
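A dimension check is just a lexicographic comparison of small integer vectors, which should be cheap relative to a kernel launch. Here is a minimal sketch of a shape-keyed kernel cache; `CompiledKernel` and `ShapeCache` are hypothetical names, not TC's actual API.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical handle for a compiled kernel; in TC this would wrap a
// specialized CUDA module rather than just a name.
struct CompiledKernel {
  std::string name;
};

// Cache keyed by the concrete input shapes. std::map compares the
// vector-of-vectors lexicographically, so a lookup costs a handful of
// integer comparisons per dimension.
class ShapeCache {
 public:
  using Shapes = std::vector<std::vector<int64_t>>;

  // Returns the cached kernel, invoking the compile callback only for
  // shapes that have not been seen before.
  template <typename CompileFn>
  const CompiledKernel& getOrCompile(const Shapes& shapes, CompileFn compile) {
    auto it = cache_.find(shapes);
    if (it == cache_.end()) {
      it = cache_.emplace(shapes, compile(shapes)).first;
    }
    return it->second;
  }

  size_t size() const { return cache_.size(); }

 private:
  std::map<Shapes, CompiledKernel> cache_;
};
```

With this shape of cache, repeated calls on fixed dimensions hit the map every time and only genuinely new sizes trigger a recompile.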
After #307 we could relax JIT requirements by:
- [ ] have Halide + LLVM compile a check function that we can always run, and drop uncheckedRun once and for all. But first let's actually benchmark it.
- [ ] implement an `OptionsCacheKey -> TcExecutor` store and reuse tuned options to compile and memoize new executors
- [ ] revive compilation caches
- [ ] implement the sparse regions + nearest-neighbor search + JIT approach I had suggested
- [ ] as @abadams had suggested we can also tune for a size and pre-tune for a neighborhood by synthesizing parametric kernels.
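The nearest-neighbor idea from the list could look like the sketch below: keep the sizes we have already autotuned, and for a new size reuse the options of the closest tuned size instead of retuning from scratch. `TunedEntry`, `optionsId`, and the L1 metric are illustrative assumptions, not TC's cache format.

```cpp
#include <cstdint>
#include <cstdlib>
#include <limits>
#include <vector>

// Hypothetical tuned-options record; stands in for the full mapping
// options that TC's autotuner would store per problem size.
struct TunedEntry {
  std::vector<int64_t> sizes;  // problem sizes this entry was tuned for
  int optionsId;               // placeholder for the tuned mapping options
};

// L1 distance between two size vectors of equal rank.
int64_t l1Distance(const std::vector<int64_t>& a,
                   const std::vector<int64_t>& b) {
  int64_t d = 0;
  for (size_t i = 0; i < a.size(); ++i) d += std::llabs(a[i] - b[i]);
  return d;
}

// Linear-scan nearest-neighbor lookup over the tuned store; returns the
// options of the closest size, or -1 when nothing of matching rank exists.
int nearestOptions(const std::vector<TunedEntry>& store,
                   const std::vector<int64_t>& sizes) {
  int best = -1;
  int64_t bestDist = std::numeric_limits<int64_t>::max();
  for (const auto& e : store) {
    if (e.sizes.size() != sizes.size()) continue;
    int64_t d = l1Distance(e.sizes, sizes);
    if (d < bestDist) {
      bestDist = d;
      best = e.optionsId;
    }
  }
  return best;
}
```

A linear scan is fine while the store is small; a real implementation would also want a distance threshold beyond which it retunes rather than reuses.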
Hmm, would it make sense for the autotuner to know that some options (tile/block sizes in particular) are produced as a fraction of the problem size, and then adapt them to the new problem size? I think we had an example where we mapped to 27 threads because the relevant size was 27; it would be sad to map size 28 to blocks of 27 threads and end up with twice as many warps as would be sufficient.
@ftynse yup, forgot that in my list :) I would fold it into sparse regions + NN search + JIT, with adaptation. Unless you see a fully independent task for it?
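The fraction-of-problem-size adaptation discussed above could be sketched as a simple rescaling heuristic; this is an illustration under assumed names, not the TC autotuner's actual policy.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// If the tuner picked `tunedBlock` threads for problem size `tunedSize`,
// adapt the block size to `newSize` by preserving the block-to-size
// ratio, then clamp to the per-block thread limit (1024 on current CUDA
// hardware). Heuristic sketch only.
int64_t adaptBlockSize(int64_t tunedBlock, int64_t tunedSize,
                       int64_t newSize, int64_t maxBlock = 1024) {
  double ratio =
      static_cast<double>(tunedBlock) / static_cast<double>(tunedSize);
  int64_t scaled = static_cast<int64_t>(std::llround(ratio * newSize));
  return std::min(std::max<int64_t>(scaled, 1), maxBlock);
}
```

On the 27-thread example: options tuned at size 27 with 27 threads adapt to 28 threads for size 28, so the launch wastes at most a partial warp instead of doubling the warp count.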