Yinuo Liu
Given the current design, compared to directly using co-iteration, I think we can first replace the binary search module with a traversal to generate the mid-buffer. In this way, the complexity of...
Yes, you are right. If you want to achieve maximum parallelism, using binary search is reasonable. I also read your related paper; it seems the backend you considered is the GPU, which...
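To make the trade-off discussed here concrete, the following is a minimal sketch in plain Python, not code from the project. It assumes the task is to intersect two sorted index arrays; the function names `intersect_binary_search` and `intersect_co_iteration` are hypothetical. The binary-search version does an independent lookup per element, so the lookups could run in parallel, while the co-iteration version is a single sequential merge that does less total work.

```python
from bisect import bisect_left


def intersect_binary_search(a, b):
    """Intersect sorted lists a and b with one binary search per element of a.

    Each lookup is independent of the others, so the loop body could be
    parallelized. Total work is O(len(a) * log(len(b))).
    """
    out = []
    for x in a:
        pos = bisect_left(b, x)  # first position in b with b[pos] >= x
        if pos < len(b) and b[pos] == x:
            out.append(x)
    return out


def intersect_co_iteration(a, b):
    """Intersect sorted lists a and b by advancing two cursors together.

    This is a sequential merge: O(len(a) + len(b)) work, but each step
    depends on the previous one.
    """
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out
```

Both functions return the same result for sorted inputs without duplicates; which one is preferable depends on how many parallel workers the backend offers.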
Thank you for your reply. As you said, a CPU may not have as many cores as a GPU has CUDA cores. In this situation, the benefit from parallelism in the `k` axis...
OK, I got it; that would be a nice choice. We can discuss this in detail later.
@yzh119 Do you mean simply bypassing the var checking in the buffer shape and stride fields in the VarUseDefAnalysis pass? If so, I am wondering whether it will cause an assertion error when...