Linfeng Zheng
You haven't set the size of the CooperativeGroup, so it uses the default value of 1. In your program, the producer_group has a size of 32 ('if tidx < 32') while the consumer_group has size...
Thanks for pointing this out. Yes, `_nvvm_ops_gen.py` is sometimes missing ops we would like to use. Indeed, the {any, all} modes are not exposed in NVVM IR in the current version. Before...
Hi @lucifer1004, we found that torch.einsum has a precision issue on the Ada architecture. If you call torch.einsum with CPU tensors, or use the fp32 datatype, the program could pass...