Axel Feldmann
Works great, thanks so much for the quick fix! One more question: I see that this new function supports both vertex and edge weights, but in my testing, it seems...
Thanks! I was mostly hoping for vertex weights :) Specifically, I'm hoping that for clustering an SpMV graph (rows and cols as vertices, nonzeros as edges) I can increase the...
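For concreteness, here is a rough sketch of the SpMV graph I mean: rows and columns become vertices and each nonzero becomes an edge between its row vertex and its column vertex. The function name `spmv_bipartite_graph`, the vertex numbering (columns offset by `n_rows`), and the use of vertex degree as the vertex weight are my own assumptions for illustration, not part of any library discussed here.

```python
def spmv_bipartite_graph(n_rows, n_cols, nonzeros):
    """Build a bipartite graph for an n_rows x n_cols sparse matrix.

    Row i maps to vertex i; column j maps to vertex n_rows + j.
    Each nonzero (i, j) becomes an undirected edge between the two.
    Returns (adjacency, vertex_weights), where adjacency[v] is a sorted
    list of neighbours and vertex_weights[v] is the degree of v
    (an illustrative weight choice, assumed for this sketch).
    """
    n_vertices = n_rows + n_cols
    neighbours = [set() for _ in range(n_vertices)]
    for i, j in nonzeros:
        if not (0 <= i < n_rows and 0 <= j < n_cols):
            raise ValueError(f"nonzero ({i}, {j}) is out of bounds")
        row_vertex = i
        col_vertex = n_rows + j
        neighbours[row_vertex].add(col_vertex)
        neighbours[col_vertex].add(row_vertex)
    adjacency = [sorted(s) for s in neighbours]
    vertex_weights = [len(s) for s in neighbours]
    return adjacency, vertex_weights


# A 2 x 3 matrix with nonzeros at (0, 0), (0, 2), and (1, 1).
adjacency, vertex_weights = spmv_bipartite_graph(2, 3, [(0, 0), (0, 2), (1, 1)])
```

With weights like these, a partitioner that honours vertex weights would balance clusters by nonzero count rather than by vertex count.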
Thanks for the reply! I appreciate it :) Follow-up questions: 1. What is the idea of composed layouts if you can't actually use them to initialize smem tensors? Where...
I see, thanks. Is Hopper generally unsupported for the CuTe DSL?
Thanks @thakkarV! Is there anywhere I can read more about this distinction and how this all works?
Thanks! I think there may be a similar issue with `cute.nvgpu.cpasync.tma_partition`? If I try to use the composed layout, then the gemm works fine but the `tma_partition` breaks. Maybe I'm...
This makes sense. I tried fixing it by doing the following:
```
producer_group = utils.CooperativeGroup(utils.Agent.Thread, size=32, alignment=32)
consumer_group = utils.CooperativeGroup(utils.Agent.Thread, size=128, alignment=128)
```
(with no other changes). Unfortunately, this deadlocks...
> the producer arrive cnt should be 1 since only one thread will do the tma copy. The producer_commit of TmaAsyncPipeline does nothing. And the arrive behavior is done in...
Thanks so much! This fixes the problem. Two final questions: 1. Is this considered a bug or an intended behavior? For example, with the public version, is it possible to...
Follow-up question: suppose that we want to have 2 warpgroups in the consumer group. Is it possible to specify that with a single pipeline object now? More precisely, is...