Results 50 comments of Rohan Yadav

cc @RawnH I pushed some more code.

I don't think that the Python API is currently able to generate CUDA kernels. You will have to use the C++ API or the web tool to do so.

@fredrikbk do we have any plans for a new distribution of PyTaco? I don't think we have anyone working on this currently.

I'm not sure anyone on the development team right now has an M1 Mac to reproduce this issue. However, it looks like a small precision error that seems ignorable?

Can you share the link to the web interface that led to the error (it includes the schedules and formats)? Trying it myself, it looks like this particular case works.

I don't think this code is quite right yet (still working on it), but the idea seems fine -- we want to buffer changes to the same output location `i`...
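To make the buffering idea concrete, here is a rough sketch, not the code from the PR under discussion. It assumes a CSR-style layout (`pos`, `crd`, `vals`) and a hypothetical helper name `spmv_buffered`, neither of which comes from the comment above. All updates to output location `i` go into a local accumulator, and the output is written once per `i`:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper for illustration only; not part of TACO.
// Computes y(i) = sum_j A(i,j) * x(j) for a matrix stored in a
// CSR-style layout (pos/crd/vals are assumed names).
std::vector<double> spmv_buffered(const std::vector<int>& pos,
                                  const std::vector<int>& crd,
                                  const std::vector<double>& vals,
                                  const std::vector<double>& x) {
  if (pos.empty()) return {};
  std::vector<double> y(pos.size() - 1, 0.0);
  for (std::size_t i = 0; i + 1 < pos.size(); ++i) {
    double acc = 0.0;  // buffers every update destined for y[i]
    for (int p = pos[i]; p < pos[i + 1]; ++p) {
      acc += vals[p] * x[crd[p]];
    }
    y[i] = acc;  // single write to output location i
  }
  return y;
}
```

Buffering in a local scalar keeps the repeated updates out of the output array, so each output location is touched only once per row.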

I don't see an easy way to do this when trying to use multiple threads. It looks like you need some way of having a check after each parallel block...

I'm not sure what's the best way to test this since I need to compile with `simplify = false` for the invalid output to show up. Figuring out how to...

Yes, I believe fusing without iterating over the position space is currently supported only for dense dimensions. I'm not sure what the best way of fusing co-iteration loops looks like.