nimlgen
The following code `_ = ((x@y)+(a@b)).numpy()` can benefit from being fused into one kernel. The fusion was measured to gain about 10 GFLOPS (~195 -> ~205). There are several cases where reduceops...
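A minimal sketch of the computation pattern being fused, using numpy as a stand-in for the Tensor expression (the shapes and values here are illustrative assumptions, not from the original):

```python
import numpy as np

# Sketch of the pattern that benefits from fusion: two matmuls whose
# results are summed. Unfused, this launches two matmul kernels plus a
# separate elementwise-add kernel, materializing both intermediates.
N = 64
x, y = np.ones((N, N)), np.ones((N, N))
a, b = np.ones((N, N)), np.ones((N, N))

# Unfused form: (x @ y) and (a @ b) each allocate an intermediate
# buffer before the add runs over them.
out = (x @ y) + (a @ b)

# A fused kernel would accumulate both products into the output in one
# pass, skipping the intermediate buffers entirely.
print(out[0, 0])  # each entry is N + N = 128
```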
This is one more implementation of CUDA graphs, based on the 2 previous MRs. This is a first MR, so it only has basic functionality: creating CUDA graphs plus setting some dynamic...
The graph is now multi-device, though the perf impact is not huge yet (tested on hlb, 2 GPUs). We need to enqueue transfers as well to get the full speedup.
This seems to be faster for small sizes.
I think we can do something like this; it should be faster: no allocation, no copies.
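A sketch of the "no allocation, no copies" idea, assuming the optimization is to write results into a preallocated buffer rather than allocating a fresh one per call (buffer names and shapes are hypothetical):

```python
import numpy as np

# Idea: allocate the output buffer once and reuse it on every call,
# so the hot path performs no allocation and no copy.
N = 256
a = np.random.rand(N, N)
b = np.random.rand(N, N)
out = np.empty((N, N))  # allocated once up front

def matmul_into(a, b, out):
    # np.matmul with out= writes directly into the provided buffer:
    # no new array is allocated and nothing is copied afterwards.
    np.matmul(a, b, out=out)
    return out

res = matmul_into(a, b, out)
assert res is out  # same buffer returned, no copy was made
```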