Frédéric Bastien
This asks less work of the other optimization passes. This is a safe subset of another PR. In case of a revert, fewer changes will be reverted. The main changes are: -...
I’m opening this issue here in DLPack, as I cannot think of a better place for it. This issue spans many software projects, and as this is the goal...
https://github.com/mila-udem/blocks/blob/master/blocks/serialization.py#L226 This does a dump to a temp file and then moves it to the destination. This is safer in case of a crash during the dump. But it causes...
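The dump-then-move pattern described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the code in Blocks' `serialization.py`; the function name `dump_then_move` and the choice of `pickle` and `os.replace` are assumptions made for the example.

```python
import os
import pickle
import tempfile


def dump_then_move(obj, destination):
    """Pickle `obj` to a temp file, then move it over `destination`.

    Hypothetical helper for illustration only. The temp file is created
    in the destination's directory so the final move stays on a single
    filesystem.
    """
    directory = os.path.dirname(os.path.abspath(destination))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            pickle.dump(obj, tmp)
        # The destination is only touched once the dump has fully succeeded.
        os.replace(tmp_path, destination)
    except BaseException:
        # A failed dump leaves any previous destination file untouched.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

If the dump fails partway, the previous file at `destination` is still intact, which is the safety property the excerpt refers to. The cost is that the temp file needs extra disk space next to the destination for the duration of the dump.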
It allows not dumping the fusion internals. This works around the issue that "too big" HTML files aren't rendered by the browser. We can manually look at the txt hlo...
If I use: ``` jax.pmap(f_pmap, in_axes=0, out_axes=0, axis_name='x', donate_argnums=0) ``` it works. But when this code is ported to jax.array + shardmap to support multi-node, it runs, but warns that...
This in-progress PR modifies the docs/Custom_Operation_for_GPUs.py tutorial to use custom_partitioning instead of xmap. Don't review it yet; there is still much work to do. - [x] finish the forward code. -...
If we add 3 tensors and 2 of them are C-contiguous but the last one is not, it seems wasteful to compute the index twice, once for each C-contiguous array.
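A rough sketch of the idea, in plain Python rather than generated kernel code: the two C-contiguous inputs and the output share one flat index, and only the strided input gets its own offset computation. The function name `add3_shared_index`, the 2-D shape, and the use of flat lists with element strides are all assumptions made for the example.

```python
def add3_shared_index(a, b, c, shape, c_strides):
    """Add three 2-D arrays stored as flat buffers.

    Hypothetical illustration: `a` and `b` are C-contiguous, `c` is
    addressed through `c_strides` (in elements, not bytes).
    """
    rows, cols = shape
    out = [0] * (rows * cols)
    for i in range(rows):
        for j in range(cols):
            # Computed once and shared by a, b and the output.
            flat = i * cols + j
            # Only the non-contiguous input needs its own offset.
            c_off = i * c_strides[0] + j * c_strides[1]
            out[flat] = a[flat] + b[flat] + c[c_off]
    return out
```

For instance, `c` can be a transposed (column-major) buffer with strides `(1, rows)`, while `a` and `b` are read with the same `flat` index as the output.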
We could use PyOpenCL w http://documen.tician.de/pyopencl/array.html#complex-numbers In addition, I've added a rudimentary facility for translating Fortran kernels to OpenCL, see here: https://github.com/inducer/pyopencl/tree/master/contrib/fortran-to-opencl