Frédéric Bastien issues

Results: 28 issues of Frédéric Bastien

[XLA:GPU] HorizontalLoopFusion now generates a cleaner graph

This asks less work of the other optimization passes. This is a safe subset of another PR. In case of a revert, fewer changes will be reverted. The main changes are: -...

comp:xla

size:M

Common CUDA allocator?

I’m opening this issue here in DLPack, as I cannot think of a better place for it. This issue spans many software packages, and as this is the goal...

[requires.io] dependency update on master branch

secure_dump should predump in the same folder as the destination

https://github.com/mila-udem/blocks/blob/master/blocks/serialization.py#L226 This dumps to a temp file and then moves it to the destination. This is safer in case of a crash during the dump. But it causes...
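The idea in the title can be sketched as follows. This is only an illustration, not the blocks implementation: `secure_dump_sketch` is a hypothetical helper, and the use of `pickle` as the serializer is an assumption. It creates the temp file in the destination's own folder, so the final step is a rename within one filesystem rather than a copy across filesystems.

```python
import os
import pickle
import tempfile


def secure_dump_sketch(obj, path):
    """Hypothetical helper: dump to a temp file next to `path`, then rename."""
    dest_dir = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the destination folder (the "predump" location).
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp_file:
            pickle.dump(obj, tmp_file)
            tmp_file.flush()
            os.fsync(tmp_file.fileno())
        # Same folder, so this is a rename and not a copy.
        os.replace(tmp_path, path)
    except BaseException:
        # A failed dump leaves any previous file at `path` untouched.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

If the dump fails partway, the temp file is removed and whatever was previously stored at the destination is left as it was.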

CCW

[requires.io] dependency update on master branch

[XLA] When dumping the html files, also dump a version without the fusion details.

This makes it possible to skip dumping the fusion internals. It works around (WAR) the issue that "too big" HTML files aren't rendered by the browser. We can manually look at the txt hlo...

awaiting review

comp:xla

size:S

JIT+Shardmap+donate_argnums doesn't work, while pmap+donate_argnums works.

If I use: ``` jax.pmap(f_pmap, in_axes=0, out_axes=0, axis_name='x', donate_argnums=0) ``` it works. But when this code is ported to jax.array + shardmap to support multi-node, it runs, but warns that...

enhancement

NVIDIA GPU

GPU

[DRAFT] Custom gpu ops custom partitioning

This in-progress PR modifies the docs/Custom_Operation_for_GPUs.py tutorial to use custom_partitioning instead of xmap. Don't review it yet; there is still much work to be done. - [x] finish the forward code. -...

For elemwise, remove duplicated indexing when some inputs have the same stride patterns

If we add 3 tensors and 2 of them are C-contiguous but the last is not, it seems wasteful to compute the index twice, once for each C-contiguous array.
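A plain-Python sketch of the idea, not the actual elemwise kernel: inputs are grouped by stride pattern, and the flat offset is computed once per distinct pattern instead of once per input. The function name, the flat-list buffers, and strides expressed in elements (not bytes) are all assumptions made for illustration.

```python
from itertools import product


def elemwise_add_sketch(shape, buffers, strides_list):
    """Hypothetical helper: add inputs, sharing offsets between equal strides."""
    # Group input positions by their stride pattern.
    groups = {}
    for position, strides in enumerate(strides_list):
        groups.setdefault(tuple(strides), []).append(position)

    result = []  # output in row-major (C) order
    for index in product(*(range(n) for n in shape)):
        total = 0
        for strides, positions in groups.items():
            # One offset computation per distinct stride pattern.
            offset = sum(i * s for i, s in zip(index, strides))
            for position in positions:
                total += buffers[position][offset]
        result.append(total)
    return result
```

With two C-contiguous inputs and one input in another layout, the loop computes two offsets per element instead of three.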

Optimization

Complex in Gpundarray on opencl

We could use PyOpenCL: http://documen.tician.de/pyopencl/array.html#complex-numbers In addition, I've added a rudimentary facility for translating Fortran kernels to OpenCL, see here: https://github.com/inducer/pyopencl/tree/master/contrib/fortran-to-opencl