Ma Mingfei
Yes, the current `scatter_add` is essentially `sorting` + `segment_add`, and both parts are properly parallelized. Because of the semantics limitation, we cannot skip the sorting, since from PyTorch...
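As a rough illustration of that decomposition, here is a minimal sketch in plain PyTorch (illustrative shapes only) showing that `scatter_add` matches a sort-by-index followed by per-segment sums:

```python
import torch

# Minimal sketch: scatter_add viewed as "sort by index" + "segment add".
src = torch.randn(8)
index = torch.randint(0, 4, (8,))

# Reference result: native scatter_add into a zero-initialized output.
out_ref = torch.zeros(4).scatter_add(0, index, src)

# Two-step view: sort the indices, then sum each contiguous run of
# equal indices (the "segment add" part).
sorted_idx, perm = torch.sort(index)
sorted_src = src[perm]
counts = torch.bincount(sorted_idx, minlength=4)
offsets = torch.cat([torch.zeros(1, dtype=torch.long), counts.cumsum(0)])
out_seg = torch.stack(
    [sorted_src[offsets[i]:offsets[i + 1]].sum() for i in range(4)]
)

assert torch.allclose(out_ref, out_seg)
```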
Update on `spmm` optimizations, PR submitted at https://github.com/pytorch/pytorch/pull/83727. This ports the `spmm` reduction from `torch-sparse` to `torch`; the current PR is only for demonstrating the performance gains, and the API definition needs further amendment. Now...
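For context, a minimal sketch of the use case the port targets, assuming a `reduce=` keyword on `torch.sparse.mm` (the exact API surface the PR lands on may differ):

```python
import torch

# Hedged sketch of spmm-with-reduction (the torch-sparse spmm_{sum,mean,max,min}
# use case). The `reduce=` keyword is an assumption here; shapes/values are
# illustrative only.
crow = torch.tensor([0, 2, 3])
col = torch.tensor([0, 2, 1])
val = torch.tensor([1.0, 2.0, 3.0])
A = torch.sparse_csr_tensor(crow, col, val, size=(2, 3))
B = torch.randn(3, 4)

out_sum = torch.sparse.mm(A, B)                  # plain spmm, same as A @ B
out_mean = torch.sparse.mm(A, B, reduce="mean")  # per-row mean instead of sum
```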
#### Update on Training optimization

Optimize `torch.gather` for the classic PyG use case (the `index` tensor is broadcasted); this will be the backward of `scatter` in training: https://github.com/pytorch/pytorch/pull/87586. When the `index`...
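For reference, a small sketch of the broadcasted-index `gather` pattern that shows up in PyG-style message passing (shapes here are illustrative):

```python
import torch

# The "broadcasted index" gather pattern: index has shape (E,) and is
# expanded along the feature dimension without materializing a copy.
x = torch.randn(5, 16)                  # node features
edge_src = torch.randint(0, 5, (20,))   # source node of each edge

idx = edge_src.view(-1, 1).expand(-1, x.size(1))  # (E, 16) expanded view
msgs = torch.gather(x, 0, idx)                    # same as x[edge_src]

assert torch.equal(msgs, x[edge_src])
```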
Sort out the 2nd stage optimization a little bit:
- [x] optimization of `sampled_addmm` on SparseCSR (see the sketch below): https://github.com/pytorch/pytorch/pull/90978
- [x] enabling of `sampled_addmm` on SparseCOO (canceled)
- [ ] unify `ReduceTypes`:...
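For reference, a minimal sketch of what the `sampled_addmm` op in the first item computes: `beta * S + alpha * (A @ B)`, evaluated only at the locations present in the sparsity pattern of the CSR input. Values and shapes below are illustrative only:

```python
import torch

# sampled_addmm on a SparseCSR tensor: the dense matmul A @ B is "sampled"
# at the nonzero positions of S, then combined with beta * S.
crow = torch.tensor([0, 1, 2])
col = torch.tensor([1, 0])
val = torch.tensor([0.5, 0.5])
S = torch.sparse_csr_tensor(crow, col, val, size=(2, 2))

A = torch.randn(2, 3)
B = torch.randn(3, 2)
out = torch.sparse.sampled_addmm(S, A, B, beta=1.0, alpha=1.0)  # SparseCSR result
```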
> Hi, any update on this?

[NicolasHug](https://github.com/NicolasHug) and [vfdev-5](https://github.com/vfdev-5) have done a lot of work on optimizing int8/uint8 image scaling/resize in torch.
@CaoE https://github.com/pytorch/pytorch/pull/99539 will be landed first; please rebase again once it is settled, and add more unit test cases for both inference and training.
@mikekgfb cool! We are just about to do something similar on the CPU device. We will add native support for int4 kernels on CPU:
* [_convert_weight_to_int4pack_cuda](https://github.com/pytorch/pytorch/blob/fcf6a76108be8e7b6db528b631a7b8ebdc7470ac/aten/src/ATen/native/native_functions.yaml#L4059)
* [_weight_int4pack_mm_cuda](https://github.com/pytorch/pytorch/blob/fcf6a76108be8e7b6db528b631a7b8ebdc7470ac/aten/src/ATen/native/native_functions.yaml#L4065)

So...
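To make the int4 idea above concrete, a conceptual sketch of int4 weight packing in plain PyTorch follows. The real `_convert_weight_to_int4pack` / `_weight_int4pack_mm` kernels use a different, hardware-friendly layout plus group-wise scales and zero points, so this only illustrates the packing concept, not the actual op signatures:

```python
import torch

# Two 4-bit values are packed into one uint8 byte, then unpacked back
# for a float reference matmul.
w = torch.randint(0, 16, (4, 8), dtype=torch.uint8)   # fake 4-bit weights
packed = (w[:, 0::2] << 4) | w[:, 1::2]               # pack pairs into bytes

# Unpack: high nibble first, then low nibble, interleaved back to (4, 8).
hi = (packed >> 4) & 0xF
lo = packed & 0xF
w_unpacked = torch.stack([hi, lo], dim=-1).reshape(4, 8)
assert torch.equal(w_unpacked, w)

x = torch.randn(2, 8)
out = x @ w_unpacked.to(torch.float32).t()            # (2, 4)
```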
Cool, this is merged :) I will wrap up the code and upstream the CPU backend optimization kernels to PyTorch soon.
> Planning to use Sglang on Intel Gaudi 2, but I have not tried it yet. Would like to know the current support level?

@xinyu-intel we don't have a binding for...
@ggerganov could you please take a look at this one? I have moved the AMX init code from ggml.c to ggml-amx/mmq.cpp as suggested in the previous comments.