Ma Mingfei issues

CPU performance update

@jcjohnson Hi, really nice benchmark! I am working on Torch optimization for Intel platforms, Xeon and Xeon Phi. Our optimized version is much faster than the original Torch CPU backend and...

enable channels last 3d for Conv3d and ConvTranspose3d on mkldnn path

Stack from [ghstack](https://github.com/ezyang/ghstack):
* __->__ #74023
* #70897
* #77060

Differential Revision: [D35782442](https://our.internmc.facebook.com/intern/diff/D35782442)

Add channels last 3d support for both Conv3d and ConvTranspose3d.

open source

cla signed

intel
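The channels-last-3d PR above is about memory layout. As a minimal sketch (plain Python, no PyTorch required), the strides below compare a contiguous NCDHW tensor with the same logical (N, C, D, H, W) tensor stored physically as NDHWC. The helper names `contiguous_strides`, `channels_last_3d_strides` and `element_offset` are hypothetical; in PyTorch itself this layout is requested with `x.to(memory_format=torch.channels_last_3d)`.

```python
def contiguous_strides(shape):
    """Row-major strides (in elements) for an NCDHW tensor."""
    strides = []
    step = 1
    for size in reversed(shape):
        strides.append(step)
        step *= size
    return tuple(reversed(strides))


def channels_last_3d_strides(shape):
    """Strides for a logical (N, C, D, H, W) tensor stored physically as NDHWC."""
    _, c, d, h, w = shape
    # Physical order is N, D, H, W, C, so C is innermost (stride 1).
    return (d * h * w * c, 1, h * w * c, w * c, c)


def element_offset(index, strides):
    """Linear memory offset of a logical (n, c, d, h, w) index."""
    return sum(i * s for i, s in zip(index, strides))


shape = (2, 3, 4, 5, 6)
print(contiguous_strides(shape))        # (360, 120, 30, 6, 1)
print(channels_last_3d_strides(shape))  # (360, 1, 90, 18, 3)
```

With the channels-last layout, channel `c` and channel `c + 1` of the same voxel sit next to each other in memory (offset difference 1), whereas in the contiguous layout they are `D * H * W` elements apart. Keeping all channels of a voxel contiguous is what makes channel-wise vectorized kernels attractive for this layout.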

add channels last support for slow_conv_transpose2d

Stack from [ghstack](https://github.com/ezyang/ghstack):
* #74023
* __->__ #70897
* #77060

Differential Revision: [D33571076](https://our.internmc.facebook.com/intern/diff/D33571076)

This patch enables `channels last` support in the fallback ATen implementation of transposed convolution. So...

open source

cla signed

intel

optimize ConvTransposedND with mkldnn float32 and bfloat16 on CPU

Stack from [ghstack](https://github.com/ezyang/ghstack):
* #74023
* #70897
* __->__ #77060

Differential Revision: [D36377117](https://our.internmc.facebook.com/intern/diff/D36377117)

This patch enables mkldnn for float32 and bfloat16 transposed convolution on the CPU path.

open source

cla signed

intel

[Roadmap] CPU Performance Optimization for PyG

### 🚀 The feature, motivation and pitch

The goal of this roadmap is to optimize CPU performance for PyG (including `torch_scatter`, `torch_sparse`). For the first step, we will start with...

0 - Priority P0

benchmark

loader

roadmap

[RFC] torchvision performance optimization on CPU

## 🚀 The feature

This RFC targets improving the performance of torchvision operators on CPU.

## Motivation, pitch

Generally, performance improvements can be made in 3 ways:

*...

[Feature Request] Add AMX detection in cpuinfo

This proposal is to add AMX detection to cpuinfo. AMX refers to `Intel® Advanced Matrix Extensions (Intel® AMX)`: https://www.intel.com/content/www/us/en/products/docs/accelerator-engines/advanced-matrix-extensions/overview.html

The API would be something like:

```
cpuinfo_has_x86_amxbf16()
cpuinfo_has_x86_amxint8()
```

Once this is settled, we...

enhancement
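To illustrate what the proposed cpuinfo checks would test, here is a minimal sketch in Python rather than cpuinfo's C. It assumes the raw EDX value of CPUID leaf 7 (subleaf 0) has already been read by other means; the bit positions (22 for AMX-BF16, 24 for AMX-TILE, 25 for AMX-INT8) follow the Intel SDM. The names `decode_amx_flags`, `has_amx_bf16` and `has_amx_int8` are hypothetical. A complete check would also need to confirm that the OS has enabled AMX tile state (via XCR0), which this sketch skips.

```python
# Bit positions in CPUID.(EAX=07H, ECX=0):EDX, per the Intel SDM.
AMX_BF16_BIT = 22
AMX_TILE_BIT = 24
AMX_INT8_BIT = 25


def decode_amx_flags(edx: int) -> dict:
    """Decode the AMX feature bits from a raw CPUID leaf 7 EDX value."""

    def bit(n: int) -> bool:
        return bool((edx >> n) & 1)

    return {
        "amx_bf16": bit(AMX_BF16_BIT),
        "amx_tile": bit(AMX_TILE_BIT),
        "amx_int8": bit(AMX_INT8_BIT),
    }


# Hypothetical helpers mirroring the proposed cpuinfo_has_x86_amxbf16() and
# cpuinfo_has_x86_amxint8(). The BF16/INT8 instructions operate on tiles,
# so this sketch also requires the AMX-TILE bit.
def has_amx_bf16(edx: int) -> bool:
    flags = decode_amx_flags(edx)
    return flags["amx_tile"] and flags["amx_bf16"]


def has_amx_int8(edx: int) -> bool:
    flags = decode_amx_flags(edx)
    return flags["amx_tile"] and flags["amx_int8"]


edx = (1 << AMX_BF16_BIT) | (1 << AMX_TILE_BIT) | (1 << AMX_INT8_BIT)
print(has_amx_bf16(edx), has_amx_int8(edx))  # True True
```

Requiring the tile bit alongside the data-type bit is a design choice of this sketch: it reports a data type as usable only when the tile architecture it runs on is also present.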

Add Intel Advanced Matrix Extensions (AMX) support to ggml

This PR improves Intel server CPU performance with Intel Advanced Matrix Extensions (AMX). AMX is a new built-in accelerator for GEMM, available starting with 4th Gen Xeon: https://www.intel.com/content/www/us/en/products/docs/accelerator-engines/advanced-matrix-extensions/overview.html

The basic idea...

performance

build

Review Complexity : Medium

ggml

Add Intel Advanced Matrix Extensions (AMX) support to ggml

- [x] I have read the [contributing guidelines](https://github.com/ggerganov/llama.cpp/blob/master/CONTRIBUTING.md)
- Self-reported review complexity:
  - [ ] Low
  - [x] Medium
  - [ ] High

Replacement of https://github.com/ggerganov/llama.cpp/pull/7707 to trigger ggml-ci on...

build

ggml

[Question] About CPU performance

Hi, I am an engineer from Intel, and I work mostly on performance optimization of PyTorch on Intel Xeon CPUs (I am also the PyTorch module maintainer for CPU...

enhancement