Ma Mingfei

11 issues by Ma Mingfei

@jcjohnson Hi, really nice benchmark! I am working on Torch optimization for Intel platforms (Xeon and Xeon Phi). Our optimized version is much faster than the original Torch CPU backend and...

Stack from [ghstack](https://github.com/ezyang/ghstack):
* __->__ #74023
* #70897
* #77060

Differential Revision: [D35782442](https://our.internmc.facebook.com/intern/diff/D35782442)

Add channels last 3d support for both `Conv3d` and `ConvTranspose3d`.

open source
cla signed
intel

Stack from [ghstack](https://github.com/ezyang/ghstack):
* #74023
* __->__ #70897
* #77060

Differential Revision: [D33571076](https://our.internmc.facebook.com/intern/diff/D33571076)

This patch enables `channels last` support in the fallback ATen implementation of transposed convolution. So...

open source
cla signed
intel

Stack from [ghstack](https://github.com/ezyang/ghstack):
* #74023
* #70897
* __->__ #77060

Differential Revision: [D36377117](https://our.internmc.facebook.com/intern/diff/D36377117)

This patch enables mkldnn for float32 and bfloat16 transposed convolution on the CPU path.

open source
cla signed
intel

### 🚀 The feature, motivation and pitch

The goal of this roadmap is to optimize CPU performance for PyG (including `torch_scatter` and `torch_sparse`). As a first step, we will start with...

0 - Priority P0
benchmark
nn
loader
roadmap

## 🚀 The feature

This RFC aims to improve the performance of torchvision operators on CPU.

## Motivation, pitch

Generally, performance improvements can be made in 3 ways:
*...

This proposal adds AMX detection to cpuinfo. AMX refers to `Intel® Advanced Matrix Extensions (Intel® AMX)`: https://www.intel.com/content/www/us/en/products/docs/accelerator-engines/advanced-matrix-extensions/overview.html

The API would look something like:

```
cpuinfo_has_x86_amxbf16()
cpuinfo_has_x86_amxint8()
```

Once this is settled, we...

enhancement

This PR improves Intel server CPU performance with Intel Advanced Matrix Extensions (AMX). AMX is a built-in GEMM accelerator available starting with 4th Gen Xeon: https://www.intel.com/content/www/us/en/products/docs/accelerator-engines/advanced-matrix-extensions/overview.html

The basic idea...

performance
build
Review Complexity : Medium
ggml

- [x] I have read the [contributing guidelines](https://github.com/ggerganov/llama.cpp/blob/master/CONTRIBUTING.md)
- Self-reported review complexity:
  - [ ] Low
  - [x] Medium
  - [ ] High

Replacement of https://github.com/ggerganov/llama.cpp/pull/7707 to trigger ggml-ci on...

build
ggml

Hi, I am an engineer at Intel, working mostly on PyTorch performance optimization for Intel Xeon CPUs (I am also the PyTorch module maintainer for CPU...

enhancement