Zhang, Liangang
1) Vertical split embedding to scale out to many more ranks (see the sketch below). 2) LAMB to enable large batch sizes.
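A minimal sketch of item 1), assuming the embedding table is split along the embedding dimension so each rank holds only `embed_dim / world_size` columns; the class name, `world_size` handling, and the all-gather step are illustrative assumptions, not the code in this change.

```python
import torch
import torch.nn as nn


class VerticalSplitEmbedding(nn.Module):
    """Each rank stores only a vertical slice of the full embedding table."""

    def __init__(self, vocab_size: int, embed_dim: int, world_size: int):
        super().__init__()
        assert embed_dim % world_size == 0, "embed_dim must divide evenly across ranks"
        self.shard_dim = embed_dim // world_size
        self.weight = nn.Parameter(torch.randn(vocab_size, self.shard_dim) * 0.02)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        local = self.weight[token_ids]  # [..., shard_dim]
        # In the distributed runtime the per-rank slices would be all-gathered
        # (e.g. torch.distributed.all_gather) and concatenated back to embed_dim.
        return local


emb = VerticalSplitEmbedding(vocab_size=1000, embed_dim=64, world_size=4)
print(emb(torch.tensor([[1, 2, 3]])).shape)  # torch.Size([1, 3, 16]): local shard only
```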
# What does this PR do? Based on the latest cache design in [#PR26681](https://github.com/huggingface/transformers/pull/26681), this PR implements the Paged Attention KV cache proposed in this [paper](https://arxiv.org/pdf/2309.06180.pdf). Fixes #...
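For context, a hedged sketch of the block-table bookkeeping that paged attention relies on; the class and method names here are illustrative, not the cache API added by this PR.

```python
# Illustrative only: a paged KV cache maps each sequence's logical token
# positions onto fixed-size physical blocks instead of one contiguous buffer.
from typing import Dict, List, Tuple


class BlockTable:
    def __init__(self, num_physical_blocks: int, block_size: int) -> None:
        self.block_size = block_size
        self.free: List[int] = list(range(num_physical_blocks))  # free physical block ids
        self.tables: Dict[int, List[int]] = {}                   # seq_id -> physical block ids

    def slot_for_token(self, seq_id: int, position: int) -> Tuple[int, int]:
        """Return (physical_block, offset) for the token at `position`,
        taking a new block from the free pool only at block boundaries."""
        table = self.tables.setdefault(seq_id, [])
        block_idx, offset = divmod(position, self.block_size)
        while len(table) <= block_idx:
            table.append(self.free.pop())  # raises IndexError when the pool is exhausted
        return table[block_idx], offset


# Sequences share one physical pool instead of each pre-reserving max_seq_len slots.
bt = BlockTable(num_physical_blocks=8, block_size=16)
print(bt.slot_for_token(seq_id=0, position=0))   # first block for sequence 0
print(bt.slot_for_token(seq_id=0, position=17))  # crosses a boundary, grabs a second block
```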
### Feature request Paged attention has been adopted by many serving engines, e.g., [vllm](https://github.com/vllm-project/vllm) and [tensorrt-llm](https://github.com/NVIDIA/TensorRT-LLM/blob/release/0.5.0/tensorrt_llm/runtime/kv_cache_manager.py). ### Motivation The KV cache is used to reduce computation in the decoder layers, but...
Split PR #1480 into several smaller ones. This PR enables the use of different devices in the runtime.
PyTorch has supported the XPU device since the 2.4 release, and _xpu_ is also supported in OpenAI Triton. So it should work with the Triton attention backend in SGLang. In this PR,...
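A small hedged check of the XPU path described above, assuming a PyTorch >= 2.4 build with XPU support; the SGLang/Triton backend wiring itself is not shown here.

```python
import torch
import torch.nn.functional as F

# Fall back to CPU when no XPU build/device is present.
device = "xpu" if hasattr(torch, "xpu") and torch.xpu.is_available() else "cpu"
dtype = torch.float16 if device == "xpu" else torch.float32

q = torch.randn(1, 8, 64, device=device, dtype=dtype)
k = torch.randn(1, 8, 64, device=device, dtype=dtype)
v = torch.randn(1, 8, 64, device=device, dtype=dtype)

# scaled_dot_product_attention dispatches to the active device's backend;
# on XPU this is where a Triton-based attention kernel can take over.
out = F.scaled_dot_product_attention(q, k, v)
print(out.shape, out.device)
```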
Fixes #163543 cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben @jataylo @chenyang78
# Motivation To improve quality on Intel XPU devices, we plan to enable the CI/CD process on Intel XPUs. The CI is based on the Docker environment and we will...