
intel gpu: enable Intel GPU support

xiaowangintel opened this issue 1 year ago

This PR adds initial Intel GPU support to gpt-fast via the device option "xpu" (i.e., --device "xpu"). Both single-device and multi-device (tensor parallel) execution are functionally supported; performance is still being improved. Refer to the following steps to run generation on Intel GPUs. We will update the tutorial later as performance improves.

Installation

  1. Install PyTorch and Intel® Extension for PyTorch (IPEX): https://intel.github.io/intel-extension-for-pytorch/xpu/latest/tutorials/introduction.html#
  2. Install oneCCL for distributed inference: https://github.com/oneapi-src/oneCCL
  3. Install Intel® Extension for Triton (needed by torch.compile): https://intel.github.io/intel-extension-for-pytorch/xpu/latest/tutorials/features/torch_compile_gpu.html
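
After installation, a quick sanity check can confirm that the XPU backend is usable before running gpt-fast. The helper below is a hedged sketch (it is not part of gpt-fast); it assumes the `torch.xpu` API that becomes available once Intel® Extension for PyTorch is installed, and degrades gracefully if the packages are missing:

```python
def xpu_available() -> bool:
    """Best-effort check that PyTorch with Intel XPU support is usable.

    Returns False (rather than raising) if torch or
    intel_extension_for_pytorch is not installed, so it is safe
    to call in any environment.
    """
    try:
        import torch
        import intel_extension_for_pytorch  # noqa: F401  (registers the "xpu" device)
    except ImportError:
        return False
    return hasattr(torch, "xpu") and torch.xpu.is_available()

print(xpu_available())
```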

How to run gpt-fast on Intel GPUs

  1. Command for a single device: python generate.py --checkpoint_path checkpoints/$MODEL_REPO/model.pth --speculate_k 5 --prompt "Hi my name is" --device xpu
  2. Command for multiple devices via tensor parallelism: ENABLE_INTRA_NODE_COMM=1 torchrun --standalone --nproc_per_node=2 generate.py --checkpoint_path checkpoints/$MODEL_REPO/model.pth --device xpu
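
For scripting, the tensor-parallel launch line above can be assembled programmatically. This is only a convenience sketch using the flags from this PR description; `tp_command` is a hypothetical helper, not part of gpt-fast, and note that ENABLE_INTRA_NODE_COMM=1 belongs in the process environment, not the argument list:

```python
import shlex

def tp_command(nproc: int, checkpoint_path: str, device: str = "xpu") -> list[str]:
    """Build the torchrun argument list for a tensor-parallel gpt-fast run.

    Flags mirror command 2 in the PR description; set
    ENABLE_INTRA_NODE_COMM=1 in the environment when launching.
    """
    cmd = (
        f"torchrun --standalone --nproc_per_node={nproc} generate.py "
        f"--checkpoint_path {checkpoint_path} --device {device}"
    )
    return shlex.split(cmd)

# Two-device tensor-parallel run, matching command 2 above:
print(" ".join(tp_command(2, "checkpoints/$MODEL_REPO/model.pth")))
```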

Note:

  1. Please export UR_L0_IN_ORDER_BARRIER_BY_SIGNAL=0 (a temporary configuration) to avoid spurious errors when running gpt-fast with torch.compile.
  2. Please export IPEX_ZE_TRACING=1 (a temporary configuration) to collect events when running gpt-fast with the profiler.
  3. Currently, only bf16 is supported, and int4/int8 will be supported later via IPEX without requiring code changes in gpt-fast.
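
The two temporary environment variables from the notes above can also be set from Python, provided this happens before torch/IPEX is imported. A minimal sketch (variable names taken verbatim from the notes; the comments paraphrase their stated purpose):

```python
import os

# Temporary workarounds from the notes above; set before importing torch/IPEX.
os.environ.setdefault("UR_L0_IN_ORDER_BARRIER_BY_SIGNAL", "0")  # avoid spurious errors under torch.compile (note 1)
os.environ.setdefault("IPEX_ZE_TRACING", "1")  # collect events when profiling (note 2)

print(os.environ["UR_L0_IN_ORDER_BARRIER_BY_SIGNAL"], os.environ["IPEX_ZE_TRACING"])
```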

xiaowangintel avatar Jan 10 '24 03:01 xiaowangintel

Please add to the PR description 1) how to build/install the pre-requisite software components; 2) how to run inference with and without tensor parallel.

jgong5 avatar Jan 10 '24 03:01 jgong5

@Chillee This is the initial PR to support Intel GPU. Most of the needed code changes are included here; further performance optimizations will be applied inside IPEX. May I ask for your review? Thanks!

jgong5 avatar Jan 12 '24 12:01 jgong5