
intel gpu: enable Intel GPU support

xiaowangintel opened this issue 1 year ago

This PR adds initial Intel GPU support to gpt-fast via the device option "xpu" (i.e., --device "xpu"). Both single-device and multi-device (tensor parallel) execution are functionally supported; performance is still being improved. Refer to the following steps to run generation on Intel GPUs. We will update the tutorial later as performance improves.

Installation

  1. Install PyTorch and Intel® Extension for PyTorch (IPEX): https://intel.github.io/intel-extension-for-pytorch/xpu/latest/tutorials/introduction.html#
  2. Install oneCCL for distributed inference: https://github.com/oneapi-src/oneCCL
  3. Install Intel® Extension for Triton (needed by torch.compile): https://intel.github.io/intel-extension-for-pytorch/xpu/latest/tutorials/features/torch_compile_gpu.html
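
After installation, a quick sanity check can confirm that the XPU backend is usable before running gpt-fast. The helper below is a hedged sketch (it is not part of gpt-fast); it assumes the `torch.xpu` API that becomes available once Intel® Extension for PyTorch is installed, and degrades gracefully if the packages are missing:

```python
def xpu_available() -> bool:
    """Best-effort check that PyTorch with Intel XPU support is usable.

    Returns False (rather than raising) if torch or
    intel_extension_for_pytorch is not installed, so it is safe
    to call in any environment.
    """
    try:
        import torch
        import intel_extension_for_pytorch  # noqa: F401  (registers the "xpu" device)
    except ImportError:
        return False
    return hasattr(torch, "xpu") and torch.xpu.is_available()

print(xpu_available())
```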

How to run gpt-fast on Intel GPUs

  1. Command for a single device: python generate.py --checkpoint_path checkpoints/$MODEL_REPO/model.pth --speculate_k 5 --prompt "Hi my name is" --device xpu
  2. Command for multiple devices via tensor parallelism: ENABLE_INTRA_NODE_COMM=1 torchrun --standalone --nproc_per_node=2 generate.py --checkpoint_path checkpoints/$MODEL_REPO/model.pth --device xpu
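
For scripting, the tensor-parallel launch line above can be assembled programmatically. This is only a convenience sketch using the flags from this PR description; `tp_command` is a hypothetical helper, not part of gpt-fast, and note that ENABLE_INTRA_NODE_COMM=1 belongs in the process environment, not the argument list:

```python
import shlex

def tp_command(nproc: int, checkpoint_path: str, device: str = "xpu") -> list[str]:
    """Build the torchrun argument list for a tensor-parallel gpt-fast run.

    Flags mirror command 2 in the PR description; set
    ENABLE_INTRA_NODE_COMM=1 in the environment when launching.
    """
    cmd = (
        f"torchrun --standalone --nproc_per_node={nproc} generate.py "
        f"--checkpoint_path {checkpoint_path} --device {device}"
    )
    return shlex.split(cmd)

# Two-device tensor-parallel run, matching command 2 above:
print(" ".join(tp_command(2, "checkpoints/$MODEL_REPO/model.pth")))
```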

Note:

  1. Please export UR_L0_IN_ORDER_BARRIER_BY_SIGNAL=0 (a temporary configuration) to avoid spurious errors when running gpt-fast with torch.compile.
  2. Please export IPEX_ZE_TRACING=1 (a temporary configuration) to collect events when running gpt-fast with the profiler.
  3. Currently, only bf16 is supported, and int4/int8 will be supported later via IPEX without requiring code changes in gpt-fast.
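
The two temporary environment variables from the notes above can also be set from Python, provided this happens before torch/IPEX is imported. A minimal sketch (variable names taken verbatim from the notes; the comments paraphrase their stated purpose):

```python
import os

# Temporary workarounds from the notes above; set before importing torch/IPEX.
os.environ.setdefault("UR_L0_IN_ORDER_BARRIER_BY_SIGNAL", "0")  # avoid spurious errors under torch.compile (note 1)
os.environ.setdefault("IPEX_ZE_TRACING", "1")  # collect events when profiling (note 2)

print(os.environ["UR_L0_IN_ORDER_BARRIER_BY_SIGNAL"], os.environ["IPEX_ZE_TRACING"])
```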

xiaowangintel avatar Jan 10 '24 03:01 xiaowangintel

Please add to the PR description 1) how to build/install the pre-requisite software components; 2) how to run inference with and without tensor parallel.

jgong5 avatar Jan 10 '24 03:01 jgong5

@Chillee This is the initial PR to support Intel GPU. Most of the needed code changes are included here; further performance optimizations will be applied inside IPEX. May I ask for your review? Thanks!

jgong5 avatar Jan 12 '24 12:01 jgong5