Gangsheng Wu
Replace Checkpoint usage with TorchCheckpoint/TensorflowCheckpoint
### Describe the issue

Hi, I have the following code:

```python
import dpctl
import torch
import intel_extension_for_pytorch

xpu_num = len(dpctl.get_devices(backend="level_zero", device_type="gpu"))
print(f"xpu_num = {xpu_num}")
# device = torch.device("xpu:0")
device = torch.device("cpu:0")
```
...
My environment:

```bash
export ONEAPI_DEVICE_SELECTOR="level_zero:gpu"
```

```bash
$ sycl
Warning: ONEAPI_DEVICE_SELECTOR environment variable is set to level_zero:gpu. To see the correct device id, please unset ONEAPI_DEVICE_SELECTOR.
[ext_oneapi_level_zero:gpu:0] Intel(R)...
```
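The commented-out `torch.device("xpu:0")` line above suggests the device is being switched between XPU and CPU by hand. As a minimal sketch, not part of the original issue (the helper name `pick_device` is mine), the fallback could instead be driven by the GPU count that `dpctl` enumerates:

```python
def pick_device(xpu_num: int) -> str:
    """Return a torch device string: the first XPU if any Level Zero
    GPU was enumerated, otherwise fall back to CPU."""
    return "xpu:0" if xpu_num > 0 else "cpu:0"

# With dpctl installed, xpu_num would come from the issue's own call:
# xpu_num = len(dpctl.get_devices(backend="level_zero", device_type="gpu"))
print(pick_device(0))  # no GPUs enumerated -> "cpu:0"
print(pick_device(2))  # at least one GPU  -> "xpu:0"
```

This keeps the device choice in one place instead of toggling comments in the script.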
## Why are these changes needed?

To leverage the potential of the Intel Gaudi accelerator, we extend Ray Train's capabilities by adding support for Intel Gaudi (HPU) hardware. This PR include...
### Feature request

I see that release version 1.12 supports FP8, but I didn't find any example code showing how to train an LLM with FP8. How can I...
# What does this PR do?

## Background

When running inference with this command:

```bash
INPUT=32768 OUTPUT=32768 BATCH_SIZE=12 python gaudi_spawn.py --use_deepspeed --world_size 8 run_generation.py \
    --model_name_or_path Meta-Llama-3.1-70B-Instruct/ \
    --max_input_tokens...
```