Gangsheng Wu
Replace Checkpoint usage with TorchCheckpoint/TensorflowCheckpoint
### Describe the issue

Hi, I have the following code:

```python
import dpctl
import torch
import intel_extension_for_pytorch

xpu_num = len(dpctl.get_devices(backend="level_zero", device_type="gpu"))
print(f"xpu_num = {xpu_num}")
# device = torch.device("xpu:0")
device = torch.device("cpu:0")
```
...
My environment:

```bash
export ONEAPI_DEVICE_SELECTOR="level_zero:gpu"
```

```bash
$ sycl
Warning: ONEAPI_DEVICE_SELECTOR environment variable is set to level_zero:gpu. To see the correct device id, please unset ONEAPI_DEVICE_SELECTOR.
[ext_oneapi_level_zero:gpu:0] Intel(R)...
```
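The commented-out `torch.device("xpu:0")` line above suggests the device is being switched between XPU and CPU by hand. As a minimal sketch, not part of the original issue (the helper name `pick_device` is mine), the fallback could instead be driven by the GPU count that `dpctl` enumerates:

```python
def pick_device(xpu_num: int) -> str:
    """Return a torch device string: the first XPU if any Level Zero
    GPU was enumerated, otherwise fall back to CPU."""
    return "xpu:0" if xpu_num > 0 else "cpu:0"

# With dpctl installed, xpu_num would come from the issue's own call:
# xpu_num = len(dpctl.get_devices(backend="level_zero", device_type="gpu"))
print(pick_device(0))  # no GPUs enumerated -> "cpu:0"
print(pick_device(2))  # at least one GPU  -> "xpu:0"
```

This keeps the device choice in one place instead of toggling comments in the script.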
## Why are these changes needed?

To leverage the potential of the Intel Gaudi accelerator, we extend Ray Train's capabilities by adding support for Intel Gaudi (HPU) hardware. This PR include...
### Feature request

I see that release version 1.12 supports FP8, but I didn't find any example code showing how to train an LLM with FP8. How can I...
# What does this PR do?

## Background

When running inference with this command:

```bash
INPUT=32768 OUTPUT=32768 BATCH_SIZE=12 python gaudi_spawn.py --use_deepspeed --world_size 8 run_generation.py \
    --model_name_or_path Meta-Llama-3.1-70B-Instruct/ \
    --max_input_tokens...
```