Guanhua Wang
This PR adds a unit test for the `ds_quantize` kernel. In this example, we test the `torch.float16` data format. We tested the following two cases: 1. INT8 quant/dequant over tensor in 1...
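As a rough illustration of what an INT8 quant/dequant round-trip test checks, here is a minimal pure-Python sketch. It assumes symmetric per-tensor quantization and uses plain Python floats instead of `torch.float16` tensors; `quantize_int8`, `dequantize_int8`, and `test_roundtrip` are hypothetical names and do not reflect the actual `ds_quantize` kernel interface.

```python
def quantize_int8(values):
    """Hypothetical symmetric per-tensor INT8 quantization (illustrative only)."""
    max_abs = max((abs(v) for v in values), default=0.0)
    # Map the largest magnitude to 127; fall back to scale 1.0 for an all-zero tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the symmetric INT8 range.
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize_int8(q, scale):
    """Map INT8 codes back to floating point."""
    return [x * scale for x in q]


def test_roundtrip():
    data = [0.5, -1.25, 3.0, 0.0, -2.75]
    q, scale = quantize_int8(data)
    restored = dequantize_int8(q, scale)
    # Every code must fit in the symmetric INT8 range.
    assert all(-127 <= x <= 127 for x in q)
    # Round-to-nearest keeps the error within half a quantization step.
    for orig, rec in zip(data, restored):
        assert abs(orig - rec) <= scale / 2 + 1e-12


test_roundtrip()
```

A real kernel test would typically compare the kernel's output against a reference like this one, with a tolerance suited to `float16` precision.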
**Describe the Bug** When installing on HGX-H100 nodes, `pip install` fails to build CUDA extensions such as `amp_C`. **Minimal Steps/Code to Reproduce the Bug** `pip install -v --disable-pip-version-check --no-cache-dir...
Hi, I am running the `demo-imdb-vectors.sh` demo file on my Mac. It raises FANN Error 1: Unable to open configuration file "imdbtrain.data" for reading. ./demo-imdb-vectors.sh: line 23: 644 Segmentation...
**Before**: Overflow checks are scattered and duplicated across the codebase. **This PR:** - A single interface, the CheckOverflow class, which abstracts and unifies overflow checking across ZeRO, ZeRO-Offload, Pipeline Parallelism, BF16_optimizer....
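To make the idea of one shared overflow check concrete, here is a minimal sketch in plain Python. It is not DeepSpeed's CheckOverflow implementation: `OverflowChecker`, `has_overflow`, and `has_overflow_across_ranks` are hypothetical names, and gradients are modeled as lists of floats rather than tensors. In a real multi-GPU setting the per-rank flags would typically be combined with a collective reduction; this sketch models that step with a plain `any()`.

```python
import math
from typing import Iterable, Sequence


class OverflowChecker:
    """Hypothetical single entry point for overflow detection (illustrative only)."""

    @staticmethod
    def has_overflow(grads: Iterable[Sequence[float]]) -> bool:
        # Report overflow as soon as any gradient value is inf or NaN.
        for grad in grads:
            for value in grad:
                if not math.isfinite(value):
                    return True
        return False

    @classmethod
    def has_overflow_across_ranks(cls, per_rank_grads: Iterable[Iterable[Sequence[float]]]) -> bool:
        # Stand-in for a cross-rank reduction: overflow on any rank means overflow overall.
        return any(cls.has_overflow(grads) for grads in per_rank_grads)


# Different optimizer paths (ZeRO, pipeline, BF16, ...) would all call the same method.
checker = OverflowChecker()
print(checker.has_overflow([[1.0, 2.0], [float("inf")]]))
```

The design point is that every caller shares one definition of "overflow", so a fix or change to the check happens in a single place.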
Hi, I just took a quick look at the fake tensor/module APIs. The deferred initialization feature looks really cool to me. I am wondering whether there is a way to de-materialize the...
I am using DeepSpeed 0.15.1 on two A6000 GPUs, following the [huggingface Non-Trainer DeepSpeed integration](https://huggingface.co/docs/transformers/main/en/deepspeed?models=pretrained+model#non-trainer-deepspeed-integration), and got an assertion error: ``` guanhua@guanhua-Lambda:~/DiscQuant$ deepspeed test_hf_ds.py [2024-09-06 15:53:29,210] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda...