Guanhua Wang

5 issues by Guanhua Wang

This PR adds a unit test for the `ds_quantize` kernel. In this example, we test with the `torch.float16` data format. We test the following two cases: 1. INT8 quant/dequant over tensor in 1...

**Describe the Bug** When installing on HGX-H100 nodes, pip install fails to build CUDA extensions such as amp_C. **Minimal Steps/Code to Reproduce the Bug** `pip install -v --disable-pip-version-check --no-cache-dir...

bug

Hi, I am running the demo-imdb-vectors.sh demo script on my Mac. It fails with FANN Error 1: Unable to open configuration file "imdbtrain.data" for reading. ./demo-imdb-vectors.sh: line 23: 644 Segmentation...

**Before**: Overflow checks are scattered and duplicated across the codebase. **This PR:** - Single interface, the CheckOverflow class, which abstracts and unifies overflow checks across ZeRO, ZeRO-Offload, Pipeline Parallelism, BF16_optimizer....

Hi, I just took a quick look at the fake tensor/module APIs. The deferred initialization feature looks really cool to me. I am wondering, is there a way to de-materialize the...

enhancement