Tian, Feng
## **Summary** This is a design discussion RFC for contributing some device-agnostic compression algorithms, such as post-training quantization (QDQ quant format) and structural sparsity, both supported by [Intel(R) Neural Compressor](https://github.com/intel/neural-compressor), into...
**Describe the bug** When building the latest code from source on Ubuntu 18.04.3 LTS and running the BERT sparse pruning example in DeepSpeedExample, a crash occurs. From the log, it's because the...
This PR contributes the `snip_momentum` pruning algorithm from [Intel Neural Compressor](https://github.com/intel/neural-compressor) to DeepSpeed compression, as proposed in the [RFC](https://github.com/microsoft/DeepSpeed/issues/2894). `snip_momentum` implements the algorithm described [here](https://github.com/intel/neural-compressor/blob/master/neural_compressor/compression/pruner/README.md)....
This PR demonstrates the `snip_momentum` structured pruning algorithm implemented [here](https://github.com/microsoft/DeepSpeed/pull/3300). Users can reproduce the result below by running `source ./bash_script/pruning_sparse_snip_momentum.sh` with the PR mentioned at...
This PR makes weight-only quantization work with autoTP. Sample code:
```python
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16).to(device)
ds_model = deepspeed.init_inference(model, mp_size=world_size, dtype=torch.float16, replace_with_kernel_inject=False)
model...
```
## Type of Change Documentation for RFC submission ## Description This is a proposed RFC for DeepSpeed/INC integration
This RFC proposes a Hugging Face-compatible yet flexible Weight Only Quantization (WOQ) format in INC, so that a model quantized by INC can be loaded by IPEX for...
**Describe the bug** The following error is raised on an NVIDIA A100 GPU: raft_cagra.graph_degree32.intermediate_graph_degree32.graph_build_algoNN_DESCENT/process_time/real_time ERROR OCCURRED: 'Failed to create an algo: std::bad_alloc: out_of_memory: RMM failure at:/sparse/miniconda3/envs/py310/include/rmm/mr/device/pool_memory_resource.hpp:313: Maximum pool size exceeded' **Steps/Code...
**Describe the bug** The following error is raised on a Xeon CPU: Error occurred running benchmark: Command '['/home/ubuntu/wwq/miniconda3/envs/neuralchat_rag/bin/ann/FAISS_CPU_FLAT_ANN_BENCH', '--build', '--data_prefix=./', '--benchmark_out_format=json', '--benchmark_counters_tabular=true', '--benchmark_out=./wiki_all_88M/result/build/faiss_cpu_flat,base.json.lock', '--raft_log_level=3', 'wiki_all_88M_faiss_cpu_flat,base,k10,bs10000_afc3d9c8-d53d-11ee-af72-0a7d5625b4dd.json']' died with . **Steps/Code to reproduce...