neural-compressor

SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime

Results: 155 neural-compressor issues

Type of Change: feature. API changed or not: not specified. Description:
- [x] Support converting unquantized `linear` into `fp16`
- [ ] Extend the fp16 ops list to align with...

INC3.X
PyTorch
PT2E

Type of Change: feature or bug fix or documentation or validation or others. API changed or not: not specified. Description: detail description. Expected Behavior & Potential Risk: the expected...

won't merge

Type of Change: UT. Description: detail description. Expected Behavior & Potential Risk: the expected behavior triggered by this PR. How has this PR been tested?...

INC3.X

Type of Change: feature. Description: detail description. Expected Behavior & Potential Risk: the expected behavior triggered by this PR. How has this PR been tested?...

WIP

Type of Change: feature. Description: detail description. Expected Behavior & Potential Risk: the expected behavior triggered by this PR. How has this PR been tested?...

examples

Type of Change: example

examples

Type of Change: feature. Description:
- [x] Update config params
- [x] Update `get_autoround_default_run_fn`
- [x] Update prepare/convert
- [x] Return the packed model
- [x] Enhance UT
- ...

INC3.X
PyTorch

Type of Change: bug fix. API changed or not: no. Description: Update the lm-eval evaluation in the ORT LLM example. How has this PR been tested? Extension test. ...

examples
ONNX Runtime

Type of Change: SmoothQuant (`sq`) supports `calib_func` for auto-tune, so no dataloader is needed. Description: enable layer-wise and block-wise; add UT to check auto-tune; check LLM examples. Expected Behavior &...

Type of Change: 3.x example bug fix. Description: not given. Expected Behavior & Potential Risk: pass the extension test. How has this PR been tested? Not stated. Dependency Change? Not stated.