deepsparse
Sparsity-aware deep learning inference runtime for CPUs
Tested against:
* CPU, GPU, FP32, FP16
* Zoo and local models
* base and layer-dropped models
Supports loading a recipe and model from SparseZoo, applying that recipe to the model, and then optionally converting it to a quantized torch model to run on CPU. `torch.quantization.convert` has...
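A minimal sketch of that flow, assuming the public sparseml and torch APIs; the `quantize_for_cpu` wrapper and the recipe path are hypothetical illustrations, not this PR's code:

```python
import torch
from sparseml.pytorch.optim import ScheduledModifierManager

def quantize_for_cpu(model: torch.nn.Module, recipe_path: str) -> torch.nn.Module:
    # Apply the recipe's modifiers (e.g. pruning / QAT fake-quant) to the model
    manager = ScheduledModifierManager.from_yaml(recipe_path)
    manager.apply(model)
    # Fold any fake-quant observers into real int8 modules for CPU execution
    model.eval()
    return torch.quantization.convert(model)
```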
Adding a Lambda deployment to the examples directory. It is very similar to the SageMaker deployment. The scope of this application encompasses automating: 1. Construction of a...
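As a rough illustration of what such a Lambda entry point could look like (the task name, model path, and event shape here are assumptions, not the example's actual code):

```python
import json
from deepsparse import Pipeline

# Compile once per container so warm Lambda invocations reuse the engine
# (task and model_path below are assumed placeholders)
pipeline = Pipeline.create(task="text-classification", model_path="/opt/model")

def handler(event, context):
    # Assumed event shape: {"body": "{\"sequences\": [...]}"}
    body = json.loads(event["body"])
    output = pipeline(sequences=body["sequences"])
    return {"statusCode": 200, "body": json.dumps({"labels": output.labels})}
```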
Hello, I am keen to convert my quantized trained ONNX model into a blob file. OpenVINO, which is what I've been using so far, does not currently support this. Is...
Note: Not integrated into the server yet. The main hook is `start_file_watcher`, which the server calls to kick off a watcher process. Everything else is just helpers for that. The file...
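A self-contained sketch of what a `start_file_watcher` helper might look like, using stdlib polling in a background process; the queue-based notification and poll interval are assumptions, and the PR's actual helpers may differ:

```python
import multiprocessing
import os
import time

def _watch(path: str, interval: float, changes: multiprocessing.Queue) -> None:
    # Poll the file's mtime and report any change onto the queue
    last_mtime = os.path.getmtime(path) if os.path.exists(path) else None
    while True:
        time.sleep(interval)
        mtime = os.path.getmtime(path) if os.path.exists(path) else None
        if mtime != last_mtime:
            last_mtime = mtime
            changes.put(path)

def start_file_watcher(path: str, interval: float = 1.0):
    """Kick off a watcher process; returns (process, queue of change events)."""
    changes: multiprocessing.Queue = multiprocessing.Queue()
    proc = multiprocessing.Process(
        target=_watch, args=(path, interval, changes), daemon=True
    )
    proc.start()
    return proc, changes
```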
**Describe the bug** As in the title, running with `-s async` uses more cores than requested via `-ncores`. For example, with `deepsparse.benchmark oBERT-MobileBERT_14layer_50sparse_block4_qat.onnx -e onnxruntime -ncores 8 -s async`...
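For contrast, a sketch of the Python-API equivalent of the core cap on the default deepsparse engine (not the onnxruntime backend the report exercises); `compile_model`'s signature here is based on the public deepsparse API and may differ across versions:

```python
from deepsparse import compile_model

# num_cores is expected to cap the engine's worker threads;
# the model filename is taken from the report above
engine = compile_model(
    "oBERT-MobileBERT_14layer_50sparse_block4_qat.onnx",
    batch_size=1,
    num_cores=8,
)
print(engine)  # inspect the compiled engine's settings
```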
**Is your feature request related to a problem? Please describe.** Usage under Windows 10. **Describe the solution you'd like** Support for Windows 10.
README for the `deepsparse.license` tool proposed in #630. @jeanniefinks and Rob G to complete TODOs.
This shows users how to use the `/deployment` directory of a model inside Docker. Test plan: run the example docker build command from the README.