deepsparse
Sparsity-aware deep learning inference runtime for CPUs
If creating the V2 pipeline fails, it's helpful to print the error.
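A minimal sketch of the idea: log the failure reason before falling back, rather than swallowing it. The factory names here (`create_v2_pipeline`, `create_v1_pipeline`) are illustrative stand-ins, not the actual deepsparse API:

```python
import logging

logger = logging.getLogger(__name__)


def create_v2_pipeline():
    # Hypothetical stand-in for the V2 pipeline constructor.
    raise RuntimeError("V2 pipeline unsupported for this task")


def create_v1_pipeline():
    # Hypothetical stand-in for the legacy pipeline constructor.
    return "v1-pipeline"


def create_pipeline():
    """Try the V2 pipeline first; on failure, log the error and fall back."""
    try:
        return create_v2_pipeline()
    except Exception as err:
        # Surface why the V2 path failed instead of silently falling back.
        logger.warning("Could not create V2 pipeline: %s", err)
        return create_v1_pipeline()
```

With this, a user who hits the fallback sees the underlying V2 error in their logs instead of a silent downgrade.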
# Evaluator Move

This PR moves a few modules from `deepsparse.evaluation` to `sparsezoo.evaluation`.

## Motivation and Context

The moved modules provide a common interface for evaluating models. This interface can...
This is currently a hack, but it would be great to get a version of this into production so that we can use debug_analysis on the pipeline and pass real...
**Describe the bug**

I downloaded and tested the [yolov8-s-coco-pruned70_quantized](https://sparsezoo.neuralmagic.com/models/yolov8-s-coco-pruned70_quantized?hardware=deepsparse-c6i.12xlarge&comparison=yolov8-s-coco-base&tab=4) model from the SparseZoo. When I simply infer the ONNX model with ONNX Runtime, I get an average of 1.92 seconds (over...
Add ultrachat200k for perplexity eval
Hi. The paper describes 8-bit quantization combined with pruning, which is fantastic. My question: has any research been done on 4-bit quantization? Since GPU memory is notoriously expensive, 4-bit quantization...
**Describe the bug**

When I try to run the [example](https://github.com/neuralmagic/deepsparse/blob/main/docs/llms/text-generation-pipeline.md) LLM TextGeneration code, I get an assertion error. (Sorry for any formatting errors; if you have tips to make...
## Description

Adds tests for `Pipeline.run_async()`.

## Problem

Testing `run_async()` currently requires some hacking in tests/server; the Pipeline function's tests should be isolated.

## Solution

A simple pipeline running `run_async()`.

## Usage

```python3
inference_state...
```
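The isolated-test idea above can be sketched with a plain asyncio test against a fake pipeline, with no server involved. `FakePipeline` and its `run_async` implementation are assumptions for illustration, not the actual deepsparse `Pipeline` class:

```python
import asyncio


class FakePipeline:
    """Hypothetical stand-in for a Pipeline exposing run_async()."""

    def run(self, data):
        # Synchronous inference path (trivial transform for the test).
        return data.upper()

    async def run_async(self, data):
        # Dispatch the synchronous path to the event loop's default executor.
        loop = asyncio.get_running_loop()
        return await loop.run_in_executor(None, self.run, data)


async def test_run_async():
    pipeline = FakePipeline()
    result = await pipeline.run_async("hello")
    assert result == pipeline.run("hello")


asyncio.run(test_run_async())
```

Because the fake pipeline has no server dependency, the async path can be exercised directly in a unit test.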
Show a warning when overriding batch_size from 0 to 1 https://app.asana.com/0/1201735099598270/1206262288703592
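A sketch of the intended behavior, using a hypothetical normalization helper (not the actual deepsparse code): an unsupported `batch_size=0` is overridden to 1, and the override is surfaced to the user rather than applied silently.

```python
import warnings


def normalize_batch_size(batch_size):
    """Override an unsupported batch_size of 0 to 1, warning the caller."""
    if batch_size == 0:
        warnings.warn(
            "batch_size=0 is not supported; overriding to batch_size=1",
            stacklevel=2,
        )
        return 1
    return batch_size
```

Any other value passes through unchanged, so existing callers are unaffected.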