dbogunowicz
## PR description This PR has two main goals: 1. Make pipelines read the `config.json` file from the `deployment` directory. 2. Make pipelines accept the `deployment` directory as the `model_path` argument. Detailed changes:...
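The two goals above amount to resolving `config.json` from whatever `model_path` the user passes. Below is a minimal sketch of that lookup, assuming `model_path` is either the `deployment` directory itself or a file inside it (e.g. an ONNX model). The helpers `resolve_config_path` and `load_config` are hypothetical illustrations, not the deepsparse API.

```python
import json
from pathlib import Path
from typing import Any, Dict


def resolve_config_path(model_path: str) -> Path:
    """Hypothetical helper: locate config.json for a given model_path."""
    path = Path(model_path)
    # Accept either a deployment directory or a file that lives inside it
    directory = path if path.is_dir() else path.parent
    config_path = directory / "config.json"
    if not config_path.is_file():
        raise FileNotFoundError(f"No config.json found in {directory}")
    return config_path


def load_config(model_path: str) -> Dict[str, Any]:
    """Hypothetical helper: read and parse config.json for model_path."""
    with open(resolve_config_path(model_path)) as f:
        return json.load(f)
```

Accepting both forms keeps existing callers that pass a model file working while letting new callers pass the directory directly.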
This is a branch that: - holds the `blog.md` file with the contents of the blog - holds the `make_videos.sh` script to quickly create the required media. For benchmarking purposes,...
It seems that fundamentally at the `Pipeline` level, there is an assumption that `ops` is a list, not a dictionary. To reproduce: ```python from deepsparse.v2.text_generation import TextGenerationPipelineNoCache prompt = ["Some...
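To illustrate why a dictionary of `ops` breaks code that expects a list: iterating a `dict` in Python yields its keys (strings), not the operator objects. The sketch below normalizes either form into an ordered list before running the operators in sequence. `normalize_ops` and `run_ops` are hypothetical names for illustration, not the actual `Pipeline` implementation.

```python
from typing import Any, Callable, Dict, List, Union

Operator = Callable[[Any], Any]


def normalize_ops(ops: Union[List[Operator], Dict[str, Operator]]) -> List[Operator]:
    # Iterating a dict directly would yield its string keys, so take the
    # values instead; dicts preserve insertion order in Python 3.7+.
    if isinstance(ops, dict):
        return list(ops.values())
    return list(ops)


def run_ops(ops: Union[List[Operator], Dict[str, Operator]], value: Any) -> Any:
    # Apply each operator to the output of the previous one
    for op in normalize_ops(ops):
        value = op(value)
    return value
```

For example, `run_ops({"add": lambda x: x + 1, "double": lambda x: x * 2}, 3)` and the equivalent list form both apply the operators in order.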
## Summary This pull request removes the dependency on `nm-transformers` in favor of the original HF `transformers`. The primary motivation for this change is to simplify maintenance...
## Feature Description The results of my experimentation with the `tiny_starcoder` model. ## Findings: - the original KV cache is being added not as separate arrays: `past_key_values.{attn_block_id}.values` and `past_key_values.{attn_block_id}.keys`, but...
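The finding above refers to a KV cache stored as separate per-block arrays named `past_key_values.{attn_block_id}.keys` and `past_key_values.{attn_block_id}.values`. The sketch below generates those expected names for a given number of attention blocks and reports which ones are missing from a model's input names. The helper names, and the assumption that these are the exact input names to check, are hypothetical.

```python
from typing import Iterable, List


def expected_kv_cache_names(num_attention_blocks: int) -> List[str]:
    """Hypothetical helper: build separate key/value array names per block."""
    names = []
    for attn_block_id in range(num_attention_blocks):
        names.append(f"past_key_values.{attn_block_id}.keys")
        names.append(f"past_key_values.{attn_block_id}.values")
    return names


def missing_kv_cache_names(input_names: Iterable[str], num_attention_blocks: int) -> List[str]:
    """Return the expected KV cache names that the model inputs do not contain."""
    present = set(input_names)
    return [n for n in expected_kv_cache_names(num_attention_blocks) if n not in present]
```

A non-empty result from `missing_kv_cache_names` would indicate that the exported model does not expose the cache as separate per-block arrays.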
This document guides the user through creating a KV cache model that can later be used in the pipeline. This PR goes "in tandem" with: https://github.com/neuralmagic/deepsparse/pull/1149
# Feature Description The `sparseml.transformers.sparsification.modification` package is a set of modifications that are applied to some of the transformer models, to make them compatible with our quantization flows. This PR...
GHA will be green once https://github.com/neuralmagic/compressed-tensors/pulls lands.