dbogunowicz issues

Results: 28 issues of dbogunowicz

[Commercial Server] Integrate PrometheusLogger with Grafana (WIP)

Pipeline support for loading from deployment directories with config.json

## PR description
This PR has two main goals:
1. Make pipelines read the `config.json` file from the `deployment` directory
2. Make pipelines accept the `deployment` directory as a `model_path` argument.

Detailed changes:...

Yolact Blog Branch

This is a branch that:
- holds the `blog.md` file with the contents of the blog
- holds the `make_videos.sh` script to quickly create the required media.

For benchmarking purposes,...

[Text Generation][V2] `LinearRouter` to accept SPLIT/JOIN

It seems that fundamentally at the `Pipeline` level, there is an assumption that `ops` is a list, not a dictionary. To reproduce: ```python from deepsparse.v2.text_generation import TextGenerationPipelineNoCache prompt = ["Some...

[Forkless SparseML Transformers] [Feature Branch] Setting Up The `modification` module

## Summary
This pull request addresses the removal of the dependency on `nm-transformers` in favor of the original HF `transformers`. The primary motivation for this change is to simplify maintenance...

[Experimental][StarCode] KV Cache Injection

## Feature Description
The results of my experimentation with the `tiny_starcoder` model.

## Findings:
- the original KV cache is being added not as separate arrays: `past_key_values.{attn_block_id}.values` and `past_key_values.{attn_block_id}.keys`, but...

[KV Cache Injection][Doc] README.md for the KVCacheInjector

Refactor the quantization modification logic

SparseML dependency on `compressed-tensors`

[WiP] Fixing kv cache injection for LlaMa and Mistral