Prince Canuma

63 issue results for Prince Canuma

## Context

*Gives the reviewer some context about the work and why this change is being made. Focus on the WHY from a product perspective.*

## Description

*Provide a detailed...

## Summary

- merge installation instructions into the docs homepage
- drop the dedicated installation page
- update navigation to remove the old link
- expand the index with quick...

codex

enhancement

Investigate and implement Activation-aware Weight Quantization (AWQ) and Dynamic Weight Quantization (DWQ) techniques specifically for vision models. Motivation: Vision models often have larger parameter counts and compute requirements. Effective quantization...
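The AWQ idea above can be illustrated numerically: scale the input channels of a weight matrix by (the square root of) their activation magnitudes before uniform quantization, so salient channels keep more precision, then fold the scale back out at dequantization. This is a minimal sketch with hypothetical names and NumPy as a stand-in for MLX arrays, not mlx-vlm's actual API:

```python
import numpy as np

def awq_quantize(weight, act_scale, bits=4):
    """AWQ-style sketch: protect activation-salient input channels by
    scaling them up before per-row uniform quantization.
    `weight` is (out, in); `act_scale` holds hypothetical per-input-channel
    average activation magnitudes collected from calibration data."""
    s = np.sqrt(act_scale)                       # per-channel protection factor
    w = weight * s                               # salient channels get a larger range
    qmax = 2 ** (bits - 1) - 1                   # e.g. 7 for 4-bit signed
    step = np.abs(w).max(axis=1, keepdims=True) / qmax
    step = np.where(step == 0, 1.0, step)        # guard all-zero rows
    q = np.clip(np.round(w / step), -qmax, qmax).astype(np.int8)
    return q, step, s

def awq_dequantize(q, step, s):
    # Undo both the quantization step and the activation scaling.
    return (q.astype(np.float32) * step) / s

weight = np.linspace(-1.0, 1.0, 32).reshape(4, 8).astype(np.float32)
act_scale = np.full(8, 4.0, dtype=np.float32)    # hypothetical activation stats
q, step, s = awq_quantize(weight, act_scale)
restored = awq_dequantize(q, step, s)            # error bounded by step / (2 * s)
```

The key design point is that the rounding error budget is spent where activations are small, which is why AWQ tends to hold up better than plain absmax quantization at low bit widths.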

enhancement

Implement quantization techniques for the key-value (KV) cache to reduce memory footprint and potentially improve inference speed. Motivation: KV cache can consume significant memory, especially for long contexts. Quantization would...
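The proposal can be sketched with a minimal per-row int8 scheme (NumPy as a stand-in for MLX arrays; all names are hypothetical, not the mlx-vlm API): each cache row is scaled into int8, cutting float32 memory 4x, and dequantized on read for the attention computation.

```python
import numpy as np

def quantize_kv(cache):
    """Per-row absmax int8 quantization of a KV cache tensor."""
    scale = np.abs(cache).max(axis=-1, keepdims=True) / 127.0
    scale = np.where(scale == 0, 1.0, scale)     # guard empty rows
    q = np.clip(np.round(cache / scale), -127, 127).astype(np.int8)
    return q, scale.astype(np.float32)

def dequantize_kv(q, scale):
    """Recover an approximate float cache for attention."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
kv = rng.standard_normal((16, 64)).astype(np.float32)  # (tokens, head_dim)
q, scale = quantize_kv(kv)
restored = dequantize_kv(q, scale)
# int8 storage is 4x smaller; per-element error stays below scale / 2
```

The per-element error is bounded by half a quantization step per row, which is why per-row (or per-group) scales matter for long contexts where outlier activations would otherwise blow up a single global scale.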

Implement a persistent prompt caching mechanism similar to the one used in mlx-lm (reference: https://github.com/ml-explore/mlx-lm/blob/main/mlx_lm/generate.py#L317-L318) to improve efficiency in chat applications. Motivation: As chat conversations grow longer, the time-to-first-token currently...
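The mechanism can be sketched as a prefix-keyed store (all names below are hypothetical; mlx-lm's actual implementation persists MLX KV state): hash each token prefix, and on a new request reuse the longest cached prefix so only the new suffix needs a prefill pass.

```python
import hashlib

class PromptCache:
    """Minimal in-memory sketch of prompt/prefix caching."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def _key(tokens):
        # Stable key for a token prefix.
        data = ",".join(map(str, tokens)).encode()
        return hashlib.sha256(data).hexdigest()

    def store(self, tokens, kv_state):
        self._store[self._key(tokens)] = kv_state

    def lookup(self, tokens):
        # Longest cached prefix wins; scan from full length down.
        for n in range(len(tokens), 0, -1):
            state = self._store.get(self._key(tokens[:n]))
            if state is not None:
                return n, state
        return 0, None

cache = PromptCache()
cache.store([101, 7, 9], "kv-after-3-tokens")      # placeholder for real KV state
reused, state = cache.lookup([101, 7, 9, 42, 43])
# reused == 3: only tokens [42, 43] need prefill, improving time-to-first-token
```

A persistent variant would serialize the cached KV state to disk keyed the same way, so time-to-first-token stays flat as the chat history grows instead of scaling with total conversation length.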

enhancement

Reported by @bean980310 in #311

enhancement

SciPy was deprecated in https://github.com/Blaizzy/mlx-vlm/pull/301 and replaced with PIL and cv2. However, the outputs are not numerically identical, nor within the threshold. The replacements work well and don't miss details, but...
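A regression check for this kind of backend swap can be sketched as a simple tolerance comparison between the two preprocessing outputs (the arrays below are placeholders standing in for SciPy- vs PIL/cv2-resized images; the function name and threshold are assumptions):

```python
import numpy as np

def preprocessing_close(ref, alt, atol=1e-2):
    """Return the max per-pixel difference between two preprocessing
    outputs and whether it falls within the accepted threshold."""
    diff = float(np.max(np.abs(ref.astype(np.float64) - alt.astype(np.float64))))
    return diff, diff <= atol

# placeholder outputs from two resize implementations
ref = np.array([[0.00, 0.500], [0.500, 1.00]])
alt = np.array([[0.00, 0.502], [0.498, 1.00]])
diff, ok = preprocessing_close(ref, alt)
# not bit-identical, but within a 1e-2 tolerance
```

Pinning the tolerance in a test like this makes "close enough" explicit, so future interpolation or antialiasing changes fail loudly instead of silently shifting model inputs.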