Prince Canuma

63 issue results for Prince Canuma

## Context

*Gives the reviewer some context about the work and why this change is being made. Focus on the WHY from a product perspective.*

## Description

*Provide a detailed...

## Summary

- merge installation instructions into the docs homepage
- drop the dedicated installation page
- update navigation to remove the old link
- expand the index with quick...

codex

enhancement

Investigate and implement Activation-aware Weight Quantization (AWQ) and Dynamic Weight Quantization (DWQ) techniques specifically for vision models. Motivation: Vision models often have larger parameter counts and compute requirements. Effective quantization...
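The AWQ idea above can be illustrated numerically: scale the input channels of a weight matrix by (the square root of) their activation magnitudes before uniform quantization, so salient channels keep more precision, then fold the scale back out at dequantization. This is a minimal sketch with hypothetical names and NumPy as a stand-in for MLX arrays, not mlx-vlm's actual API:

```python
import numpy as np

def awq_quantize(weight, act_scale, bits=4):
    """AWQ-style sketch: protect activation-salient input channels by
    scaling them up before per-row uniform quantization.
    `weight` is (out, in); `act_scale` holds hypothetical per-input-channel
    average activation magnitudes collected from calibration data."""
    s = np.sqrt(act_scale)                       # per-channel protection factor
    w = weight * s                               # salient channels get a larger range
    qmax = 2 ** (bits - 1) - 1                   # e.g. 7 for 4-bit signed
    step = np.abs(w).max(axis=1, keepdims=True) / qmax
    step = np.where(step == 0, 1.0, step)        # guard all-zero rows
    q = np.clip(np.round(w / step), -qmax, qmax).astype(np.int8)
    return q, step, s

def awq_dequantize(q, step, s):
    # Undo both the quantization step and the activation scaling.
    return (q.astype(np.float32) * step) / s

weight = np.linspace(-1.0, 1.0, 32).reshape(4, 8).astype(np.float32)
act_scale = np.full(8, 4.0, dtype=np.float32)    # hypothetical activation stats
q, step, s = awq_quantize(weight, act_scale)
restored = awq_dequantize(q, step, s)            # error bounded by step / (2 * s)
```

The key design point is that the rounding error budget is spent where activations are small, which is why AWQ tends to hold up better than plain absmax quantization at low bit widths.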

enhancement

Implement quantization techniques for the key-value (KV) cache to reduce memory footprint and potentially improve inference speed. Motivation: KV cache can consume significant memory, especially for long contexts. Quantization would...
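The proposal can be sketched with a minimal per-row int8 scheme (NumPy as a stand-in for MLX arrays; all names are hypothetical, not the mlx-vlm API): each cache row is scaled into int8, cutting float32 memory 4x, and dequantized on read for the attention computation.

```python
import numpy as np

def quantize_kv(cache):
    """Per-row absmax int8 quantization of a KV cache tensor."""
    scale = np.abs(cache).max(axis=-1, keepdims=True) / 127.0
    scale = np.where(scale == 0, 1.0, scale)     # guard empty rows
    q = np.clip(np.round(cache / scale), -127, 127).astype(np.int8)
    return q, scale.astype(np.float32)

def dequantize_kv(q, scale):
    """Recover an approximate float cache for attention."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
kv = rng.standard_normal((16, 64)).astype(np.float32)  # (tokens, head_dim)
q, scale = quantize_kv(kv)
restored = dequantize_kv(q, scale)
# int8 storage is 4x smaller; per-element error stays below scale / 2
```

The per-element error is bounded by half a quantization step per row, which is why per-row (or per-group) scales matter for long contexts where outlier activations would otherwise blow up a single global scale.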

Implement a persistent prompt caching mechanism similar to the one used in mlx-lm (reference: https://github.com/ml-explore/mlx-lm/blob/main/mlx_lm/generate.py#L317-L318) to improve efficiency in chat applications. Motivation: As chat conversations grow longer, the time-to-first-token currently...
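The mechanism can be sketched as a prefix-keyed store (all names below are hypothetical; mlx-lm's actual implementation persists MLX KV state): hash each token prefix, and on a new request reuse the longest cached prefix so only the new suffix needs a prefill pass.

```python
import hashlib

class PromptCache:
    """Minimal in-memory sketch of prompt/prefix caching."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def _key(tokens):
        # Stable key for a token prefix.
        data = ",".join(map(str, tokens)).encode()
        return hashlib.sha256(data).hexdigest()

    def store(self, tokens, kv_state):
        self._store[self._key(tokens)] = kv_state

    def lookup(self, tokens):
        # Longest cached prefix wins; scan from full length down.
        for n in range(len(tokens), 0, -1):
            state = self._store.get(self._key(tokens[:n]))
            if state is not None:
                return n, state
        return 0, None

cache = PromptCache()
cache.store([101, 7, 9], "kv-after-3-tokens")      # placeholder for real KV state
reused, state = cache.lookup([101, 7, 9, 42, 43])
# reused == 3: only tokens [42, 43] need prefill, improving time-to-first-token
```

A persistent variant would serialize the cached KV state to disk keyed the same way, so time-to-first-token stays flat as the chat history grows instead of scaling with total conversation length.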

enhancement

Reported by @bean980310 in #311

enhancement

SciPy was deprecated in https://github.com/Blaizzy/mlx-vlm/pull/301 and replaced with PIL and cv2. However, the outputs are not numerically identical, nor within the threshold. The replacements work well and don't miss details, but...
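A regression check for this kind of backend swap can be sketched as a simple tolerance comparison between the two preprocessing outputs (the arrays below are placeholders standing in for SciPy- vs PIL/cv2-resized images; the function name and threshold are assumptions):

```python
import numpy as np

def preprocessing_close(ref, alt, atol=1e-2):
    """Return the max per-pixel difference between two preprocessing
    outputs and whether it falls within the accepted threshold."""
    diff = float(np.max(np.abs(ref.astype(np.float64) - alt.astype(np.float64))))
    return diff, diff <= atol

# placeholder outputs from two resize implementations
ref = np.array([[0.00, 0.500], [0.500, 1.00]])
alt = np.array([[0.00, 0.502], [0.498, 1.00]])
diff, ok = preprocessing_close(ref, alt)
# not bit-identical, but within a 1e-2 tolerance
```

Pinning the tolerance in a test like this makes "close enough" explicit, so future interpolation or antialiasing changes fail loudly instead of silently shifting model inputs.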