Prince Canuma
VisionZip: a simple yet effective method that selects a set of informative tokens for input to the language model, reducing visual-token redundancy and improving efficiency while maintaining model performance....
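The selection step can be sketched as keeping the top-k visual tokens ranked by an importance score (a minimal illustration with assumed names and random data, not VisionZip's actual implementation, which derives scores from attention):

```python
import numpy as np

def select_informative_tokens(tokens, attn_scores, k):
    """Keep the k visual tokens with the highest importance scores.

    tokens: (n, d) array of visual token embeddings
    attn_scores: (n,) per-token importance (e.g. attention received)
    """
    keep = np.argsort(attn_scores)[-k:]  # indices of the k largest scores
    keep = np.sort(keep)                 # preserve original token order
    return tokens[keep]

# Toy example: reduce a 24x24 patch grid (576 tokens) to 64 tokens.
tokens = np.random.rand(576, 64)
scores = np.random.rand(576)
reduced = select_informative_tokens(tokens, scores, k=64)
print(reduced.shape)  # (64, 64)
```

The language model then only sees the reduced token set, shrinking the prefill cost roughly in proportion to the pruning ratio.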
## Issue

I keep getting `nan` loss when training Llama-3.2-vision.

I tried:
- gradient clipping
- lower learning rate
- higher batch size, LoRA rank and alpha

But with no...
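For reference, global-norm gradient clipping (the first mitigation listed) works by rescaling all gradients so their combined L2 norm stays below a threshold. A framework-agnostic sketch in NumPy (names are illustrative; MLX and PyTorch ship their own helpers for this):

```python
import numpy as np

def clip_grad_norm(grads, max_norm):
    """Scale a list of gradient arrays so their global L2 norm <= max_norm."""
    total = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    scale = min(1.0, max_norm / (total + 1e-6))  # no-op if already small
    return [g * scale for g in grads], total

# Exploding-gradient example: global norm ~26.5 gets scaled down to ~1.0.
grads = [np.ones((2, 2)) * 10.0, np.ones(3) * 10.0]
clipped, pre_norm = clip_grad_norm(grads, max_norm=1.0)
```

Note that clipping only bounds gradient magnitude; a `nan` that originates in the forward pass (e.g. an overflow in fp16) passes through unchanged, which is why clipping alone often fails to fix `nan` loss.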
This is just a starting point; @BenLumenDigital is taking care of it.

## Checklist
- [ ] Tests added/updated
- [ ] Documentation updated
- [ ] Issue referenced (e.g.,...
Uses the BigVGAN codec, so all you need to add is the discrete diffusion, encoder, and conditioning model: https://huggingface.co/PlayHT/PlayDiffusion
Supported models:
- Qwen3 VL + MoE
- Idefics 2 & 3

Closes #40, #48
**Summary:**
This PR removes the dependency on `torch`, `torchvision`, and `transformers` by porting the necessary processors directly into `mlx-vlm`. It also restructures `pyproject.toml` to support optional installations.

**Changes:**
* **Removed...
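Optional installations in `pyproject.toml` are typically expressed as extras groups; a minimal sketch (group names and pins are illustrative, not the PR's actual layout):

```toml
[project.optional-dependencies]
# Installed via: pip install "mlx-vlm[torch]"
torch = ["torch>=2.0", "torchvision"]
dev = ["pytest", "ruff"]
```

With this layout, the base `pip install mlx-vlm` pulls in none of the heavy frameworks, and users opt in per extra.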
Closes #594
The high-level idea would be to define the message format at the model level as a property (i.e., `get_messages`) that exists for models that support video, or raises an error for models that...
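A minimal sketch of that shape, assuming a base class and the `get_messages` name from the comment (class names and the message payload are hypothetical):

```python
class BaseModel:
    """Hypothetical base: models that lack a video message format raise."""

    @property
    def get_messages(self):
        raise NotImplementedError(
            f"{type(self).__name__} does not define a video message format"
        )

class VideoModel(BaseModel):
    """Hypothetical model that supports video input."""

    @property
    def get_messages(self):
        # Model-specific chat template for video + text turns.
        return [{"role": "user",
                 "content": [{"type": "video"}, {"type": "text"}]}]

msgs = VideoModel().get_messages
print(msgs[0]["role"])  # user
```

Callers can then branch on capability by catching `NotImplementedError` instead of hard-coding per-model checks.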
### Discussed in https://github.com/Blaizzy/mlx-vlm/discussions/476

Originally posted by **avishekjana**, August 27, 2025

Hi, I’m trying to fine-tune LLaMA 3.2 Vision and Qwen 2 VL, but the main challenge I’m facing is...
Closes #270