Add qwen3 vl
Summary of Changes
Hello @SangChengC, I'm Gemini Code Assist[^1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request significantly expands the model capabilities by adding full support for the Qwen3-VL and Qwen3-VL MoE models. It involves a deep integration of multimodal features, including a new visual encoder, specialized tokenization, and the innovative 'deepstack' mechanism to pass rich visual context through multiple layers. Alongside these major additions, the PR refines existing image processing, enhances Rotary Positional Embeddings for complex scenarios, and optimizes MoE weight handling and Whisper's attention for better performance.
Highlights
- Qwen3-VL Model Integration: Introduced comprehensive support for the Qwen3-VL (Vision-Language) model, including its Mixture-of-Experts (MoE) variant, by adding new model definitions, inference logic, and weight handling.
- Deepstack Feature Support: Integrated 'deepstack' features for multimodal models, allowing intermediate visual features from different layers of the vision encoder to be passed to the language model, enhancing multimodal understanding.
- Advanced Image Processing: Enhanced image processing utilities with more flexible `smart_resize` and `resize_image` functions, explicitly handling `factor`, `min_pixels`, and `max_pixels` for better control over image resizing and token length calculation.
- Rotary Positional Embedding (RoPE) Enhancements: Updated RoPE handling in the Llama model to support `mrope_section` and a new 'default' scaling type, along with an `apply_interleaved_mrope` function for 3D rotary embeddings in Qwen3-VL.
- MoE Weight Fusion Improvements: Modified the `FusedMoeWeightTP` class to correctly handle fused gate-up weights (3D tensors) for MoE experts, including tensor parallelism splitting and transposing, while also adding a check for quantization compatibility.
- Whisper Model Optimization: Optimized the Whisper model's attention mechanism by introducing `WhisperSdpaAttention`, which leverages `torch.nn.functional.scaled_dot_product_attention` for improved efficiency (a minimal sketch follows this list).
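Since the last highlight hinges on PyTorch's fused attention kernel, here is a minimal sketch of what an SDPA-based Whisper encoder attention can look like. The projection layout (`q_proj`, `k_proj`, `v_proj`, `out_proj`) follows the common Whisper implementation and is an assumption for illustration, not the exact class added in this PR.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class WhisperSdpaAttention(nn.Module):
    """Encoder self-attention using the fused SDPA kernel (hypothetical sketch)."""

    def __init__(self, embed_dim: int, num_heads: int):
        super().__init__()
        assert embed_dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = embed_dim // num_heads
        # Projection layout assumed from the standard Whisper implementation.
        self.q_proj = nn.Linear(embed_dim, embed_dim)
        self.k_proj = nn.Linear(embed_dim, embed_dim, bias=False)
        self.v_proj = nn.Linear(embed_dim, embed_dim)
        self.out_proj = nn.Linear(embed_dim, embed_dim)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        bsz, seq_len, _ = hidden_states.shape

        def shape(x: torch.Tensor) -> torch.Tensor:
            # (bsz, seq, embed) -> (bsz, heads, seq, head_dim)
            return x.view(bsz, seq_len, self.num_heads, self.head_dim).transpose(1, 2)

        q = shape(self.q_proj(hidden_states))
        k = shape(self.k_proj(hidden_states))
        v = shape(self.v_proj(hidden_states))

        # One fused kernel replaces the explicit softmax(QK^T / sqrt(d)) @ V chain.
        attn = F.scaled_dot_product_attention(q, k, v)

        attn = attn.transpose(1, 2).reshape(bsz, seq_len, -1)
        return self.out_proj(attn)
```

The practical gain is that `scaled_dot_product_attention` can dispatch to flash or memory-efficient backends instead of materializing the full attention matrix in eager mode.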
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.
| Feature | Command | Description |
|---|---|---|
| Code Review | /gemini review | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | /gemini help | Displays a list of available commands. |
Customization
To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with :thumbsup: and :thumbsdown: on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.
You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
[^1]: Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.
Summary of Changes
This pull request integrates the Qwen3 Vision-Language (VL) model into the LightLLM framework, enabling efficient processing of multimodal inputs. It involves adapting the model's architecture, including its Mixture-of-Experts (MoE) components and specialized image processing pipeline, to work seamlessly within the existing inference system. The changes enhance the framework's capability to handle complex visual and textual data, ensuring accurate and performant multimodal inference for Qwen3 VL.
Highlights
- Qwen3 VL Model Integration: Introduced comprehensive support for the Qwen3 Vision-Language (VL) model, including both standard and Mixture-of-Experts (MoE) variants, within the LightLLM framework.
- Multimodal Image Processing Enhancements: Updated image resizing logic and introduced new vision transformer components (e.g., `Qwen3VLPatchEmbed`, `Qwen3VLVisionBlock`) to handle Qwen3 VL's specific image processing requirements, including deepstack features (a hedged `smart_resize` sketch follows this list).
- Advanced Rotary Embeddings (MRoPE): Implemented interleaved MRoPE (Multimodal Rotary Position Embedding) for Qwen3 VL, allowing for more complex positional encoding in multimodal contexts.
- Inference State and Weight Management: Added a dedicated inference state (`Qwen3VLMOEInferStateInfo`) and weight loading classes for Qwen3 VL, optimizing weight fusion for MoE layers and ensuring proper tensor parallelism handling.
- Shared Memory for Deepstack Features: Extended shared memory utilities to support the efficient transfer and storage of 'deepstack features' generated by the vision model, which are then integrated into the language model's embeddings.
- Whisper Attention Optimization: Migrated the Whisper encoder layer to use `WhisperSdpaAttention`, leveraging `torch.nn.functional.scaled_dot_product_attention` for potential performance improvements.
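For context on the resizing highlight above, the `smart_resize` logic in the Qwen-VL family typically snaps height and width to multiples of the patch `factor` and then rescales to stay within `[min_pixels, max_pixels]` while roughly preserving the aspect ratio. The sketch below follows the publicly documented Qwen2-VL preprocessing; the default values are illustrative assumptions, not necessarily what this PR uses.

```python
import math


def smart_resize(height: int, width: int, factor: int = 28,
                 min_pixels: int = 56 * 56, max_pixels: int = 14 * 14 * 4 * 1280):
    """Return (h, w) rounded to multiples of `factor` and clamped into the pixel budget.

    Hedged sketch of Qwen-VL style resizing; defaults are illustrative only.
    """
    # Snap both sides to the nearest multiple of the patch factor.
    h_bar = max(factor, round(height / factor) * factor)
    w_bar = max(factor, round(width / factor) * factor)

    if h_bar * w_bar > max_pixels:
        # Too many visual tokens: shrink while keeping the aspect ratio.
        beta = math.sqrt((height * width) / max_pixels)
        h_bar = math.floor(height / beta / factor) * factor
        w_bar = math.floor(width / beta / factor) * factor
    elif h_bar * w_bar < min_pixels:
        # Too few: enlarge to reach the minimum pixel budget.
        beta = math.sqrt(min_pixels / (height * width))
        h_bar = math.ceil(height * beta / factor) * factor
        w_bar = math.ceil(width * beta / factor) * factor

    return h_bar, w_bar


# The visual token count after resizing is (h // factor) * (w // factor),
# which is how the resize ties into the token length calculation.
```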