maxtext
Deepstack Qwen3 Omni
Description
Add DeepStack support, following this paper: https://arxiv.org/abs/2406.04334. The idea is to inject intermediate representations from the vision encoder into the intermediate layers of the LLM.
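For reviewers unfamiliar with the idea, here is a minimal JAX sketch of the injection step. It is an illustration under stated assumptions, not the code in this PR: the names `inject_deepstack`, `toy_layer`, and `forward` are hypothetical, the decoder layer is a toy stand-in, and the sketch assumes features are added residually at visual token positions after each of the first few layers.

```python
import jax.numpy as jnp


def inject_deepstack(hidden, visual_mask, visual_feats):
    """Add one level of vision features to hidden states at visual positions.

    hidden:       [seq_len, d_model] hidden states of the LLM.
    visual_mask:  [seq_len] bool, True where the token is a visual placeholder.
    visual_feats: [num_visual, d_model], one row per True entry of the mask,
                  in sequence order (assumed layout for this sketch).
    """
    # Map every sequence position to the row of its visual feature.
    # Non-visual positions get a clipped dummy index and are masked out below.
    idx = jnp.clip(jnp.cumsum(visual_mask) - 1, 0, visual_feats.shape[0] - 1)
    gathered = visual_feats[idx]  # [seq_len, d_model]
    return hidden + jnp.where(visual_mask[:, None], gathered, 0.0)


def toy_layer(hidden, weight):
    # Hypothetical stand-in for a real decoder layer.
    return hidden + jnp.tanh(hidden @ weight)


def forward(hidden, layer_weights, visual_mask, deepstack_feats):
    """Run the toy layers, injecting level i of the vision features after layer i."""
    for i, weight in enumerate(layer_weights):
        hidden = toy_layer(hidden, weight)
        if i < len(deepstack_feats):
            hidden = inject_deepstack(hidden, visual_mask, deepstack_feats[i])
    return hidden
```

The gather uses a cumulative sum over the mask instead of boolean indexing so that all shapes stay static, which keeps the function usable under `jax.jit`.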
Tests
I inspected the outputs at various steps and verified that the injection happens correctly.
Checklist
Before submitting this PR, please make sure (put X in square brackets):
- [X] I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
- [X] I have necessary comments in my code, particularly in hard-to-understand areas.
- [X] I have run end-to-end tests and provided workload links above if applicable.
- [X] I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.