kaizen
ENH: Add support for media
Enhance our PR description and issue label generators to support multimedia inputs (screenshots, videos, GIFs) in addition to text, for more comprehensive content analysis.
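As a rough starting point, the generators would first need to tell media attachments apart from text. Here is a minimal sketch using only Python's standard `mimetypes` module; the function name `classify_attachment` and the category labels are hypothetical and not part of the current codebase.

```python
import mimetypes


def classify_attachment(filename: str) -> str:
    """Return a coarse media category for a file name (hypothetical helper).

    Categories are illustrative only: "gif", "image", "video", "text", "unknown".
    """
    # guess_type looks only at the file extension, not the file contents.
    mime, _encoding = mimetypes.guess_type(filename)
    if mime is None:
        return "unknown"
    # GIFs are called out separately in the issue, so give them their own label.
    if mime == "image/gif":
        return "gif"
    major = mime.split("/", 1)[0]
    if major in ("image", "video", "text"):
        return major
    return "unknown"
```

Anything classified as an image, GIF, or video could then be routed to whichever vision model we settle on, while text keeps going through the existing path.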
Are we using any vision models as of now?
I don't think so. Maybe we could use LLaVA?