https://next-gpt.github.io/
An end-to-end MM-LLM that perceives input and generates output in arbitrary combinations (any-to-any) of text, image, video, audio, and beyond.
NExT-GPT