Video as a prompt
video as context: ByteDance/Video-As-Prompt-Wan2.1-14B https://huggingface.co/ByteDance/Video-As-Prompt-Wan2.1-14B @kijai @kabachuha
49 frames meh :/
What about the ability to stitch multiple videos together (using the 49-frame output as a base)? Idk
Can be used to create GIFs.
A great LoRA substitute. Basically, IP-Adapter, but for VFX
What needs to be done to get it working and implemented in the wrapper? (What's the roadmap for the work Kijai would usually do here?) I'm curious
@BestofthebestinAI They add transformer blocks "parallel" to the main ones, resulting in a "mixture of transformers" (like in their Bagel). In the wrapper, it may look like VACE or S2V (additional cross-attention), but instead of sequential cross-attention, an additional concatenated attention is placed
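For anyone curious, here's a minimal toy sketch of that difference in plain PyTorch (my own illustration of the general idea, not the actual ByteDance or wrapper code; all names here are made up):

```python
import torch
import torch.nn as nn

class ConcatAttentionBlock(nn.Module):
    """Toy sketch: one joint attention over target + reference tokens."""
    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ref_block = nn.Linear(dim, dim)  # stand-in for the parallel "expert" block

    def forward(self, x: torch.Tensor, ref: torch.Tensor) -> torch.Tensor:
        # x:   (B, N, D) target-video tokens (main stream)
        # ref: (B, M, D) reference-video tokens (parallel stream)
        ref = self.ref_block(ref)                   # parallel block processes the reference
        tokens = torch.cat([x, ref], dim=1)         # concatenate along the sequence axis
        out, _ = self.attn(tokens, tokens, tokens)  # ONE attention over the joint sequence,
                                                    # not a second sequential cross-attention
        return out[:, : x.shape[1]]                 # hand only the target tokens onward

# quick shape check
block = ConcatAttentionBlock(dim=64)
x, ref = torch.randn(1, 16, 64), torch.randn(1, 8, 64)
assert block(x, ref).shape == (1, 16, 64)
```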

Lol, I tested it through diffusers and got a weird result. Maybe it's just too few steps, of course (20 vs. 50).
https://github.com/user-attachments/assets/2277741b-37c1-488b-a25d-5810ade188e3
The reference is a dog from the "deflate" LoRA.
I used a vibe-coded ComfyUI custom node for myself just now. https://gist.github.com/kabachuha/b6108cd9ac5e57641badaac45786fd01
It uses full CPU offload, so it may be quite slow.
Reference:
https://github.com/user-attachments/assets/8de00d5c-5926-4cdd-8109-fec9ff79e900
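(For context on the "full CPU offload" above: a hedged, generic diffusers sketch of that pattern, not necessarily how the gist loads this particular model.)

```python
import torch
from diffusers import DiffusionPipeline

# Hypothetical load; the exact pipeline class/args for this repo may differ.
pipe = DiffusionPipeline.from_pretrained(
    "ByteDance/Video-As-Prompt-Wan2.1-14B", torch_dtype=torch.bfloat16
)
pipe.enable_sequential_cpu_offload()  # "full" offload: lowest VRAM use, slowest
# pipe.enable_model_cpu_offload()    # lighter alternative: whole submodels move on/off the GPU
```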
Looks like a lot of those viral clips with cut-as-a-cake, inflate, explode, etc. ;-) I guess that's the aim of the model as well.
Maybe someone clever can extract a LoRA from the model, so it can be used in regular workflows. But maybe it's not as "easy/simple" as that.
--
Googled "how to extract lora", and of course Kijai shows up ;-)
Seems like there is a node in KJ Nodes that might do that (but if its usable for this model i dont know hehe)
Yes, they are clearly inspired by those clips
Well, it's not just a LoRA. In fact, they add an entire parallel block and change the positional embeddings (RoPE) :) This is slightly more complicated, but should be possible in the wrapper
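To illustrate the RoPE part (again, just a hedged guess at the general idea, not the model's exact scheme): the reference tokens need positions that don't collide with the target frames, e.g. by shifting their temporal indices before building the usual RoPE tables:

```python
import torch

def joint_temporal_positions(n_target: int, n_ref: int, ref_offset: int = 1024) -> torch.Tensor:
    """Toy example: give reference tokens their own index range (ref_offset is made up)."""
    target_pos = torch.arange(n_target)         # 0 .. n_target-1 for the generated video
    ref_pos = torch.arange(n_ref) + ref_offset  # shifted range for the reference video
    return torch.cat([target_pos, ref_pos])     # these then feed the standard RoPE sin/cos tables

print(joint_temporal_positions(4, 3))  # tensor([   0,    1,    2,    3, 1024, 1025, 1026])
```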
Wow, this is IP-Adapter for videos. Any news?
We are waiting for the mastermind to decide to make it work for us normal people ^^ Although I wish I could do it myself, or learn to.
Sneak-peeking at the branches, it seems it's already cooking ;-)
I did try it, but had some OOM issues with it. Will try some more later ;-)
Thank you Rune. You saw something in the dev branch?
@kijai Please add VAP GGUF support for use with Wan 2.1 GGUF
I tried the workflow from the VAP repo, but the videos are not really similar. It takes the reference image and uses the motion of the input video only in a very rudimentary way. Do you have good examples of how to use it correctly? Do I have to write very long prompts, as in kijai's example?
Will give it a try; I had some OOM issues with the workflow, will see if I can get through.
I can't really get anything out of this. I can get results matching their own code, but even that doesn't seem to work. Either it's extremely limited in how it can be used and the prompts have to be perfect, or their code has a bug, or the Wan version just doesn't work.
Wouldn't lose sleep over it, it's a bit of a gimmicky feature I guess ;-)
Gave it a test; had to lower the resolution a bit to avoid OOM. But it seems to work OK for me, I guess, at least from a couple of random test runs:
https://github.com/user-attachments/assets/7c7ceee8-ca0b-450a-b6f6-07145186ead3
https://github.com/user-attachments/assets/a66e1437-1dd3-45d9-ac12-fc208830e2bf
(I guess the "as a balloon" made them fly away, so that might be the prompting.)
I tried some other "effects" too, but it seems tricky to get the description prompt right...
Probably should have added "by some large fingers" or something to the prompt to get an "exact copy", but the transformation seems to work.
So, as you said, my initial feeling is that it needs very accurate prompting. I think that might be the challenge.
https://github.com/user-attachments/assets/48736da1-f21e-483d-9deb-75abcb580073
Anyways, it was just a quick test; I think the implementation is perhaps OK as is...
"Large fingers" added to the prompt, and it seems OK:
https://github.com/user-attachments/assets/503cde23-8525-4302-aa4e-e1068859c1f8
Anyways, a bit of a fun gimmick I guess; only 49 frames, and you said you had some trouble with the code, so I wouldn't lose sleep over it ;-) It works OK as is, it seems:
https://github.com/user-attachments/assets/32cd5977-72ea-4b88-9bed-b2052ec2f755
https://github.com/user-attachments/assets/b0ad9575-33e7-47ca-a488-5c2fe88c7615
Edit: trying some more, it might be that the crucial part is that the exact same transition words are duplicated in both the reference-video and target-video prompts. At least I could get away with far simpler prompts when testing it a bit more: just copy-pasting a generic prompt about the transition into both, and only changing the subject description (and background, if needed), seemed to work for me, at least. But it's just a theory; I've only tried a few.
But all in all, it seems to work OK. I didn't try all the examples, but a few styles, a few camera movements, and a few funny ones like squish and inflate.
Nice! How can I try that? @RuneGjerde
It's currently in a separate branch, but maybe Kijai will merge it to main soon, unless he intends to do some more work on it. You can try it if you switch branches, but for other things it's probably best to switch back to main afterwards, since main is more up to date on other fixes that might have been added (although the branch is only 2 commits behind main, so probably no urgency to go back).
git switch vap
git switch main
(inside the WanVideoWrapper folder)
Alternatively, just delete that folder and run:
git clone --single-branch --branch vap https://github.com/kijai/ComfyUI-WanVideoWrapper
And to switch back, delete it again and run:
git clone https://github.com/kijai/ComfyUI-WanVideoWrapper
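(General git/ComfyUI housekeeping, not specific to this branch: restart ComfyUI after switching so the updated node code actually loads, and if the branch has moved since you last fetched, a git pull inside the folder brings it up to date.)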