
And another new Subject2Video Wan 2.1 Model (ByteDance)

Open railep opened this issue 4 months ago • 27 comments

https://github.com/bytedance/BindWeave A new Phantom-like model with very similar benchmark results. Not sure if it's worth trying, but I wanted to share it in case someone wants to try it. (I don't think any code modifications are necessary to run it.)

railep avatar Nov 05 '25 13:11 railep

From the examples it looks very interesting. Not as "pasted on top" as VACE sometimes looks; more integrated and natural-looking, with good face consistency. But of course these might be cherry-picked examples ;-) https://lzy-dot.github.io/BindWeave/

RuneGjerde avatar Nov 05 '25 14:11 RuneGjerde

It does need some new code, because it uses Qwen 2.5 VL 7B as an additional conditioner.

kijai avatar Nov 05 '25 15:11 kijai

We already have your prompt extender and native CLIP for Qwen Image. Isn't that enough for that?

railep avatar Nov 05 '25 15:11 railep

We already have your prompt extender and native CLIP for Qwen Image. Isn't that enough for that?

No, this uses the "raw" output (hidden_states) from Qwen 2.5 VL 7B directly; there are two new layers in the model that project those to be added to the text embedding. Also, I never implemented the VL version, though there's an implementation of that in core Comfy which could be enough for this.

kijai avatar Nov 05 '25 16:11 kijai
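To make the description above concrete, here is a minimal sketch of what "two new projection layers over Qwen-VL hidden states" could look like. Everything here is an assumption for illustration: the class and layer names are made up, 3584 is Qwen2.5-VL-7B's hidden size, and 4096 is an assumed text-embedding dimension. This is not BindWeave's actual code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical sketch only: names and dims are assumptions, not BindWeave code.
class SubjectProjector(nn.Module):
    def __init__(self, vl_dim: int = 3584, text_dim: int = 4096):
        super().__init__()
        # the "two new layers": project into text space, then refine
        self.proj_in = nn.Linear(vl_dim, text_dim)
        self.proj_out = nn.Linear(text_dim, text_dim)

    def forward(self, vl_hidden: torch.Tensor) -> torch.Tensor:
        return self.proj_out(F.gelu(self.proj_in(vl_hidden)))

projector = SubjectProjector()
vl_hidden = torch.randn(1, 128, 3584)   # raw Qwen-VL hidden states [B, tokens, dim]
text_emb = torch.randn(1, 77, 4096)     # usual text embedding
subject_emb = projector(vl_hidden)      # projected to [1, 128, 4096]
# appended to the text embedding as extra conditioning tokens
cond = torch.cat([text_emb, subject_emb], dim=1)
print(cond.shape)  # torch.Size([1, 205, 4096])
```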

Yes, Comfy has Qwen-VL. Maybe simply input its embeddings? (CLIP) On the other hand, because of ComfyUI memory management, that model could stay in RAM/VRAM on high-VRAM systems, so maybe a new node is needed after all. Like ComfyUI's code, but with KJ offloading / disk caching.

kabachuha avatar Nov 05 '25 16:11 kabachuha
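The offloading idea being discussed can be sketched in a few lines. This is a generic pattern, not WanVideoWrapper's actual implementation: keep the conditioner on the GPU only while encoding, then move it back to CPU RAM so the video model gets the VRAM. The function name is hypothetical.

```python
import torch

# Generic offload pattern (hypothetical; not WanVideoWrapper's actual code):
# load the conditioner to the compute device, encode, then evict it.
def encode_with_offload(model, inputs, device="cuda"):
    model.to(device)                      # bring conditioner into VRAM
    with torch.no_grad():
        out = model(inputs.to(device))
    model.to("cpu")                       # free VRAM for the diffusion model
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
    return out.cpu()

# tiny stand-in for the big conditioner, run on CPU for demonstration
cond_model = torch.nn.Linear(8, 8)
emb = encode_with_offload(cond_model, torch.randn(2, 8), device="cpu")
print(emb.shape)  # torch.Size([2, 8])
```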

The wizard is cooking ;-) I saw something brewing.

RuneGjerde avatar Nov 06 '25 19:11 RuneGjerde

Still figuring the inputs out; not the easiest code to reverse engineer... Even if not that much new code is needed, the inputs are quite unique. It's doing something already, though:

https://github.com/user-attachments/assets/7e1fb6f6-aae4-4907-a180-30b89e4d5479

kijai avatar Nov 06 '25 23:11 kijai

that looks promising for sure ;-)

RuneGjerde avatar Nov 06 '25 23:11 RuneGjerde

Is it intentional that the model gets the images as overlays? Two subjects don’t work.

railep avatar Nov 08 '25 05:11 railep

Is it intentional that the model gets the images as overlays? Two subjects don’t work.

Lightx2v seems to mess it up some, at least; more often it makes it obey the positioning of the references too much. If you manually place them, it does work:

https://github.com/user-attachments/assets/5e72f3ac-2bf2-411d-92b2-d18c1f68af86

https://github.com/user-attachments/assets/261e49fa-984f-4869-8d4f-1041049945a1

kijai avatar Nov 08 '25 14:11 kijai

Is it intentional that the model gets the images as overlays? Two subjects don’t work.

Lightx2v seems to mess it up some, at least; more often it makes it obey the positioning of the references too much. If you manually place them, it does work:

WanVideoWrapper_I2V_00001.3.mp4

WanVideoWrapper_I2V_00002.7.mp4

I see. Thank you! Well… I guess I'll have to code a custom node for that (unless I find one in kjnodes).

railep avatar Nov 08 '25 14:11 railep

Is it intentional that the model gets the images as overlays? Two subjects don’t work.

Lightx2v seems to mess it up some, at least; more often it makes it obey the positioning of the references too much. If you manually place them, it does work: WanVideoWrapper_I2V_00001.3.mp4 WanVideoWrapper_I2V_00002.7.mp4

I see. Thank you! Well… I guess I'll have to code a custom node for that (unless I find one in kjnodes).

The above was done just by changing the crop_position in the resize node; the padding also obeys that.

kijai avatar Nov 08 '25 17:11 kijai
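The crop_position idea, reduced to its geometry: pad a reference image to a wider canvas and control whether it lands left, center, or right. The function name and argument names are illustrative, not the actual node's; only the padding math is shown.

```python
import torch
import torch.nn.functional as F

# Illustration of the "crop_position" geometry (names are hypothetical,
# not the resize node's actual parameters).
def place_on_canvas(img: torch.Tensor, canvas_w: int, position: str = "left"):
    # img: [C, H, W]; result is same height, canvas_w wide, zero-padded
    c, h, w = img.shape
    pad_total = canvas_w - w
    if position == "left":
        left, right = 0, pad_total
    elif position == "right":
        left, right = pad_total, 0
    else:  # center
        left = pad_total // 2
        right = pad_total - left
    return F.pad(img, (left, right))  # F.pad pads the last dim (width)

img = torch.ones(3, 4, 4)
canvas = place_on_canvas(img, 10, "right")
print(canvas.shape)  # torch.Size([3, 4, 10])
```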

Yeah, I saw that and tried it too. For character tests I take business photos of my wife and myself at similar resolutions, plus a background at the same resolution. To put them left and right, I need different resolutions… It worked with 900x350, but the background was two-thirds filled. And I got interesting results because I forgot to change the prompt :-D In short: I need a workflow to cut the images according to the mask (to make them less wide) and then put them left or right. Should be possible with already-existing nodes. Thanks again!

railep avatar Nov 08 '25 18:11 railep
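The workflow described above (crop each subject by its mask, then place the crops left and right) can be sketched as plain tensor operations. Both helper names are made up for illustration; a real ComfyUI workflow would use existing mask-crop and composite nodes instead.

```python
import torch

# Hypothetical helpers sketching the mask-crop-then-place workflow.
def crop_to_mask(img: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    # img: [C, H, W], mask: [H, W]; crop to the mask's bounding box
    ys, xs = torch.nonzero(mask, as_tuple=True)
    return img[:, ys.min():ys.max() + 1, xs.min():xs.max() + 1]

def paste(canvas: torch.Tensor, patch: torch.Tensor, x: int, y: int):
    # paste patch [C, h, w] onto canvas [C, H, W] at top-left (x, y)
    c, h, w = patch.shape
    canvas[:, y:y + h, x:x + w] = patch
    return canvas

bg = torch.zeros(3, 8, 12)               # background canvas
subject = torch.ones(3, 8, 8)            # subject image
mask = torch.zeros(8, 8)
mask[2:6, 2:6] = 1                       # subject occupies a 4x4 region
crop = crop_to_mask(subject, mask)       # 4x4 crop
bg = paste(bg, crop, x=0, y=0)           # place on the left side
print(crop.shape)  # torch.Size([3, 4, 4])
```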

I struggled to get it running without OOM, but by setting the resolution really low I could get through it (though I bet the result suffers too). Had to give it a test, though ;-)

https://github.com/user-attachments/assets/454e8f06-e666-4c20-acc1-89a687cdcff7

Hollywood watch out ;-) Gonna make my own Snyder cut hehe

RuneGjerde avatar Nov 09 '25 18:11 RuneGjerde

I tried and tried, but I'm not satisfied. I have the feeling that it's basically merging the pictures like a canvas editor and then doing an i2v inference on the merged picture.

railep avatar Nov 10 '25 15:11 railep

@kijai Can you please provide a workflow for the bindweave branch to test?

peter4431 avatar Nov 13 '25 10:11 peter4431

@kijai Can you please provide a workflow for the bindweave branch to test?

Haven't finalized anything yet, but the videos I've posted here should include a workflow.

kijai avatar Nov 13 '25 12:11 kijai

Gave it another test run, even though it's a work in progress... I managed to get much higher resolutions now ;-) Not sure if it was my PC or some code optimization, but no more OOM.

https://github.com/user-attachments/assets/196b3704-3552-4620-9480-17ec01ee833f

https://github.com/user-attachments/assets/d7b48186-78af-4f1b-a281-ac978c7a0a68

The Snyder Cut - AI Edition ;-)

RuneGjerde avatar Nov 14 '25 00:11 RuneGjerde

While it's not the best animation, I was surprised the model was able to draw the character from behind even from the start frame; this also shows overlapping references can work:

https://github.com/user-attachments/assets/520fc89c-f29c-40b3-8f3e-d8236dd49f6d

kijai avatar Nov 14 '25 00:11 kijai

That looks pretty good. Not locked to left/right then ;-) And in some ways it looks even better, since it casts shadows and all... gives it a bit of perspective.

RuneGjerde avatar Nov 14 '25 00:11 RuneGjerde

I think one key thing is to make sure the CLIP vision and Qwen-VL embeds are cropped properly and include your subjects, since CLIP vision is locked to 224x224 resolution. I'm still not sure of the Qwen-VL resolution, but it seems better to crop for it too.

kijai avatar Nov 14 '25 00:11 kijai
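Since CLIP vision expects 224x224 input, a full wide frame gets squashed and the subject may end up occupying only a few pixels. A subject-centered crop before resizing preserves more detail. The 224x224 target is CLIP's; the cropping heuristic and function name below are just an illustration.

```python
import torch
import torch.nn.functional as F

# Hypothetical helper: crop a box around the subject center, clamped to the
# frame, then resize to CLIP vision's 224x224 input.
def subject_crop_224(img: torch.Tensor, cx: int, cy: int, box: int):
    # img: [C, H, W]; (cx, cy) is the subject center, box the crop size
    c, h, w = img.shape
    half = box // 2
    x0 = max(0, min(cx - half, w - box))   # clamp so the box stays in frame
    y0 = max(0, min(cy - half, h - box))
    crop = img[:, y0:y0 + box, x0:x0 + box]
    return F.interpolate(crop.unsqueeze(0), size=(224, 224),
                         mode="bilinear", align_corners=False).squeeze(0)

frame = torch.randn(3, 480, 832)           # a wide video frame
clip_in = subject_crop_224(frame, cx=600, cy=240, box=256)
print(clip_in.shape)  # torch.Size([3, 224, 224])
```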

Yeah, that changes the output a lot. It looks far more realistic and has depth when the characters are not put left and right. Odd; what if you do want characters left and right ;) (or maybe a prompt would do that: "woman to the left, man to the right")

Just a completely random run with lazy prompting, just "viking exploring new world". And probably not composed correctly; it was just a quick test run.

https://github.com/user-attachments/assets/99c981c2-55dd-481c-9731-1e64577ef3f5


Odd; what if you do want characters left and right ;) (or maybe a prompt would do that: "woman to the left, man to the right")

That works ;-)

https://github.com/user-attachments/assets/6913da6c-2ae9-4c16-a787-c559dbf98ab9

And with some additional prompting, the characters seem to follow better (or I could be imagining that part). At least it seems a bit more realistic than previous attempts when I had them left/right.

https://github.com/user-attachments/assets/ee4f2969-1a4e-4532-bbac-ff9d9c6390cf

Will play around with it a bit ;-)

RuneGjerde avatar Nov 14 '25 01:11 RuneGjerde

Wow, OK, this is MUCH better. Now it really is useful without cropping the images to left and right. Thanks @kijai

railep avatar Nov 14 '25 14:11 railep

Yeah, it's growing on me for sure. You can quite easily use it to tell a little story with consistent characters, swap out the background for each scene, etc.

RuneGjerde avatar Nov 14 '25 22:11 RuneGjerde

@kijai @RuneGjerde can this be used together with WanAnimate? WanAnimate frequently loses subject likeness

jnpatrick99 avatar Nov 20 '25 03:11 jnpatrick99

@jnpatrick99 Probably not Wan Animate.

But Lynx might be, though I'm not sure. It's an "extra model" that works in WanVideoWrapper, and its strength is keeping the face ID. For info: https://byteaigc.github.io/Lynx/

https://github.com/kijai/ComfyUI-WanVideoWrapper/blob/main/example_workflows/wanvideo_T2V_14B_lynx_example_01.json It would need some creative node connections, though; probably just connecting "Add Lynx Embed" before connecting the main node to image_embed at the sampler.

(I'll try later when I have a chance, unless some of the more experienced people here have something.)

But Kijai will know better for sure whether Lynx could be used or not ;-)

RuneGjerde avatar Nov 20 '25 05:11 RuneGjerde

@RuneGjerde Thanks, but unfortunately I couldn't make either of them work. Lynx produces an error about IPAdapter incompatibility with the current model, and BindWeave an error about tensor dimensions :-(

jnpatrick99 avatar Nov 21 '25 02:11 jnpatrick99