UniLumos - relight videos
It seems to be based on Wan, and it looks like it's "just" a LoRA, so maybe it works already ;-) But not sure. Seemingly it's a T2V model, yet it can still take an input video. Maybe it's of interest for the wrapper.
The only thing is that despite tons of Wan references in the code, I don't see it download any Wan weights. So maybe it's just a short, few-frames proof-of-concept kind of thing. Not sure ;-)
It seems to have 4 modes:
- Foreground & background video reference
- Background video reference, text-prompted foreground
- Foreground video reference, text-prompted background
- Text-prompted foreground and text-prompted background
And it auto-masks everything as far as I can tell. But I only took a short peek into the repo, so I'm not sure I understood it correctly ;-)
https://github.com/alibaba-damo-academy/Lumos-Custom/
Relight … don’t we already have a good relight LoRA? But the more the better, I think. We don't need specific code changes, since it combines three videos to create a new one; we just need a good workflow for that. Question: why do you think it creates masks automatically? The input for the inference scripts is a CSV file with paths to the video, background video, and mask video.
Yes, I looked further, and in their examples folder there clearly is a mask video. So I guess it's created manually rather than generated by the inference code.
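For reference, a row in such an inference CSV could look something like this (column names and paths are made up for illustration; the actual layout is whatever the repo's examples CSV defines):

```csv
video,background_video,mask_video,caption
inputs/subject.mp4,inputs/beach_bg.mp4,inputs/subject_mask.mp4,"a person walking on a beach at sunset"
```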
I did try their model, but neither "Extra model" nor "LoRA" worked; I might have done something wrong, though ;-)
Whether it's a good model, or worth the trouble, I don't know. Some of the examples look good, others look a bit more like the typical mask-out-the-subject-and-glue-on-top result. But maybe it has a nice relighting quality to it.
Well, it's pretty simple to get running, but I don't know if it's any good... the subject changes a lot even in their examples. I also can't find the code that creates the noisy mask; it seems to be different in each background example video they have.
https://github.com/user-attachments/assets/ae958011-3f54-4681-aea9-3c14b212234a
Yeah, not sure if it's doing something great, or if it's pretty much the same-ish as what a color match node and a mask could do. Will give it a try, now that I see it works ;-)
I guess they're using their in-house solution for the mask generation: https://huggingface.co/Alibaba-DAMO-Academy/RynnEC-7B And I don't believe the model is worth it. At least for my use cases, it isn't.
Looked in the paper, and the mask part was: "Gaussian Inpainting fills the background using random noise sampled with the same mean and variance as the subject region."
Which was simple enough to add as a node, so in the end this wasn't much work to support. It works with the distill LoRA too; quality isn't amazing, but the lighting is decent:
https://github.com/user-attachments/assets/6415413e-1f65-4cdd-9275-c8ce91ba9724
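Going by that one-line description from the paper, the node presumably does something like the following per frame and per channel (a minimal NumPy sketch of the technique, not the actual node code; the function name and signature are my own):

```python
import numpy as np

def gaussian_inpaint_background(frame, mask, rng=None):
    """Replace background pixels with Gaussian noise whose mean and
    variance match the subject region, independently per channel.

    frame: HxWxC array; mask: HxW bool array, True = subject pixels.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = frame.astype(np.float64).copy()
    bg = ~mask
    for c in range(out.shape[2]):
        subject = out[..., c][mask]          # subject pixels of this channel
        mu, sigma = subject.mean(), subject.std()
        # sample background noise from the subject's statistics
        out[..., c][bg] = rng.normal(mu, sigma, size=int(bg.sum()))
    return out
```

Applied to every frame of the clip, this gives the "noisy mask" look seen in their background example videos, which would also explain why the noise differs per video: it depends on the subject's statistics.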
Not too bad, looks pretty good ;-)
I originally thought it was a LoRA/extra-model thing that would patch onto the full Wan, but I see now it's a small standalone model. So I guess the quality follows from that, and maybe it's more of a research proof-of-concept from them.
But for a quick, fun background-video swap it works OK, I guess ... ;-) And for those with low VRAM, this one should run fast and well.
I saw you added the Gaussian node and a workflow. Gave it a test run:
https://github.com/user-attachments/assets/bcfbeeca-4b58-4d14-bf97-2508f22b5557
https://github.com/user-attachments/assets/477f833e-69e3-45b7-81c3-a97d62510c4b
https://github.com/user-attachments/assets/191e7b8b-33ac-4f01-aa9d-06f79465adb7
From the details in the UniLumos paper (for example, Fig. 6), it can be inferred that the caption’s background description refers to the background video provided, or to the intended generated background. However, in the current example workflow, the caption actually describes the background of the source video.
Maybe that’s what caused some of the differences in the results.
Thanks for the feedback @suruoxi. Will try it out ;-) It already looks pretty good, I think, and if that improves it further, all good ;-)
The prompt in the example is directly taken from the original code:
https://github.com/alibaba-damo-academy/Lumos-Custom/blob/main/UniLumos/UniLumos/examples/examples_refined.csv
Result with the original code:
https://github.com/user-attachments/assets/a9cba40c-4bf6-4d33-a737-8397fd70ea84
Result with the caption describing the background:
https://github.com/user-attachments/assets/c42ef2e9-01cd-440d-88c7-08d70b77ebed
Hard to tell which is better. But maybe we should follow the original version.