UniLumos - relight videos
It seems to be based on Wan, and it looks like it's "just" a LoRA, so maybe it works already ;-) But not sure. Seemingly it's a T2V model, yet it can still take an input video. Maybe it's of interest for the wrapper.
The only thing is that despite tons of Wan references in the code, I don't see it download any Wan weights. So maybe it's just a short, few-frames proof-of-concept kind of thing. Not sure ;-)
It seems to have 4 modes:
- Foreground & background video reference
- Background video reference, text-prompted foreground
- Foreground video reference, text-prompted background
- Text-prompted foreground and text-prompted background
And it auto-masks everything as far as I can tell. But I only took a short peek into the repo, so I'm not sure I understood it correctly ;-)
https://github.com/alibaba-damo-academy/Lumos-Custom/
Relight … don’t we already have a good relight LoRA? But the more the better, I think. We don't need specific code changes, since it combines three videos to create a new one; we just need a good workflow for that. Question: why do you think it creates masks automatically? The input for the inference scripts is a CSV file with paths to the video, background video, and mask video.
Yes, I looked further, and in their examples folder there clearly is a mask video. So I guess it's created manually rather than generated by the inference code.
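For reference, a row in such an inference CSV could look something like this (column names and paths are made up for illustration; the actual layout is whatever the repo's examples CSV defines):

```csv
video,background_video,mask_video,caption
inputs/subject.mp4,inputs/beach_bg.mp4,inputs/subject_mask.mp4,"a person walking on a beach at sunset"
```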
I did try their model, but neither "Extra model" nor "LoRA" worked; I might have done something wrong, though ;-)
Whether it's a good model, or worth the trouble, I don't know. Some of the examples look good, others look a bit more like the typical mask-out-the-subject-and-glue-on-top result. But maybe it has a nice relighting quality to it.
Well, it's pretty simple to get running, but I don't know if it's any good... the subject changes a lot even in their examples. I also can't find the code that creates the noisy mask; it seems to be different in each background example video they have.
https://github.com/user-attachments/assets/ae958011-3f54-4681-aea9-3c14b212234a
Yeah, not sure if it's doing something great, or if it's pretty much the same-ish as what a color match node and a mask could do. Will give it a try, now that I see it works ;-)
I guess they're using their in-house solution for the mask generation: https://huggingface.co/Alibaba-DAMO-Academy/RynnEC-7B And I don't believe the model is worth it. At least for my use cases, it isn't.
Looked in the paper, and the mask part was: "Gaussian Inpainting fills the background using random noise sampled with the same mean and variance as the subject region."
Which was simple enough to add as a node, so in the end this wasn't much work to support. It works with the distill LoRA too; quality isn't amazing, but the lighting is decent:
https://github.com/user-attachments/assets/6415413e-1f65-4cdd-9275-c8ce91ba9724
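Going by that one-line description from the paper, the node presumably does something like the following per frame and per channel (a minimal NumPy sketch of the technique, not the actual node code; the function name and signature are my own):

```python
import numpy as np

def gaussian_inpaint_background(frame, mask, rng=None):
    """Replace background pixels with Gaussian noise whose mean and
    variance match the subject region, independently per channel.

    frame: HxWxC array; mask: HxW bool array, True = subject pixels.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = frame.astype(np.float64).copy()
    bg = ~mask
    for c in range(out.shape[2]):
        subject = out[..., c][mask]          # subject pixels of this channel
        mu, sigma = subject.mean(), subject.std()
        # sample background noise from the subject's statistics
        out[..., c][bg] = rng.normal(mu, sigma, size=int(bg.sum()))
    return out
```

Applied to every frame of the clip, this gives the "noisy mask" look seen in their background example videos, which would also explain why the noise differs per video: it depends on the subject's statistics.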
Not too bad, looks pretty good ;-)
I originally thought it was a LoRA/extra-model thing that would patch onto the full Wan, but I see now it's a small standalone model. So I guess the quality follows from that, and maybe it's more of a research proof-of-concept from them.
But for a quick, fun background-video swap it works OK, I guess ... ;-) And for those with low VRAM, this one should run fast and well.
I saw you added the Gaussian node and a workflow. Gave it a test run:
https://github.com/user-attachments/assets/bcfbeeca-4b58-4d14-bf97-2508f22b5557
https://github.com/user-attachments/assets/477f833e-69e3-45b7-81c3-a97d62510c4b
https://github.com/user-attachments/assets/191e7b8b-33ac-4f01-aa9d-06f79465adb7
From the details in the UniLumos paper (for example, Fig. 6), it can be inferred that the caption’s background description refers to the background video provided, or to the intended generated background. However, in the current example workflow, the caption actually describes the background of the source video.
Maybe that’s what caused some of the differences in the results.
Thanks for the feedback @suruoxi. Will try it out ;-) It already looks pretty good, I think, and if that improves it further, all good ;-)
The prompt in the example is directly taken from the original code:
https://github.com/alibaba-damo-academy/Lumos-Custom/blob/main/UniLumos/UniLumos/examples/examples_refined.csv
Result with the original code:
https://github.com/user-attachments/assets/a9cba40c-4bf6-4d33-a737-8397fd70ea84
Result with the caption describing the background:
https://github.com/user-attachments/assets/c42ef2e9-01cd-440d-88c7-08d70b77ebed
Hard to tell which is better. But maybe we should follow the original version.