Hi bro, new video repo:
It can generate minute-long, multi-shot videos with cinematic effects, while maintaining good character consistency and long-term memory. 🎬 https://holo-cine.github.io/ The code has just been open-sourced. If you are interested, please give it a star 🙏 https://github.com/yihao-meng/HoloCine
Models: https://huggingface.co/hlwang06/HoloCine
I wonder if we can extract the full attention versions of the model as LoRAs.
They look like simple finetunes of Wan2.2.
It also needs new attention and RoPE code; it seems to split the sequence to allow for the multiple shots.
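On the LoRA idea: if these really are Wan2.2 finetunes, one could in principle approximate the weight deltas with a truncated SVD, though the new attention/RoPE code would still be needed on top. A rough sketch, assuming both checkpoints load as plain state dicts with matching keys; the rank and key naming here are arbitrary choices, not HoloCine's or the wrapper's conventions:

```python
import torch

def extract_lora(base_sd, tuned_sd, rank=64):
    """Approximate (W_tuned - W_base) as up @ down via truncated SVD."""
    loras = {}
    for name, w_base in base_sd.items():
        w_tuned = tuned_sd.get(name)
        if w_tuned is None or w_base.ndim != 2:
            continue  # skip missing keys and non-matrix params (norms, biases)
        delta = w_tuned.float() - w_base.float()
        u, s, vh = torch.linalg.svd(delta, full_matrices=False)
        root = s[:rank].sqrt()
        loras[f"{name}.lora_up"] = (u[:, :rank] * root).contiguous()           # (out, r)
        loras[f"{name}.lora_down"] = (root[:, None] * vh[:rank]).contiguous()  # (r, in)
    return loras
```

How faithful that is depends on how low-rank the finetune's changes actually are; a full-attention retrain may not compress well.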
I remember split cross-attention (maybe not precisely in their form) being available in this wrapper virtually from the start, so you rather outpaced their main concept :)
I saw the model has already appeared on your HF, thank you for the work!
Oh wow, this looks interesting for sure. It's crazy how many Wan-based models pop up lately; it's like a Pandora's box has been opened, each one making the gap between closed source and open source feel smaller and smaller. Multi-shot, multi-scene seemed to be a bit exclusive to Kling etc. Not anymore ;-)
And if it wasn't for Kijai, it might as well have been closed source, practically inaccessible to most of us... Thanks a lot ;-)
In Planning: HoloCine-audio 😮
It's crazy how many Wan-based models pop up lately; it's like a Pandora's box has been opened.
Ironically, Wan2.5 going closed source gave a stimulus to smaller projects like Ovi, Stable Infinity, Mocha, and this one, as they got a chance to be in the limelight.
Otherwise, they would likely have stayed small private prototypes, because all the open-source attention would have been pulled to Wan2.5 without a trace.
I'm 100% sure we wouldn't have Ovi if Wan2.5 had been released openly.
Yeah, what do you think, could Wan 2.5 come soon? Open source could make that model better.
They have now uploaded the "sparse" versions of the models to Hugging Face, and those are worth fp8 quantization as well.
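For anyone curious what that involves, per-tensor fp8 casting is simple to sketch; this is a generic illustration, not the wrapper's actual quantization scheme:

```python
import torch

def quantize_fp8(w: torch.Tensor):
    """Scaled per-tensor cast to float8_e4m3fn;
    dequantize with q.float() * scale."""
    scale = (w.abs().max().float()
             / torch.finfo(torch.float8_e4m3fn).max).clamp(min=1e-12)
    return (w / scale).to(torch.float8_e4m3fn), scale
```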
I don't know about Wan 2.5, but LTX 2 will be open-sourced at the end of November, and that model really looks promising. Judging by my tests (you can try it for free on their website), it could bring a new evolutionary leap to open-source video generation models. Features:
- 1080p at 25-50 fps
- Compatible with consumer graphics cards (whatever that means) 🤷‍♂️🙂
- Video generation + audio, with speech in different languages (I tested both English and Russian, and it works well)
- Duration: 6/8/10 seconds
- Fast generation

Plans also include a model with 4K generation.
The only drawback for me is the fixed 16:9 aspect ratio, but they promise to add more aspect ratios in the future.
Currently, they have two versions available for testing: Fast and Pro. Both can generate everything described above in 1080p resolution, and an Ultra version for 4K generation will be released in the future. Of course, the Pro version produces a higher-quality image, but I'm not sure whether that's because the models are different or because the Pro version simply generates with more steps than the Fast version.
In any case, they promise detailed documentation, training code, etc. So, personally, I'm looking forward to this model.
@Kvento great!
We are mainly talking about HoloCine in this thread though :)
Unlike LTXV/Wan, it produces scene cuts through a special block-sparse attention mechanism. It's like Sora 2, but without sound yet. See their examples: https://holo-cine.github.io/
LTXV, I think, will have its own wrapper (they already have one).
Yeah, I've been keeping an eye on this project, and I think it's a really great one. While it doesn't have sound yet, it looks really impressive. I saw that the HoloCine model has appeared in the KJ repository, but as far as I understand, it's not implemented in the wrapper yet, right?
One could say it's already implemented, partially. HoloCine consists of two parts: split cross-attention (text-video alignment) and sparse inter-shot attention. Split cross-attention (a different text prompt for each subsection of a video, even without context options) has been implemented in the wrapper since the spring. It's like an Easter egg: to activate it, you put "|" separators into the KJ text encoder between the desired section-wise prompts.
The remaining part is the sparse inter-shot self-attention, which keeps objects persistent and stitches the shots into a single narrative, plus the control mechanism that distributes the prompts across the sections while keeping the global prompt.
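To make the split cross-attention half concrete, here is a minimal sketch of the idea as I understand it: the video tokens are partitioned by section, and each partition cross-attends only to its own prompt embedding. The names (`to_k`, `to_v`, `section_lens`) are illustrative, not the wrapper's actual API, and head dimensions are omitted for brevity:

```python
import torch
import torch.nn.functional as F

def split_cross_attention(q, text_embs, section_lens, to_k, to_v):
    """q: (T, D) video queries laid out section by section;
    text_embs: list of (L_i, D) per-section prompt embeddings;
    section_lens: video tokens per section, sum(section_lens) == T."""
    outs, start = [], 0
    for emb, n in zip(text_embs, section_lens):
        k, v = to_k(emb), to_v(emb)
        # each video section sees only its own prompt
        outs.append(F.scaled_dot_product_attention(
            q[start:start + n].unsqueeze(0),  # (1, n, D)
            k.unsqueeze(0), v.unsqueeze(0),
        ).squeeze(0))
        start += n
    return torch.cat(outs, dim=0)  # (T, D), same layout as q
```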
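And the inter-shot half, as I read it, boils down to a block-structured self-attention mask: dense attention within each shot, plus a small set of shared anchor tokens that stay visible across shots to carry identity over the cuts. A loose sketch of that idea, not their exact pattern:

```python
import torch

def inter_shot_mask(shot_ids: torch.Tensor, n_global: int) -> torch.Tensor:
    """Boolean (T, T) mask, True = may attend. shot_ids maps each video
    token to its shot index; the first n_global tokens act as shared
    anchors (an arbitrary choice for this sketch)."""
    same_shot = shot_ids[:, None] == shot_ids[None, :]  # dense within a shot
    to_global = torch.zeros_like(same_shot)
    to_global[:, :n_global] = True  # every token can read the anchors
    to_global[:n_global, :] = True  # anchors can read every token
    return same_shot | to_global
```

Passed as an attention mask to SDPA, this gives full attention inside each shot and only anchor-mediated attention between shots.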
The Easter egg splits the prompt, but the original code works with a global prompt followed by the split prompts. I guess kijai didn't add a global tag? And HoloCine adds the shot-cut frames as a list of integers.
@railep That was back in the spring, when HoloCine had not been released yet. The new additions, such as the global tag, need to be integrated into the encoder, maybe with a new encoder node.
I guess I sound weird in this thread, because I could help with the integration myself instead of just talking about it, but I don't know how to coordinate with KJ directly so that it wouldn't cause development conflicts.
https://github.com/kijai/ComfyUI-WanVideoWrapper/pull/1566 Tried a version of the implementation!
@Dango233 Thank you!
My first result with your implementation:
https://github.com/user-attachments/assets/dd7891c8-e820-45c9-923d-f213d0a700b4
There are some artifacts, but it does the multi-shot job well enough.
I got it working up to 7 shots (343 frames); beyond that it breaks and gives grayish, vague images.
What is your experience with the sequence limits?
"Holocine for the poor" - tried the model just using the default WanWrapper nodes, being curious ... And it seems to do a decent job i guess. Not always respecting the scene cuts, but not sure if thats due to implementation or model .. or perhaps just from being a shorter run, and not enough frames to do more
But seems ok. I thought it wouldn't work and that it needed some sort of implementation, but maybe it does not ;-) At least the basic functionality seems decent
https://github.com/user-attachments/assets/c856f2cc-764f-49e1-9684-56ef42b5191d
(a short test with low steps and the lightx LoRA)
It even works with the native nodes. It has some consistency, but 1-2 scene cuts at most. It's OK, at least.
Yes, that might be it. I didn't get all the cuts, so perhaps that's the limitation currently.
I saw some hints about the num-frames and shot-cut-frames tags, so I thought maybe the model understands those as well. Using those tags, it seems to perform better, at least in the first few random tests... but I could just be imagining things, or it was simply a better seed ;-) I don't know for sure.
(I also did more steps than the usual low step count, so maybe it was that.)
[global caption] blah blah blah.. etc.. the scene contains 4 shots .. [shot cut]... [shot cut frames] 37, 113, 205 [num frames] 241
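If anyone wants to automate building or checking such prompts, a small parser for that tag layout could look like the sketch below; the tag names follow the example above, but the rest is my guess rather than an official format spec:

```python
import re

def parse_holocine_prompt(text: str) -> dict:
    """Best-effort split of a '[global caption] ... [shot cut] ...
    [shot cut frames] a, b, c [num frames] N' style prompt."""
    cuts = re.search(r"\[shot cut frames\]\s*([\d,\s]+)", text)
    num = re.search(r"\[num frames\]\s*(\d+)", text)
    body = text.split("[shot cut frames]")[0].replace("[global caption]", "")
    shots = [s.strip() for s in body.split("[shot cut]") if s.strip()]
    return {
        "captions": shots,  # first entry is the global caption
        "shot_cut_frames": [int(x) for x in cuts.group(1).split(",")
                            if x.strip()] if cuts else [],
        "num_frames": int(num.group(1)) if num else None,
    }
```

On the example above, this yields four caption chunks, cut frames [37, 113, 205], and 241 total frames.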
https://github.com/user-attachments/assets/51cc9867-567e-4c05-af91-e3eb60ad9517
https://github.com/user-attachments/assets/330a9ddd-8983-4fe3-8cbf-4f154f4d9784