Hi bro, new video repo:
It can generate minute-long, multi-shot videos with cinematic effects, while maintaining good character consistency and long-term memory. 🎬 https://holo-cine.github.io/ The code has just been open-sourced. If you are interested, please give it a star 🙏 https://github.com/yihao-meng/HoloCine
Models: https://huggingface.co/hlwang06/HoloCine
I wonder if we can extract the full attention versions of the model as LoRAs.
They look like simple finetunes of Wan2.2.
It also needs new attention and RoPE code; it seems to split the sequence to allow for the multiple shots.
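On the LoRA idea: if these really are Wan2.2 finetunes, one could in principle approximate the weight deltas with a truncated SVD, though the new attention/RoPE code would still be needed on top. A rough sketch, assuming both checkpoints load as plain state dicts with matching keys; the rank and key naming here are arbitrary choices, not HoloCine's or the wrapper's conventions:

```python
import torch

def extract_lora(base_sd, tuned_sd, rank=64):
    """Approximate (W_tuned - W_base) as up @ down via truncated SVD."""
    loras = {}
    for name, w_base in base_sd.items():
        w_tuned = tuned_sd.get(name)
        if w_tuned is None or w_base.ndim != 2:
            continue  # skip missing keys and non-matrix params (norms, biases)
        delta = w_tuned.float() - w_base.float()
        u, s, vh = torch.linalg.svd(delta, full_matrices=False)
        root = s[:rank].sqrt()
        loras[f"{name}.lora_up"] = (u[:, :rank] * root).contiguous()           # (out, r)
        loras[f"{name}.lora_down"] = (root[:, None] * vh[:rank]).contiguous()  # (r, in)
    return loras
```

How faithful that is depends on how low-rank the finetune's changes actually are; a full-attention retrain may not compress well.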
I remember split cross-attention (maybe not precisely in their form) being available in this wrapper virtually from the start, so you rather outpaced their main concept :)
I saw the model has already appeared on your HF, thank you for the work!
Oh wow, this looks interesting for sure. It's crazy how many Wan-based models pop up lately; it's like a Pandora's box has been opened, each one making the gap between closed source and open source feel smaller and smaller. Multi-shot, multi-scene seemed to be a bit exclusive to Kling etc. Not anymore ;-)
And if it wasn't for Kijai, it might as well have been closed source, practically inaccessible to most of us... Thanks a lot ;-)
In Planning: HoloCine-audio 😮
It's crazy how many Wan-based models pop up lately; it's like a Pandora's box has been opened.
Ironically, Wan2.5 going closed source gave a stimulus to smaller projects like Ovi, Stable Infinity, Mocha, and this one, as they got a chance to be in the limelight.
Otherwise, they would likely have stayed small private prototypes, because all the open-source attention would have been pulled to Wan2.5 without a trace.
I'm 100% sure we wouldn't have Ovi if Wan2.5 had been released openly.
Yeah, what do you think, could Wan 2.5 come soon? Open source could make that model better.
They have now uploaded the "sparse" versions of the models to Hugging Face, and those are worth fp8 quantization as well.
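For anyone curious what that involves, per-tensor fp8 casting is simple to sketch; this is a generic illustration, not the wrapper's actual quantization scheme:

```python
import torch

def quantize_fp8(w: torch.Tensor):
    """Scaled per-tensor cast to float8_e4m3fn;
    dequantize with q.float() * scale."""
    scale = (w.abs().max().float()
             / torch.finfo(torch.float8_e4m3fn).max).clamp(min=1e-12)
    return (w / scale).to(torch.float8_e4m3fn), scale
```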
I don't know about Wan 2.5, but LTX 2 will be open-sourced at the end of November, and that model really looks promising. Judging by my tests (you can try it for free on their website), it could bring a new evolutionary leap to open-source video generation models. Features:
- 1080p at 25-50 fps
- Compatible with consumer graphics cards (whatever that means) 🤷‍♂️🙂
- Video generation + audio, with speech in different languages (I tested both English and Russian, and it works well)
- Duration: 6/8/10 seconds
- Fast generation

Plans also include a model with 4K generation.
The only drawback for me is the fixed 16:9 aspect ratio, but they promise to add more aspect ratios in the future.
Currently, they have two versions available for testing: Fast and Pro. Both can generate everything described above in 1080p resolution, and an Ultra version for 4K generation will be released in the future. Of course, the Pro version produces a higher-quality image, but I'm not sure whether that's because the models are different or because the Pro version simply generates with more steps than the Fast version.
In any case, they promise detailed documentation, training code, etc. So, personally, I'm looking forward to this model.
@Kvento great!
We are mainly talking about HoloCine in this thread though :)
Unlike LTXV/Wan, it produces scene cuts through a special block-sparse attention mechanism. It's like Sora 2, but without sound yet. See their examples: https://holo-cine.github.io/
LTXV, I think, will have its own wrapper (they already have one).
Yeah, I've been keeping an eye on this project, and I think it's a really great one. While it doesn't have sound yet, it looks really impressive. I saw that the HoloCine model has appeared in the KJ repository, but as far as I understand, it's not implemented in the wrapper yet, right?
One could say it's already implemented, partially. HoloCine consists of two parts: split cross-attention (text-video alignment) and sparse inter-shot attention. Split cross-attention (a different text prompt for each subsection of a video, even without context options) has been implemented in the wrapper since the spring. It's like an Easter egg: to activate it, you put "|" separators into the KJ text encoder between the desired section-wise prompts.
The remaining part is the sparse inter-shot self-attention, which keeps objects persistent and stitches the shots into a single narrative, plus the control mechanism that distributes the prompts across the sections while keeping the global prompt.
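To make the split cross-attention half concrete, here is a minimal sketch of the idea as I understand it: the video tokens are partitioned by section, and each partition cross-attends only to its own prompt embedding. The names (`to_k`, `to_v`, `section_lens`) are illustrative, not the wrapper's actual API, and head dimensions are omitted for brevity:

```python
import torch
import torch.nn.functional as F

def split_cross_attention(q, text_embs, section_lens, to_k, to_v):
    """q: (T, D) video queries laid out section by section;
    text_embs: list of (L_i, D) per-section prompt embeddings;
    section_lens: video tokens per section, sum(section_lens) == T."""
    outs, start = [], 0
    for emb, n in zip(text_embs, section_lens):
        k, v = to_k(emb), to_v(emb)
        # each video section sees only its own prompt
        outs.append(F.scaled_dot_product_attention(
            q[start:start + n].unsqueeze(0),  # (1, n, D)
            k.unsqueeze(0), v.unsqueeze(0),
        ).squeeze(0))
        start += n
    return torch.cat(outs, dim=0)  # (T, D), same layout as q
```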
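And the inter-shot half, as I read it, boils down to a block-structured self-attention mask: dense attention within each shot, plus a small set of shared anchor tokens that stay visible across shots to carry identity over the cuts. A loose sketch of that idea, not their exact pattern:

```python
import torch

def inter_shot_mask(shot_ids: torch.Tensor, n_global: int) -> torch.Tensor:
    """Boolean (T, T) mask, True = may attend. shot_ids maps each video
    token to its shot index; the first n_global tokens act as shared
    anchors (an arbitrary choice for this sketch)."""
    same_shot = shot_ids[:, None] == shot_ids[None, :]  # dense within a shot
    to_global = torch.zeros_like(same_shot)
    to_global[:, :n_global] = True  # every token can read the anchors
    to_global[:n_global, :] = True  # anchors can read every token
    return same_shot | to_global
```

Passed as an attention mask to SDPA, this gives full attention inside each shot and only anchor-mediated attention between shots.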
The Easter egg splits the prompt, but the original code works with a global prompt followed by the split prompts. I guess kijai didn't add a global tag? And HoloCine adds the shot-cut frames as a list of integers.
@railep That was back in the spring, when HoloCine had not been released yet. The new additions, such as the global tag, need to be integrated into the encoder, maybe with a new encoder node.
I guess I sound weird in this thread, because I could help with the integration myself instead of just talking about it, but I don't know how to coordinate with KJ directly so that it wouldn't cause development conflicts.
https://github.com/kijai/ComfyUI-WanVideoWrapper/pull/1566 Tried a version of the implementation!
@Dango233 Thank you!
My first result with your implementation:
https://github.com/user-attachments/assets/dd7891c8-e820-45c9-923d-f213d0a700b4
There are some artifacts, but it does the multi-shot job well enough.
I got it working up to 7 shots (343 frames); beyond that it breaks and gives grayish, vague images.
What is your experience with the sequence limits?
"Holocine for the poor" - tried the model just using the default WanWrapper nodes, being curious ... And it seems to do a decent job i guess. Not always respecting the scene cuts, but not sure if thats due to implementation or model .. or perhaps just from being a shorter run, and not enough frames to do more
But seems ok. I thought it wouldn't work and that it needed some sort of implementation, but maybe it does not ;-) At least the basic functionality seems decent
https://github.com/user-attachments/assets/c856f2cc-764f-49e1-9684-56ef42b5191d
(a short test with low steps and the lightx LoRA)
It even works with the native nodes. It has some consistency, but 1-2 scene cuts at most. It's OK, at least.
Yes, that might be it. I didn't get all the cuts, so perhaps that's the limitation currently.
I saw some hints about the num-frames and shot-cut-frames tags, so I thought maybe the model understands those as well. Using those tags, it seems to perform better, at least in the first few random tests... but I could just be imagining things, or it was simply a better seed ;-) I don't know for sure.
(I also did more steps than the usual low step count, so maybe it was that.)
[global caption] blah blah blah.. etc.. the scene contains 4 shots .. [shot cut]... [shot cut frames] 37, 113, 205 [num frames] 241
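If anyone wants to automate building or checking such prompts, a small parser for that tag layout could look like the sketch below; the tag names follow the example above, but the rest is my guess rather than an official format spec:

```python
import re

def parse_holocine_prompt(text: str) -> dict:
    """Best-effort split of a '[global caption] ... [shot cut] ...
    [shot cut frames] a, b, c [num frames] N' style prompt."""
    cuts = re.search(r"\[shot cut frames\]\s*([\d,\s]+)", text)
    num = re.search(r"\[num frames\]\s*(\d+)", text)
    body = text.split("[shot cut frames]")[0].replace("[global caption]", "")
    shots = [s.strip() for s in body.split("[shot cut]") if s.strip()]
    return {
        "captions": shots,  # first entry is the global caption
        "shot_cut_frames": [int(x) for x in cuts.group(1).split(",")
                            if x.strip()] if cuts else [],
        "num_frames": int(num.group(1)) if num else None,
    }
```

On the example above, this yields four caption chunks, cut frames [37, 113, 205], and 241 total frames.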
https://github.com/user-attachments/assets/51cc9867-567e-4c05-af91-e3eb60ad9517
https://github.com/user-attachments/assets/330a9ddd-8983-4fe3-8cbf-4f154f4d9784