kabachuha

40 comments by kabachuha

Using a ready-made VAE from the diffusers library might be a better choice than reinventing the wheel (and it's compatible with a lot more VAEs). See the example here: https://github.com/Vchitect/Latte/blob/9ededbe590a5439b6e7013d00fbe30e6c9b674b8/sample/sample.py#L69
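For instance, loading one is a couple of lines (a rough sketch; the checkpoint name is just an example, any AutoencoderKL-compatible weights from the hub work the same way):

```python
import torch
from diffusers.models import AutoencoderKL

# Load a pretrained VAE from the hub (checkpoint name is just an example;
# any AutoencoderKL-compatible weights work the same way).
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse").eval()

# Encode a batch of frames (N, 3, H, W) into latents and decode them back.
frames = torch.randn(4, 3, 256, 256)
with torch.no_grad():
    latents = vae.encode(frames).latent_dist.sample() * vae.config.scaling_factor
    recon = vae.decode(latents / vae.config.scaling_factor).sample
```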

Additionally, here is my project for high-quality, long-form (recursive) video captioning using VideoLLaVA: https://github.com/kabachuha/video2scenario. It may be useful for you.

Well, we contribute to open source too, from time to time (PixArt-alpha/delta and the GradTTS family are from Noah's Ark too), especially when it's tangential to the main work and can't be...

Visual examples of the masked (not only temporal, but spatial!) training/inference process of Meta's V-JEPA: https://github.com/facebookresearch/jepa
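Roughly what that masking means in tensor terms (a toy sketch of spatiotemporal block masking, not V-JEPA's actual code; the grid and block sizes are arbitrary assumptions):

```python
import torch

# Toy spatiotemporal block masking in the spirit of V-JEPA (not their actual code):
# given a (T, H, W) grid of patch tokens, hide a contiguous 3D block so the model
# must predict it from the surrounding space-time context.
def random_tube_mask(t, h, w, block=(4, 6, 6)):
    mask = torch.ones(t, h, w, dtype=torch.bool)
    bt, bh, bw = block
    t0 = torch.randint(0, t - bt + 1, (1,)).item()
    h0 = torch.randint(0, h - bh + 1, (1,)).item()
    w0 = torch.randint(0, w - bw + 1, (1,)).item()
    mask[t0:t0 + bt, h0:h0 + bh, w0:w0 + bw] = False  # False = region to predict
    return mask  # True = visible to the context encoder

mask = random_tube_mask(16, 14, 14)
```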

Made a PR for alternative attentions in https://github.com/PKU-YuanGroup/Open-Sora-Plan/pull/63

Inpainting training code for SDXL (should be extended to 2+1D): https://github.com/huggingface/diffusers/pull/6592/files

RectifiedFlow diffusers implementation: https://github.com/huggingface/diffusers/blob/534f5d54faf24a72a0e2bff2f6e6379ea519c4ed/examples/community/instaflow_one_step.py#L55
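For context, the rectified-flow training objective itself is tiny (a hedged sketch of the standard straight-line formulation, not the linked pipeline's code; `model` and its signature are hypothetical):

```python
import torch

# Rectified flow in one training step (standard formulation, not the linked pipeline):
# interpolate on the straight line between data x0 (t=0) and noise x1 (t=1),
# and regress the constant velocity of that line.
def rectified_flow_loss(model, x0):
    x1 = torch.randn_like(x0)                       # noise endpoint
    t = torch.rand(x0.shape[0], 1, 1, 1, device=x0.device)
    xt = (1 - t) * x0 + t * x1                      # linear interpolation
    v_pred = model(xt, t.flatten())                 # model predicts velocity
    return torch.mean((v_pred - (x1 - x0)) ** 2)    # target velocity = x1 - x0
```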

Dude, you want to add the entire LLaVA codebase as the captioner? Why not use the Hugging Face hub or ready-made software?
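Pulling a LLaVA captioner straight from the hub is a few lines with transformers (a sketch; the model id and file name are just examples):

```python
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

# Pull a LLaVA checkpoint from the hub instead of vendoring the codebase
# (model id is just an example; other llava-hf checkpoints work the same way).
model_id = "llava-hf/llava-1.5-7b-hf"
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16
).to("cuda")

prompt = "USER: <image>\nDescribe this frame in detail. ASSISTANT:"
image = Image.open("frame.png")  # example input frame
inputs = processor(text=prompt, images=image, return_tensors="pt").to("cuda", torch.float16)
out = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(out[0], skip_special_tokens=True))
```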

XXL weights might actually be better, because as research on story understanding shows, the larger the hidden dimension, the better the model understands the plot and the relationships between real-world objects...

Thanks! We more or less fixed it in Deforum by adding extra sanitization for the ffmpeg input. It can still break stuff for other plugin/script makers, but closing this.
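The gist of that kind of fix (a sketch, not Deforum's actual code): validate the user-supplied paths and pass ffmpeg its arguments as a list, so the shell never interprets user input.

```python
import subprocess
from pathlib import Path

# Sketch of the sanitization idea (not Deforum's actual code): validate the
# output path and pass arguments as a list with shell=False, so user-supplied
# strings are never interpreted by the shell.
def encode_frames(in_pattern: str, out_file: str, fps: int = 24):
    out = Path(out_file)
    if out.suffix.lower() not in {".mp4", ".webm"}:
        raise ValueError(f"unexpected output extension: {out.suffix}")
    cmd = [
        "ffmpeg", "-y",
        "-framerate", str(fps),
        "-i", in_pattern,          # e.g. "frames/%05d.png"
        "-c:v", "libx264", "-pix_fmt", "yuv420p",
        str(out),
    ]
    subprocess.run(cmd, check=True, shell=False)
```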

You don't see it as the author? IIRC people reopened the issues just fine: ![image](https://user-images.githubusercontent.com/14872007/235364302-1b15e5ff-7a1b-4277-b9fa-38ed3a711e42.png)